On-demand Ollama in the homelab

Here’s how I achieve on-demand Ollama in my homelab without sending my electricity bill through the roof.

High-level overview

The core of the solution lies in configuring a reverse proxy to handle requests to Ollama. If Ollama is not reachable, a special handler kicks in and forwards the request to ollama-wake-on-lan, referred to as the WoL service in the rest of this post.

The WoL service attempts to wake up the Ollama host and seamlessly redirect the original request back to Ollama. To achieve this, it holds the request open while sending Wake-on-LAN magic packets to the host, then redirects the client once the host is up.
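To make this concrete, here is a minimal Go sketch of what sending a magic packet involves: 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, broadcast over UDP. This is not the actual ollama-wake-on-lan code, and the MAC and broadcast address below are placeholders.

package main

import (
	"bytes"
	"log"
	"net"
)

// sendMagicPacket broadcasts a Wake-on-LAN magic packet:
// 6 bytes of 0xFF followed by the target MAC repeated 16 times.
func sendMagicPacket(mac, broadcast string) error {
	hw, err := net.ParseMAC(mac)
	if err != nil {
		return err
	}
	payload := bytes.Repeat([]byte{0xFF}, 6)
	payload = append(payload, bytes.Repeat(hw, 16)...)

	// Port 9 (discard) is the conventional destination for WoL packets.
	conn, err := net.Dial("udp", broadcast+":9")
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write(payload)
	return err
}

func main() {
	// Placeholder MAC and broadcast address.
	if err := sendMagicPacket("aa:bb:cc:dd:ee:ff", "192.168.1.255"); err != nil {
		log.Fatal(err)
	}
}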

Implementation

I am using Caddy as a reverse proxy. Here’s an example Caddyfile configuration that implements the above solution.

*.example.tld {
	@ollama host ollama.example.tld ollama2.example.tld
	handle @ollama {
		reverse_proxy http://<ollama IP>:11434
	}

This first part configures Caddy to act as a reverse proxy for requests whose Host header matches either ollama.example.tld or ollama2.example.tld. The second hostname will be used as the redirect target in the ollama-wake-on-lan service, as browsers may not automatically follow redirects back to the original hostname.

The configuration continues like so:

	handle_errors {
		@ollama_main host ollama.example.tld
		handle @ollama_main {
			reverse_proxy http://<ollama-wake-on-lan ip>:4000
		}
	}
}

This configures a special error handler for when Caddy fails to reach the Ollama upstream configured in the first snippet (for example, when the host is asleep and the proxied request fails with a 502). In that case, Caddy reverse proxies the request to the WoL service, but only if the Host header matches ollama.example.tld. Since the WoL service redirects to ollama2.example.tld, which this matcher does not catch, requests cannot loop back into the WoL service indefinitely.

The only thing left is running the WoL service:

$ go install git.sr.ht/~tomleb/ollama-wake-on-lan@master
$ ollama-wake-on-lan -broadcast <broadcast address> \
                     -mac <mac address> \
                     -url <ollama2 url>

That’s it! Now if your Ollama host is down and you try to use its API, Caddy will proxy the request to the WoL service, which will seamlessly wake up the host and redirect the client back to Ollama.
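For a sense of what the WoL service does internally, here is a rough, self-contained Go sketch of that wake-then-redirect flow. It is not the actual ollama-wake-on-lan code: the addresses, timeouts, and redirect URL are placeholders standing in for the -mac, -broadcast, and -url flags above.

package main

import (
	"bytes"
	"log"
	"net"
	"net/http"
	"time"
)

// Placeholders standing in for the -mac, -broadcast, and -url flags.
const (
	targetMAC     = "aa:bb:cc:dd:ee:ff"
	broadcastAddr = "192.168.1.255"
	ollamaAddr    = "192.168.1.50:11434"
	redirectBase  = "https://ollama2.example.tld"
)

// wake broadcasts a Wake-on-LAN magic packet (see the earlier sketch).
func wake() error {
	hw, err := net.ParseMAC(targetMAC)
	if err != nil {
		return err
	}
	pkt := append(bytes.Repeat([]byte{0xFF}, 6), bytes.Repeat(hw, 16)...)
	conn, err := net.Dial("udp", broadcastAddr+":9")
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write(pkt)
	return err
}

func handler(w http.ResponseWriter, r *http.Request) {
	if err := wake(); err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	// Block the request until Ollama's port answers, up to a deadline.
	deadline := time.Now().Add(2 * time.Minute)
	for time.Now().Before(deadline) {
		if conn, err := net.DialTimeout("tcp", ollamaAddr, 2*time.Second); err == nil {
			conn.Close()
			// Redirect to the second hostname so the client retries
			// the request against Ollama itself. 307 preserves the
			// method and body, which matters for API POSTs.
			http.Redirect(w, r, redirectBase+r.RequestURI, http.StatusTemporaryRedirect)
			return
		}
		time.Sleep(2 * time.Second)
	}
	http.Error(w, "host did not wake up in time", http.StatusGatewayTimeout)
}

func main() {
	http.HandleFunc("/", handler)
	// Listen on port 4000, matching the Caddyfile above.
	log.Fatal(http.ListenAndServe(":4000", nil))
}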

In my case, I run open-webui, and simply accessing it triggers the whole flow, making it even more seamless from a user's point of view.

Contribute to the discussion in my public inbox by sending an email to ~tomleb/public-inbox@lists.sr.ht [mailing list etiquette]