My Server Setup Part 2: Nomad
In this series of posts I am describing the infrastructure that hosts all of my websites and development projects.
- Part 1: Hosting and Configuration
- Part 2: Nomad Configuration
- Part 3: Nomad Jobs
- Part 4: Continuous Delivery
- Part 5: Eliminating the Downtime
Summary
- Each server runs Nomad in both client and server modes.
- Each server also runs instances of Consul and Vault.
- Fabio is used for external proxying and load balancing.
- A custom, open-sourced tool called Aleff is used to automatically obtain and renew certificates from Let’s Encrypt.
Nomad
As mentioned in the previous post I was using Kubernetes for a while, but since the company I work for has standardised on Nomad I've switched my personal setup so I can learn more about it and the whole HashiCorp stack. The experience was less than smooth, but after a few false starts I now have a stable cluster and everything is working as well as it did with Kubernetes.
My experience with Nomad so far has been very positive. It is incredibly simple to deploy, and integrates with Consul and Vault very well. It’s proven to be extremely resilient and capable, as well as being very flexible. My use case in this cluster is not hugely complicated but knowing the more advanced configuration options are available is a great comfort.
Clients and servers
In a typical Nomad cluster you would have three nodes running the server, and any number of worker nodes running the client. In my case that felt like overkill, and I wanted to try running both on the same nodes as that’s one option we’re looking at for a resource-limited project at work.
The main concern with this setup was that the server instances would create too much load, making it impractical. It turns out, understandably, that the load generated by the server processes is roughly proportional to the level of activity in the cluster, and mine is a fairly stable setup. If you had a system with far more churn it might become an issue. The only churn my cluster gets is when one of the servers fails or needs to be restarted - I don't currently have any dynamic jobs.
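For reference, running both modes on a node just means enabling both stanzas in the agent configuration. Here's a minimal sketch; the datacenter name, data directory and bootstrap_expect value are assumptions rather than my actual values:
# /etc/nomad.d/nomad.hcl - minimal sketch of an agent running both modes
datacenter = "dc1"              # assumed datacenter name
data_dir   = "/opt/nomad/data"  # assumed data directory

server {
  enabled          = true
  bootstrap_expect = 3  # assumed; should match the number of server-mode nodes
}

client {
  enabled = true
}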
Consul and Vault
Nomad integrates with the Consul cluster for service discovery and registration, while its own state is replicated between the server nodes using Raft. This keeps the configuration distributed and resilient to failures. In a production cluster I would set up backups of the Consul data, but for my purposes that's not necessary because only runtime state is stored there - everything else is in the Terraform files.
Vault is used for secrets but at the time of writing I have no need for secrets of any kind. Vault also uses Consul as its storage backend.
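Wiring these into Nomad is done with the consul and vault stanzas in the agent configuration. A minimal sketch, assuming both run locally on their default ports (the server-side Vault token is omitted):
consul {
  address = "127.0.0.1:8500"  # assumed local Consul agent on the default port
}

vault {
  enabled = true
  address = "http://127.0.0.1:8200"  # assumed local Vault on the default port
  # server nodes also need a Vault token, supplied separately
}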
External access and load balancing
The cluster uses Fabio to handle routing HTTP and HTTPS requests from the outside world to the correct service. It uses the example job definition provided by HashiCorp on the developer resources website, with some minor changes.
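The shape of that job is roughly as follows. This is only a sketch, assuming the Docker driver and a datacenter named dc1; the configuration properties described below would also need to be supplied, for example via a properties file or environment variables:
job "fabio" {
  datacenters = ["dc1"]  # assumed datacenter name
  type        = "system" # run one instance on every node

  group "fabio" {
    network {
      # reserve the ports Fabio listens on
      port "http"  { static = 80 }
      port "https" { static = 443 }
      port "ui"    { static = 9998 }
    }

    task "fabio" {
      driver = "docker"

      config {
        image        = "fabiolb/fabio"
        network_mode = "host"
      }
    }
  }
}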
Configuration changes
Ports
The first configuration change is to tell Fabio which ports it will be proxying. Currently everything is served on ports 80 and 443 for HTTP and HTTPS respectively, and the HTTPS listener is told to use the Consul certificate store defined below.
proxy.addr = :80;proto=http,:443;cs=consul
SSL certificate store
We then configure it to use the Consul key/value store as its source for SSL certificates. See the section on Aleff later for how these certificates are populated.
proxy.cs = cs=consul;type=consul;cert=http://127.0.0.1:8500/v1/kv/certs/active
404 page content
Finally we specify a Consul key that contains the HTML that Fabio should serve if it cannot match a request to a configured route. Here's an example (if that gives you a website it means I've either started using that domain name or let the registration lapse).
registry.consul.noroutehtmlpath = /fabio/noroute.html
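Since the rest of the configuration lives in Terraform, one way to populate that key is with Terraform's Consul provider. A sketch, assuming the hashicorp/consul provider is configured and a noroute.html file sits alongside the module:
# Hypothetical Terraform resource to populate the no-route HTML key
resource "consul_keys" "fabio_noroute" {
  key {
    path  = "fabio/noroute.html"
    value = file("${path.module}/noroute.html")  # assumed local HTML file
  }
}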
Service registration
Registering services with Fabio is pretty simple, but to get the functionality I want it's a little more complex than the basic setup. Primarily I want to redirect HTTP requests to HTTPS, but I also want to redirect www.{domain}/* to {domain}/*.
The simplest setup is to specify a tag on the service in the Nomad job definition that starts with urlprefix-. For example:
tags = [
  "urlprefix-stut.dev:80/"
]
This will cause a route for http://stut.dev/ to be registered with this service as the destination. Accomplishing my requirements is a little more involved but still pretty straightforward:
tags = [
  "urlprefix-stut.dev:80/ redirect=302,https://stut.dev$path",
  "urlprefix-stut.dev:443/",
  "urlprefix-www.stut.dev/ redirect=302,https://stut.dev$path",
]
The first tag creates the route for http://stut.dev/ with a temporary redirect to the HTTPS version of the same URL.
The second tag creates the route for https://stut.dev/.
The final tag creates the route for http[s]://www.stut.dev/ with a temporary redirect to the HTTPS version of the URL without the www.
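For context, here's a sketch of how those tags might sit within a service stanza in a Nomad job. The service name, port label and health check are hypothetical, not taken from my actual job files:
service {
  name = "stut-dev"  # hypothetical service name
  port = "http"      # assumed port label from the group's network stanza

  tags = [
    "urlprefix-stut.dev:80/ redirect=302,https://stut.dev$path",
    "urlprefix-stut.dev:443/",
    "urlprefix-www.stut.dev/ redirect=302,https://stut.dev$path",
  ]

  check {
    type     = "http"
    path     = "/"
    interval = "10s"
    timeout  = "2s"
  }
}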
SSL certificates
As evidenced by the Fabio configuration, all domains are served on port 443 with HTTPS and therefore require certificates. I originally set this up so that obtaining and renewing certificates was a manual process, but that got tedious very quickly. That's when Aleff was born.
Let’s Encrypt
This wonderful service provides all of the certificates free of charge. They only last 90 days, but they are trusted by all major web browsers and have rapidly become the standard for free SSL certificates where organisation verification is not required.
Aleff
I’m not going to go into great detail on how Aleff works as that’s covered well by the documentation. Suffice to say that the Nomad portion of the documentation was written based on how I’m using it in my cluster.
In essence it's a service that periodically reads the urlprefix- tags from Consul and ensures that there are valid certificates in the Consul key/value store - see the Fabio configuration section above for details. It will obtain new certificates as required, and renew them when they're within a certain window before expiration. The service handles verification automatically, and records metrics that can be used to alert if anything goes wrong.
Next: Nomad Jobs
In the next part I’ll cover the jobs and services that I currently run on this cluster.