Now that I’m funemployed and have some time on my hands,¹ I decided to solve a problem that has been bothering me.
See, historically I have published my blog in the following way:
- Write a post in NeoVim over SSH on our home server.
- Run Hugo on that home server, overriding the base URL to point to the home server’s domain.
- Preview the blog post by navigating to the home server’s domain.
- Re-run Hugo with the actual domain name as the base URL.
- Run Packer, which installs Nginx into a Google Cloud Platform VM, puts all the files generated by Hugo in the right places, and bakes the whole thing into a VM image.²
- Run Terraform, which notices the new image, generates a new instance template, and updates the managed instance group running my blog to roll out the new image.
This approach has the wonderful properties of never waking me up in the middle of the night and never letting me forget how to do it just because I haven’t published a blog post in like a year. And if I managed to break anything, I could just delete the new image and roll out the previous one to get things back the way they were, no sweat. But on the other hand, I sometimes forgot to run step 4³ and broke the assets and links horribly. And steps 5 and 6 took fifteen minutes to run by themselves. And that was annoying.
I started wondering if there was a better way.
And I don’t know that it’s “better”, but I did land on a very cursed way to keep some of those properties that doesn’t involve pushing to prod from my laptop.
Here’s the new system:
- Write a post in NeoVim over SSH on our home server.
- Commit that post with git and push it to the git repo.
- Google Cloud Build runs Hugo for me and builds a Docker image of the Nginx server and the files for my blog.⁴
- That image gets pushed to Google Cloud Artifact Registry.
- The instances in my managed instance group notice that a new Docker image is available and pull it down, then spin up new containers with that image, register them with HAProxy, drain traffic from the existing containers in HAProxy, and kill the existing containers.
This whole process⁵ takes between one and five minutes, total, and is all kicked off by the git push.
The interesting thing here is step 5. I kinda glossed over how that happens, because the answer is “I wrote cursed software that does it”. I built a Go binary that, every minute, uses the HAProxy native client and the Docker SDK to compare the running Docker containers to the registered HAProxy servers. Any server without an associated running container gets removed from the backend. Any container without a server gets added to the backend. And every minute, it pulls the image for each running container, using the image ref that was used to start that container. If the pulled image doesn’t match the one the container is running, it spins up a new container, adds it to HAProxy, sets the old container’s server to drain in HAProxy, removes the old server from HAProxy, and kills the old container.
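To give a sense of the shape of it, here’s a heavily stripped-down sketch. This is not the real code: the names are made up, most of the error handling is gone, the container create/start step is elided, and I’m poking HAProxy’s runtime socket directly (assuming an admin-level stats socket and an HAProxy new enough to support dynamic `add server`/`del server`) instead of going through the native client, just to keep the sketch short.

```go
// A stripped-down sketch of the reconcile loop, not the real thing.
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"strings"
	"time"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/api/types/image"
	"github.com/docker/docker/client"
)

const haproxySocket = "/var/run/haproxy.sock" // assumption: the runtime API lives here

// haproxyCmd sends a single command to the HAProxy runtime API and returns the reply.
func haproxyCmd(cmd string) (string, error) {
	conn, err := net.Dial("unix", haproxySocket)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintf(conn, "%s\n", cmd); err != nil {
		return "", err
	}
	out, err := io.ReadAll(conn)
	return string(out), err
}

// backendFor derives the backend name, server name, and server address for a container.
// The real mapping is the fragile image-ref-plus-port scheme I complain about below;
// it's stubbed out with placeholder values here.
func backendFor(c types.Container) (backend, server, addr string) {
	return "blog", "c-" + c.ID[:12], "127.0.0.1:8080"
}

func reconcile(ctx context.Context, docker *client.Client) error {
	containers, err := docker.ContainerList(ctx, container.ListOptions{})
	if err != nil {
		return err
	}
	for _, c := range containers {
		backend, server, addr := backendFor(c)

		// Membership: every running container should have a server in its backend.
		// (The real thing also walks the server list and removes servers that no
		// longer have a container behind them.)
		state, _ := haproxyCmd("show servers state " + backend)
		if !strings.Contains(state, server) {
			haproxyCmd(fmt.Sprintf("add server %s/%s %s", backend, server, addr))
			haproxyCmd(fmt.Sprintf("set server %s/%s state ready", backend, server))
		}

		// Rollout: re-pull the ref the container was started from and compare image IDs.
		rc, err := docker.ImagePull(ctx, c.Image, image.PullOptions{}) // registry auth elided
		if err != nil {
			continue
		}
		io.Copy(io.Discard, rc) // the pull only finishes once the body is drained
		rc.Close()
		pulled, _, err := docker.ImageInspectWithRaw(ctx, c.Image)
		if err != nil || pulled.ID == c.ImageID {
			continue // nothing newer; leave this container alone
		}

		// The ref now points at a newer image: start a replacement container from it
		// and add it to the backend (both elided here), then drain and remove the old
		// server and kill the old container.
		haproxyCmd(fmt.Sprintf("set server %s/%s state drain", backend, server))
		// ...wait for the old server's sessions to drain...
		haproxyCmd(fmt.Sprintf("set server %s/%s state maint", backend, server))
		haproxyCmd(fmt.Sprintf("del server %s/%s", backend, server))
		docker.ContainerStop(ctx, c.ID, container.StopOptions{})
	}
	return nil
}

func main() {
	docker, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	for range time.NewTicker(time.Minute).C {
		if err := reconcile(context.Background(), docker); err != nil {
			fmt.Println("reconcile:", err)
		}
	}
}
```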
It’s a truly cursed piece of software.
Because I’m me, there are several pieces of this that I’m unhappy with:
- Because I’m insistent on not making a config file or database to hold state, the running containers are considered to be the definition of what’s supposed to be running, so the only way to change the number of containers or the image refs they point to is to do a whole new Packer build and Terraform rollout, like in the old system.
- If a new image rollout fails halfway through, I could accidentally increase the number of containers running; the program has no idea how many containers are supposed to be running, so if the new one starts up and the program gets interrupted before it brings the old one down, it will stand up another new container for the old one on the next run.
- If containers die and Docker’s restart policy fails me somehow, the program doesn’t know to start more containers because it doesn’t know how many should be run.
- There’s a very delicate mapping from the Docker image ref and the port exposed inside the container to an HAProxy backend name, and that mapping is the only thing tying a running container to the backend it should be added to (a rough sketch of it follows this list). This is fragile and annoying, but necessary, because no other state contains that information.
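For the curious, the mapping is roughly this shape. The naming scheme and the image ref below are illustrative, not exactly what the real thing does:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// backendName is (roughly) the delicate mapping: the backend a container belongs in
// is derived from nothing but the image ref it was started from and the port it
// exposes, because that's all the state there is to work with.
func backendName(imageRef string, exposedPort uint16) string {
	// "us-docker.pkg.dev/some-project/web/blog:deadbeef" -> "blog:deadbeef" -> "blog"
	repo := path.Base(imageRef)
	name := strings.SplitN(repo, ":", 2)[0]
	// The exposed port gets baked into the backend name too: "blog" + 8080 -> "blog-8080".
	return fmt.Sprintf("%s-%d", name, exposedPort)
}

func main() {
	fmt.Println(backendName("us-docker.pkg.dev/some-project/web/blog:deadbeef", 8080))
	// prints: blog-8080
}
```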
But it’s good enough to publish this post, and that was the spec! I think the next part of this is investigating having the instance group manager in GCP automatically roll the instances every day or something, just to hide any sins that creep in while this lurching horror runs over time.
Some day I may make the version of this that stores state outside of “whatever containers are running right now”, but that is a whole project, and I have enough projects for the moment.
You may be expecting this blog post to end with a link to a GitHub repo for this cursed program. It will not. I generally do like to open source the code I write, so nobody needs to write the same code twice. I like having nice things, and think we deserve nice things, and if I take the time to make a nice thing, I want as many people as possible to benefit from that nice thing.
But this is not a nice thing. This is A Mistake™. And if you want to make this particular mistake, I think it’s probably good for you to go through the steps of making it yourself, instead of just downloading my mistake.
Let’s see if this works. 🤞
Footnotes
1. I do not, in fact, have some time on my hands, everything I say in this post notwithstanding. Toddlers, man.
2. It actually does a lot of other things, too, mostly grabbing the HTML files for other sites and grabbing some binaries. I host a lot of sites on a single VM because I’m cheap.
3. I eventually wrote a shell script to detect when I had done this and fail the Packer build, after (and this is true) my interviewer brought up my broken site while I was in an interview.
4. Notably, there are no other sites hosted in that image. Each site can have its own Docker image.
5. Except for the blog post writing part. That part takes longer than I care to think about.