dix

Software Curmudgeon

Mumblings and Grumblings About Software

Dockerfile Hacks for Elixir Umbrella Apps
Published on Sep 17, 2020 by dix.

At my new job, I’m working with a team that has a ton of experience maintaining and extending a Ruby on Rails monolith. From running that monolith, we came to appreciate the simplicity of a single deployment artifact, but disliked the total lack of boundaries enforced in a Rails monolith. We were also influenced by the Modular Monolith[1] approach followed by former colleagues of ours now working at Root. As we began work on a new application in Elixir, we chose to implement our services within an Elixir umbrella app. This gave us a single deployment artifact while still enforcing some boundaries within our code.[2]

In addition to implementing our service as an Elixir umbrella app, we use Buildkite to continuously deploy it to Amazon ECS. We test, type check, lint, containerize, and deploy our service through Buildkite, and each of these steps runs in Docker. Below are some tips and tricks we’ve picked up while working this way with our umbrella app and Docker.

Docker Caching

It is a best practice to organize a Dockerfile to optimize for caching. In practice, this means placing the steps that are least likely to change as early as possible in the Dockerfile. For example, you should install system packages first, then your application’s library dependencies, and finally build your application. To achieve this in our umbrella app, we explicitly copy the mix.exs file from each of our applications into the image. Once all the mix.exs files, along with the umbrella’s own mix.exs and mix.lock, are in place, we run mix do deps.get, deps.compile --skip-umbrella-children.

FROM elixir:1.10.4 AS build

# prepare build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && mix local.rebar --force

ARG MIX_ENV=test
ENV MIX_ENV=${MIX_ENV}

# install mix dependencies
# only copy mix files to make better use of docker caching
COPY apps/broadcast/mix.exs apps/broadcast/
COPY apps/release_tasks/mix.exs apps/release_tasks/
COPY apps/xml/mix.exs apps/xml/

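COPY mix.exs mix.lock ./
# fetch all dependencies, then compile only the third-party ones;
# the umbrella children themselves are skipped at this stage
RUN mix do deps.get, deps.compile --skip-umbrella-children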

Because we copy only the mix.exs files and mix.lock before fetching and compiling our dependencies, Docker invalidates those layers only when one of those files changes, so we refetch and recompile dependencies only when they actually change. With these changes, most of our builds hit the Docker cache rather than rebuilding dependencies.

Finally, to maximize Docker cache hits and minimize the size of the Docker build context, it is important to exclude the files you don’t need and to copy over only the files needed to run your tests or build your release. For example, rather than the naive COPY . ., just COPY apps apps, COPY config config, and COPY rel rel. If you copy over your entire working directory, a change to your documentation could cause a Docker cache miss.
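As a sketch of what those targeted copies might look like, the lines below would continue the build stage of the Dockerfile above, after the dependency layers. The directory names are the ones mentioned in this post; the final mix compile step is an assumption about how the umbrella children get built afterwards, not a copy of our actual file.

```dockerfile
# Continues the build stage above, after dependencies have been compiled.
# Copy only the directories the build needs instead of `COPY . .`, so that
# edits to docs or other unrelated files don't invalidate these layers.
COPY config config
COPY apps apps
COPY rel rel

# Compile the umbrella children; this layer is rebuilt only when
# something in the copied directories changes.
RUN mix compile
```

Because the dependency layers come first, a change under apps/ only reruns the last few steps instead of refetching and recompiling every dependency.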

The .dockerignore file is another useful tool that can improve both your Docker cache hit rate and your build context size. It lets you list files in your working directory that should never be sent as part of the build context. This shrinks the context used to build your images and keeps files you don’t need out of them, which reduces the number of files that can change and cause a cache miss.
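A minimal .dockerignore sketch for an umbrella app might look like the following. The entries are assumptions about a typical Mix project layout rather than our actual file; excluding _build and deps matters most, since it keeps artifacts compiled on a developer’s machine out of the image.

```text
# Build output and fetched dependencies are recreated inside the image
_build
deps

# Version control and editor/tooling directories
.git
.elixir_ls

# Generated docs and coverage reports (hypothetical entries; adjust to your repo)
doc
cover

# Markdown anywhere in the tree; drop this line if any app reads a
# Markdown file at compile time (e.g. for @moduledoc)
**/*.md
```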

Caching PLT

As mentioned above, we do “type checking” with Dialyzer. Dialyzer is not a true type system, but it is a static analysis tool that helps find type errors, unreachable code, and other programmer errors. Dialyzer is infamous for taking a very long time to build its Persistent Lookup Table (PLT). For this reason, it’s important that we not regenerate the PLT on every Dialyzer run in CI. To prevent this, we followed an approach similar to the one described by DNSimple,[3] adapted for our CI server and our use of Docker: we configure Dialyzer to look in our application’s priv directory for prebuilt PLT files, cache those PLT files using Buildkite caching, and attach the cache location to our Docker container as a volume.
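Here is a sketch of the Docker side of this, assuming (hypothetically) that the Dialyzer configuration in mix.exs points its PLT files at priv/plts under the umbrella root, and that these lines continue the test image above after the application source has been copied in. The cache directory and image name in the comment are placeholders, not real paths from our pipeline.

```dockerfile
# Continues the test image above, after the application source is copied in.
# Hypothetical: mix.exs configures Dialyzer to read and write its PLTs in
# /app/priv/plts. The directory is created empty in the image; in CI the
# cached PLTs are mounted over it at run time, for example:
#   docker run -v "<plt-cache-dir>:/app/priv/plts" <test-image>
RUN mkdir -p /app/priv/plts

# Dialyzer runs at `docker run` time rather than during `docker build`,
# because `docker run -v` mounts are not available while building.
CMD ["mix", "dialyzer"]
```

Keeping the PLTs outside the image means that changes to our own application code should not force them to be rebuilt; they only need updating when dependencies or the Erlang/Elixir version change.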

Elixir Releases and Docker

Erlang, and by extension Elixir, supports building releases from your application. A release is a self-contained bundle of your compiled applications and all of their runtime dependencies, including the Erlang runtime itself, leaving you with an artifact that you can run without any language-specific installation. To build the release, you do of course need the language toolchain. Docker provides multi-stage builds for exactly this sort of use case.

For our deploy process, we build a release on the stock elixir:1.10.4 image, taking advantage of all the techniques above to speed up our Docker build and reduce the size of our images. A second stage then copies the built release into a stock debian:buster image, which is ready to deploy.
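A two-stage Dockerfile along these lines is sketched below. It assumes the root mix.exs defines a single umbrella release named app (a hypothetical name) and that the runtime image only needs OpenSSL and ncurses libraries; your release may need more. The release bundles an Erlang runtime compiled against the build image’s system libraries, so the runtime image should use the same Debian release as the build image.

```dockerfile
# Stage 1: build the release on the full Elixir image.
FROM elixir:1.10.4 AS build
WORKDIR /app
RUN mix local.hex --force && mix local.rebar --force
ENV MIX_ENV=prod

# Same dependency-caching steps as the test Dockerfile above.
COPY apps/broadcast/mix.exs apps/broadcast/
COPY apps/release_tasks/mix.exs apps/release_tasks/
COPY apps/xml/mix.exs apps/xml/
COPY mix.exs mix.lock ./
RUN mix do deps.get, deps.compile --skip-umbrella-children

# Copy only what the release needs, then build it.
COPY config config
COPY apps apps
COPY rel rel
RUN mix release

# Stage 2: a slim runtime image containing only the built release.
FROM debian:buster AS app
RUN apt-get update \
 && apt-get install -y --no-install-recommends openssl libtinfo6 \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# "app" is the hypothetical release name; Mix writes releases to _build/<env>/rel/<name>
COPY --from=build /app/_build/prod/rel/app ./
ENV LANG=C.UTF-8
CMD ["bin/app", "start"]
```

Only the second stage ends up in the deployed image, so the Elixir toolchain, Hex, and the dependency source never ship to ECS.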

Further Notes

I suspect that some people might think this level of effort to squeeze small improvements out of a CI and CD process is unwarranted. They might point to Donald Knuth’s warning that “premature optimization is the root of all evil”. To that group I offer two points. First, a few sentences later in the same paper, Knuth says “Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, they will be wise to look carefully at the critical code; but only after that code has been identified.”[4] I believe that there is no more critical piece of code than our CI and CD pipelines. Any efficiency we can wring from them, we must.

The second argument is one I recently came across in a post about using Bazel to build Haskell projects.[5] The author describes their discomfort with the idea that we as an industry are burning power needlessly rebuilding the same pieces of software over and over, in a world that is being turned into a hellscape by our consumption of natural resources and release of carbon into the atmosphere. This suggests that we as engineers have some moral obligation not to rerun the same CI pipelines when we don’t need to.[6]

Useful Resources

Footnotes:

2

To be fair, the enforcement of boundaries in an umbrella app is not perfect. You can call into indirect dependencies directly, which does not fully enforce modularity. However, the next release of Elixir, 1.11, is intended to tighten this up and to improve support for incremental compilation. Both changes should drastically improve the life of an umbrella app developer (see the Elixir 1.11.0 changelog).

6

The author themself points out, and I agree, that more is needed than engineers reducing their build times to combat climate change. Just 100 companies are responsible for 71 percent of global industrial greenhouse gas emissions since 1988 (The Carbon Majors Database). Efforts to combat climate change and limit carbon emissions will require mass political action rather than individual choices. That said, we should still do what we can, particularly when it benefits us in other ways.