Dockerfile Hacks for Elixir Umbrella Apps
Published on Sep 17, 2020 by dix.
At my new job, I’m working with a team that has a ton of experience maintaining and extending a Ruby on Rails monolith. From running a large monolith, we appreciated the ease of having a single deployment artifact, but disliked the total lack of enforced boundaries in a Rails monolith. We were also influenced by the Modular Monolith [1] approach followed by former colleagues of ours now working at Root. When we began work on a new application in Elixir, we chose to implement our services within an Elixir umbrella app. This allowed us to have a single deployment artifact while enforcing some boundaries within our code. [2]
In addition to implementing our service as an Elixir umbrella app, we are using Buildkite to continuously deploy this service to Amazon ECS. We test, type check, lint, containerize, and deploy our service through Buildkite, and each of these steps runs in Docker. Below are some tips and tricks we’ve found while working this way with our umbrella app and Docker.
Docker Caching
It is a best practice to organize a Dockerfile to optimize for caching. In practice, this means placing the steps that are least likely to change as early as possible in the Dockerfile. For example, you should install packages first, then your application’s library dependencies, and finally build your application. To achieve this in our umbrella app, we explicitly copy the mix.exs file from each of our applications into the container. After all the mix.exs files are copied, we fetch and compile our dependencies with mix do deps.get, deps.compile --skip-umbrella-children:
    FROM elixir:1.10.4 AS build

    # prepare build dir
    WORKDIR /app

    # install hex + rebar
    RUN mix local.hex --force && mix local.rebar --force

    ARG MIX_ENV=test
    ENV MIX_ENV=${MIX_ENV}

    # install mix dependencies
    # only copy mix files to make better use of docker caching
    COPY apps/broadcast/mix.exs apps/broadcast/
    COPY apps/release_tasks/mix.exs apps/release_tasks/
    COPY apps/xml/mix.exs apps/xml/
    COPY mix.exs mix.lock ./
    RUN mix do deps.get, deps.compile --skip-umbrella-children
Because we only copy over the mix.exs file from each application before compiling our dependencies, we only refetch and recompile dependencies when they change. With these changes, most of our builds hit the Docker cache rather than rebuilding dependencies.
Finally, in order to maximize Docker cache hits and minimize the Docker context size, it is important to exclude the files you don’t need from Docker and to copy over only the files that are needed to run your tests or build your release. For example, rather than the naive COPY . ., just COPY apps apps, COPY config config, and COPY rel rel. If you copy over your entire working directory, even a change to your documentation could cause Docker cache misses.
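As a rough sketch of how this ordering might look in full, the Dockerfile below puts the dependency layers first and the narrower source copies after them. The application and directory names are the ones used above; the final mix compile step is an assumption about what comes next, not something taken from our actual Dockerfile.

```dockerfile
FROM elixir:1.10.4 AS build
WORKDIR /app
RUN mix local.hex --force && mix local.rebar --force

ARG MIX_ENV=test
ENV MIX_ENV=${MIX_ENV}

# Dependency layers: rebuilt only when a mix.exs or mix.lock changes.
COPY apps/broadcast/mix.exs apps/broadcast/
COPY apps/release_tasks/mix.exs apps/release_tasks/
COPY apps/xml/mix.exs apps/xml/
COPY mix.exs mix.lock ./
RUN mix do deps.get, deps.compile --skip-umbrella-children

# Source layers: copy only what the tests or release need, so edits to
# documentation or other unrelated files leave these layers cached.
COPY config config
COPY apps apps
COPY rel rel

# Assumed next step: compile the umbrella applications themselves.
RUN mix compile
```

With this layout, a change under apps invalidates only the layers from COPY apps apps onward, while the dependency layers above it stay cached.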
The .dockerignore file is another useful tool that can improve both your Docker cache hit rate and your Docker context size. It allows you to specify files in your working directory that you never want to include in the build context. This decreases the size of the context used to build your containers and keeps unneeded files out of them, which reduces the number of files that can change and cause you to miss the Docker cache.
Caching PLT
As mentioned above, we are doing “type checking” using Dialyzer in our application. Dialyzer is not a true type system, but it is a static analysis tool that helps find type errors, unreachable code, and other programmer errors. Dialyzer is infamous for taking a very long time to generate its Persistent Lookup Table, or PLT. For this reason, it’s important that we not regenerate the PLT on every run of Dialyzer in CI. To prevent this, we followed an approach similar to the one described by dnsimple [3], but adapted for our CI server and our use of Docker. We updated our Dialyzer configuration to look in our application’s priv directory for pre-built PLT files, we cache those PLT files using Buildkite caching, and we then attach the location of the cache to our Docker container as a volume.
Elixir Releases and Docker
Erlang, and by extension Elixir, supports building releases from your application. A release is a self-contained artifact that bundles your compiled code, your application’s runtime dependencies, and (by default) the Erlang runtime itself, leaving you with something you can run with no language-specific installation on the target machine. To build the release, you do of course need the full language toolchain. Docker provides multi-stage builds for exactly this sort of use case.
For our deploy process, we build a release on the stock elixir:1.10.4 image, taking advantage of all the techniques above to speed up our docker build and reduce the size of our containers. We then run a second stage, which copies the built release into a stock debian:buster container, and we are ready to deploy that image.
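A minimal sketch of such a two-stage build might look like the following. The release name my_umbrella is hypothetical and assumes the root mix.exs defines a release with that name; the LANG setting and the note about OpenSSL are likewise assumptions about what a typical release needs at runtime, not details of our actual Dockerfile.

```dockerfile
# Stage 1: build the release with the full Elixir toolchain.
FROM elixir:1.10.4 AS build
WORKDIR /app
RUN mix local.hex --force && mix local.rebar --force
ENV MIX_ENV=prod

COPY apps/broadcast/mix.exs apps/broadcast/
COPY apps/release_tasks/mix.exs apps/release_tasks/
COPY apps/xml/mix.exs apps/xml/
COPY mix.exs mix.lock ./
RUN mix do deps.get, deps.compile --skip-umbrella-children

COPY config config
COPY apps apps
COPY rel rel
# Hypothetical release name; it must match a release defined in mix.exs.
RUN mix release my_umbrella

# Stage 2: copy only the built release into a plain Debian image.
FROM debian:buster
# Elixir expects a UTF-8 locale at runtime.
ENV LANG=C.UTF-8
WORKDIR /app
COPY --from=build /app/_build/prod/rel/my_umbrella ./
# A release that uses :crypto or :ssl may also need OpenSSL installed here.
CMD ["/app/bin/my_umbrella", "start"]
```

The final image contains the release directory but none of Mix, Hex, or the source tree, which is where most of the size reduction comes from.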
Further Notes
I suspect that some people might think this level of effort to get small improvements out of a CI and CD process is unwarranted. They might turn to Donald Knuth’s warning that “premature optimization is the root of all evil”. To that group I offer the following two points. The first: immediately after that line in the same paper, Knuth says “Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, they will be wise to look carefully at the critical code; but only after that code has been identified.” [4] I believe that there is no more critical piece of code than our CI and CD pipelines. Any efficiency we can wring from them, we must.
The second argument is one I recently came across in a post about using Bazel to build Haskell projects. [5] The author discusses their discomfort with the idea that we as an industry are burning power needlessly rebuilding the same pieces of software over and over again, in a world that is being turned into a hellscape by our consumption of natural resources and our release of carbon into the atmosphere. This suggests that we as engineers have some moral calling not to execute the same CI pipelines over and over again if we don’t need to. [6]
Useful Resources
Footnotes:
[2] To be fair, the enforcement of boundaries in an umbrella app is not perfect. You can access indirect dependencies directly, which does not fully enforce modularity. However, the next release of Elixir is intended to tighten this up and improve support for incremental compilation. Both these changes should drastically improve the life of an umbrella app developer. See the Elixir 1.11.0 Changelog.
[6] The author themself points out, and I agree, that more is needed than just engineers reducing their build times to combat climate change. 100 companies are linked to 71 percent of global industrial greenhouse gas emissions since 1988 (The Carbon Majors Database). Efforts to combat climate change and limit carbon emissions will require mass political action rather than individual choices. That said, we should still do what we can, particularly when it benefits us in other ways.