gen_server Type Helpers
Published on Sep 29, 2020 by dix.
Update (October 2, 2020)
After publishing this and trying out some of the type helpers I suggested, I realized that I misunderstood how Dialyzer works. Success typing, which Dialyzer implements, does not actually solve the problems I call out below. I think this only strengthens the argument for the benefits of the work done by Gleam and the WhatsApp team. I published a new post about my misunderstanding, which you can find here.
At work, we are currently implementing a GraphQL API in Elixir which
fronts and orchestrates several other APIs. Several of the APIs we
interact with use OAuth token flows for authentication. We use the OTP
GenServer [1] behaviour to implement a cache for these tokens.
GenServer has worked very well for this use case, but it has proven
particularly useful for the more complicated OAuth refresh token flow,
which has two separate expiration timelines. Clients invoke
GenServer.call/3 to request a token, and internally the cache uses
Process.send_after/4 to implement the expiration timelines.
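As a rough illustration of the pattern described above, here is a minimal sketch of a token cache with a single expiration timeline. The module name, the :fetch and :ttl_ms options, and the :expire message are all assumptions made up for this example; they are not taken from our actual implementation, which also handles the refresh token flow. The sketch calls Process.send_after with three arguments, leaving the optional fourth argument at its default.

```elixir
defmodule TokenCache do
  @moduledoc """
  Hypothetical sketch of a token cache. All names here are illustrative
  assumptions, not the implementation described in the post.
  """
  use GenServer

  # Client API

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  # Clients request a token with a synchronous call.
  def get_token(server) do
    GenServer.call(server, :get_token)
  end

  # Server callbacks

  @impl true
  def init(opts) do
    state = %{
      # Zero-arity function that obtains a fresh token (assumed).
      fetch: Keyword.fetch!(opts, :fetch),
      # Token lifetime in milliseconds (assumed).
      ttl_ms: Keyword.fetch!(opts, :ttl_ms),
      token: nil
    }

    {:ok, state}
  end

  @impl true
  def handle_call(:get_token, _from, %{token: nil} = state) do
    # No cached token: fetch one and schedule its expiration.
    token = state.fetch.()
    Process.send_after(self(), :expire, state.ttl_ms)
    {:reply, token, %{state | token: token}}
  end

  def handle_call(:get_token, _from, state) do
    # Cached token is still valid: return it unchanged.
    {:reply, state.token, state}
  end

  @impl true
  def handle_info(:expire, state) do
    # The timer fired: drop the token so the next call fetches a new one.
    {:noreply, %{state | token: nil}}
  end
end
```

A refresh token flow would presumably schedule a second timer with its own message, which is where the two expiration timelines come from.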
Dialyzer Lets Me Down
I recently extended the implementation of these token caches to support
refresh token flows and ran into some trouble. This trouble left me
dissatisfied with the behaviour of Dialyzer for OTP behaviours. Our team
uses Dialyzer to provide type checking as we develop our software.
Dialyzer does not provide a full-fledged type system, but the success
typing it provides is useful to detect errors during development.
Additionally, we use Credo to enforce that all public functions in a
module have a type spec. And we use the typed_struct [2] and
typed_ecto_schema [3] libraries to generate type specifications for
our user-defined structs and Ecto schema-defined structs. With these
tools in place, Dialyzer is able to detect many classes of programmer
errors. In particular, it is a useful tool for us during refactoring.
For example, if we update the return type of one function clause without
updating the return type of the other clause, Dialyzer will quickly
detect the error.
As I worked to refactor our GenServer token cache implementation, Dialyzer let me down. This is no fault of Dialyzer; in fact, it’s not really the fault of anything. The callbacks of GenServer all have type specifications, but because GenServer is so generic, the types must be very loose. In particular, GenServer allows any value to be stored as its state. Not only that, each callback can change the type of the state. It is this aspect of the specification that bit me in this refactoring. Previously, our token server used a plain map as its state, but because the refresh token flow is more complicated, I updated the server to use a typed struct as the state. I updated most of the callback function clauses to return the struct as their state, but I missed some. A slightly stricter specification for the callback type would have easily caught this error, i.e. a specification that the type of the state in the return value is the same as the type of the state in the callback arguments. [4]
Had I not caught this error by luck, it would have caused issues in production. These issues would have perhaps been tricky to pin down, because they would occur not in the incorrectly implemented callback, but rather in the next invocation of any other callback which expected the state to be our struct type. Thankfully, I detected the error by eye and disaster was averted.
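To make the failure mode concrete, here is a small sketch of the kind of mistake I mean. The module, its State struct, and the messages are hypothetical and much simpler than our real token cache. One handle_cast/2 clause still returns a plain map as the new state, and nothing complains about it, because the GenServer callback specifications accept any term as state.

```elixir
defmodule DriftingServer do
  # Hypothetical example module; not taken from our codebase.
  use GenServer

  defmodule State do
    defstruct token: nil, refresh_token: nil
  end

  @impl true
  def init(_opts), do: {:ok, %State{}}

  @impl true
  def handle_cast({:put, token}, %State{} = state) do
    # Updated during the refactor: keeps the struct as the state.
    {:noreply, %{state | token: token}}
  end

  def handle_cast(:clear, _state) do
    # Missed during the refactor: still returns a plain map as the new state.
    {:noreply, %{token: nil}}
  end

  @impl true
  def handle_call(:get, _from, %State{token: token} = state) do
    # Expects the struct, so it cannot match the map left behind by :clear.
    {:reply, token, state}
  end
end
```

Casting :clear succeeds without any complaint. The server only crashes on the next :get call, when no handle_call/3 clause matches the map, which is far away from the clause that actually contains the mistake.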
Enforcing our Team’s Conventions
These helpers do not actually work; read this section with a grain of salt.
Based on this experience, I’m planning to introduce some GenServer type helpers into our codebase. By convention, our team will agree that GenServer callbacks should always return a new state of the same type as the state they received. The type helpers we implement will enforce this convention. The type helpers will look something like this:
defmodule Core.Type.GenServer do
  # Each helper is parameterized by the state type, so the state passed in
  # and the state returned must be the same type.
  @type handle_cast(state) ::
          (request :: term(), state ->
             {:noreply, state}
             | {:noreply, state, timeout() | :hibernate | {:continue, term()}}
             | {:stop, reason :: term(), state})

  # from() is defined by GenServer, so it is referenced as GenServer.from().
  # Guards (when ...) are only valid in @spec, so reply and reason are
  # annotated inline here.
  @type handle_call(state) ::
          (request :: term(), GenServer.from(), state ->
             {:reply, reply :: term(), state}
             | {:reply, reply :: term(), state,
                timeout() | :hibernate | {:continue, term()}}
             | {:noreply, state}
             | {:noreply, state, timeout() | :hibernate | {:continue, term()}}
             | {:stop, reason :: term(), reply :: term(), state}
             | {:stop, reason :: term(), state})

  @type handle_info(state) ::
          (msg :: :timeout | term(), state ->
             {:noreply, state}
             | {:noreply, state, timeout() | :hibernate | {:continue, term()}}
             | {:stop, reason :: term(), state})
end
These type specs would have caught my errors above, and they could be
extended to catch other categories of error. For example, we could
define stricter types for the request arguments in handle_cast/2 and
handle_call/3. But there are still a couple of places where Dialyzer
can’t help us. For example, although we can have stricter types
for the callbacks, these types will not be visible when clients call
GenServer.cast/2 and GenServer.call/3. As it currently stands, I
think this issue is unsolvable. These functions must have very loose
typing because they are called by every client of every GenServer.
Future Improvements
There are a couple of projects in the works that I am hopeful will improve the developer experience with these kinds of errors. I’m following both with quite a bit of interest. The first is a new programming language targeting the BEAM called Gleam [5]. Gleam is a statically typed language which takes inspiration from the ML family of languages. In addition to providing a robust type system, Gleam hopes to provide a type-safe implementation of OTP. [6] A Gleam implementation of GenServer will hopefully be able to make explicit the connections between client code and GenServer callbacks.
The second project is work that the Erlang team within WhatsApp at Facebook is doing to improve the Erlang developer experience. [7] In fact, the loose specification of GenServer is a shortcoming that they have explicitly identified. The team is currently prototyping a declarative and statically typed GenServer API which they hope to release by the end of 2020. The WhatsApp team has engaged with the Elixir core team, the implementer of Gleam, and other members of the BEAM community. While their improvements are targeted explicitly at Erlang, I’m hopeful that their broad collaboration will lead to improvements that can be useful across BEAM languages.
Footnotes:
[4] It seems like there is a fairly standard pattern of initializing
GenServers with a nil state. This would obviously break that
pattern.