Hot-deploying Phoenix on Fly.io without restarting the world

A while back I watched Chris McCord's demo where he pushes a change to a running Phoenix app in production and it just... takes effect. No restart, no reconnect, the LiveView session sitting right there keeps its state. I filed it under "neat, someday" and moved on.

Then I actually set it up on the app I run, and it's quietly become how I ship small changes. So here's the whole thing: install, config, daily use, and the couple of footguns that cost me an afternoon so they don't cost you one.

Everything below is checked against fly_deploy 0.4.2 (the version I'm running). One heads-up worth its own mention: at the time of writing the library's README lags its own code in a few places — it still documents an older setup function and an old version pin. I'll flag those as I go, but in general trust the moduledoc / @deprecated annotations over the README, and double-check against whatever version you install.

What this actually buys you

A normal deploy on Fly builds a new image and swaps your machines out. The BEAM restarts, in-flight requests get cut, and every connected LiveView reconnects from zero. For most apps that's fine. For a LiveView-heavy one it's a visible little hiccup every single time you ship.

A hot deploy skips all of that. It builds a release, ships the compiled .beam files to the machines that are already running, and each machine loads the changed modules in place, briefly suspending only the processes that use them (typically for under a second). The OS process never dies. Open LiveViews keep their state. From the outside nothing happened, except your fix is live.
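Under the hood this leans on a plain BEAM feature. Here's a tiny standalone sketch of it (not fly_deploy's code; Greeter and HotLoadDemo are made-up names): a long-lived process keeps its state while the module it calls is swapped for a newer version. The key is the fully-qualified call, Greeter.hello(), which always resolves to whichever version is currently loaded.

```elixir
# Allow Greeter to be redefined below without "redefining module" warnings.
Code.put_compiler_option(:ignore_module_conflict, true)

# Version 1, loaded before anything calls it.
defmodule Greeter do
  def hello, do: "hello from v1"
end

defmodule HotLoadDemo do
  # Compile a new version of Greeter and load it into the running VM.
  def load_greeter(version) do
    Code.compile_string("""
    defmodule Greeter do
      def hello, do: "hello from #{version}"
    end
    """)

    :ok
  end

  # A long-lived process holding a counter, standing in for LiveView state.
  def start, do: spawn(fn -> loop(0) end)

  # Ask the process for a greeting; returns {greeting, request_count}.
  def ask(pid) do
    send(pid, {:hello, self()})

    receive do
      {:reply, reply} -> reply
    after
      1_000 -> :timeout
    end
  end

  defp loop(count) do
    receive do
      {:hello, from} ->
        # Fully-qualified call: resolved against the current Greeter every time.
        send(from, {:reply, {Greeter.hello(), count + 1}})
        loop(count + 1)
    end
  end
end
```

In iex, pid = HotLoadDemo.start() followed by HotLoadDemo.ask(pid) gives {"hello from v1", 1}; after HotLoadDemo.load_greeter("v2") the same pid answers {"hello from v2", 2}. The counter survives because the process never restarted. fly_deploy does more than this (it suspends the affected processes during the swap), so read this as the idea, not the library's implementation.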

So it's a great fit for the boring 95%: a template tweak, some LiveView logic, a context function, a bug fix in a module that already exists. It is not for structural stuff, but I'll get to that.

Getting it running

You'll need an app already on Fly (with fly deploy working) and a bucket to hold the compiled releases. On Fly that's Tigris, and it's one command.

Add the dep:

# mix.exs
{:fly_deploy, "~> 0.4"}

(The README example still shows ~> 0.1.0 — that's one of the stale spots; 0.4.x is current.)

Provision the bucket and set the secrets:

# Creates a Tigris bucket and sets AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
# BUCKET_NAME, AWS_ENDPOINT_URL_S3 and AWS_REGION as app secrets.
fly storage create

# The orchestrator needs a token to list machines and trigger the upgrade RPC.
# This one is easy to miss and the hot deploy fails without it.
fly secrets set FLY_API_TOKEN="$(fly tokens create machine-exec)"
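Since a missing secret only surfaces as a failed hot deploy, one way to catch it early is a small preflight you can run from a remote console on a machine. A minimal sketch, assuming exactly the variable names listed above; SecretsCheck is a hypothetical module, not part of fly_deploy:

```elixir
defmodule SecretsCheck do
  # Hypothetical helper. The first five names come from `fly storage create`;
  # FLY_API_TOKEN is the one you set by hand.
  @required ~w(AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY BUCKET_NAME
               AWS_ENDPOINT_URL_S3 AWS_REGION FLY_API_TOKEN)

  # Returns the required names that are unset or blank in `env`
  # (defaults to the current OS environment), in the order listed above.
  def missing(env \\ System.get_env()) do
    Enum.filter(@required, fn key -> String.trim(Map.get(env, key, "")) == "" end)
  end
end
```

On a machine that has everything wired up, SecretsCheck.missing() should return [].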

Then turn it on in config/runtime.exs. I gate it behind the bucket being present so it stays completely dormant in dev and anywhere the storage isn't wired up:

if System.get_env("BUCKET_NAME") do
  config :fly_deploy,
    bucket: System.get_env("BUCKET_NAME"),
    # Make sure this matches YOUR bucket's endpoint. The library defaults to
    # https://t3.storage.dev; my Tigris bucket lives at fly.storage.tigris.dev,
    # and leaving the default gave me 403s on the orchestrator's deploy lock.
    # `fly storage create` sets AWS_ENDPOINT_URL_S3 — pin it here so the two
    # never drift.
    aws_endpoint_url_s3: System.get_env("AWS_ENDPOINT_URL_S3")

  config :my_app, fly_deploy: true
end

Last piece — start the poller under your supervision tree. Put it at the top of your children:

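# lib/my_app/application.ex
def start(_type, _args) do
  # FlyDeploy goes first; your Repo, PubSub, Endpoint, etc. come after it.
  children = fly_deploy_children() ++ [MyAppWeb.Endpoint]
  Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end

defp fly_deploy_children do
  if Application.get_env(:my_app, :fly_deploy, false) do
    [{FlyDeploy, otp_app: :my_app}]
  else
    []
  end
end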

and prepend fly_deploy_children() to your children list. The "top" matters: the poller blocks in its init/1 to apply any pending upgrade before the rest of the tree starts, so a machine that restarts (after a crash, scaling, or a deploy) loads the current code before any of your processes come up.
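Why the position works is plain OTP rather than anything fly_deploy-specific: a supervisor starts its children one at a time, in list order, and doesn't move on until each init/1 returns. A small sketch with made-up stand-ins (ApplyPendingUpgrade plays the poller, AppWorker plays the rest of your tree) that records the start order:

```elixir
defmodule ApplyPendingUpgrade do
  # Hypothetical stand-in for the poller: init/1 does its blocking work
  # before returning, so the supervisor waits for it.
  use GenServer

  def start_link(log), do: GenServer.start_link(__MODULE__, log)

  @impl true
  def init(log) do
    # Simulate slow work, e.g. fetching and loading the current upgrade.
    Process.sleep(50)
    Agent.update(log, &[:upgrade_applied | &1])
    {:ok, log}
  end
end

defmodule AppWorker do
  # Stand-in for everything else in your supervision tree.
  use GenServer

  def start_link(log), do: GenServer.start_link(__MODULE__, log)

  @impl true
  def init(log) do
    Agent.update(log, &[:worker_started | &1])
    {:ok, log}
  end
end
```

Start both under one supervisor with ApplyPendingUpgrade listed first and the log reads [:upgrade_applied, :worker_started] despite the sleep. Swap the two entries and the worker starts first, which is exactly the window where it would be running pre-upgrade code.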

The README tells you to call FlyDeploy.startup_reapply_current/1 from Application.start/2 instead. That function still exists but is @deprecated in 0.4.2 in favor of the {FlyDeploy, otp_app: :my_app} child spec above — a good example of trusting the code over the README.

That's the install. Because it touches mix.exs and the supervision tree, you need one more regular fly deploy to get it onto your machines. After that, you're hot.

Using it day to day

From your laptop:

mix fly_deploy.hot

It builds a release, uploads the tarball, flips the "current" pointer in the bucket, and tells each running machine to load the new code. No restart. If you run more than one environment, point it at the right config:

mix fly_deploy.hot --config fly.staging.toml
mix fly_deploy.hot --dry-run   # see what it'd do first

And to see what's actually running where:

mix fly_deploy.status

I don't really trust a deploy until I've looked, so from a remote console I check:

FlyDeploy.get_current_hot_upgrade_info(:my_app)   # %{hot_upgrade_applied:, deployed_at:, ...}
FlyDeploy.current_vsn()                           # %{base_image_ref:, hot_ref:, version:, fingerprint:}

deployed_at should be a few seconds ago, and the image ref should be the one you expect. You can also subscribe to the lifecycle (FlyDeploy.subscribe/0, then handle {:fly_deploy, :hot_upgrade_complete, meta}) and forward that to Slack or a #deploys channel, which is nicer than remembering to check.
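A minimal sketch of that forwarding, assuming only the FlyDeploy.subscribe/0 function and the {:fly_deploy, :hot_upgrade_complete, meta} message described above (nothing is assumed about what's inside meta). MyApp.DeployNotifier is a hypothetical module, and the notify function is yours to supply, for example one that posts to a Slack webhook. The subscribe function is injectable only so the sketch runs without the dependency:

```elixir
defmodule MyApp.DeployNotifier do
  # Hypothetical module: forwards fly_deploy's hot-upgrade event to a
  # notify function you provide (e.g. one that posts to a Slack webhook).
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    # Defaults to FlyDeploy.subscribe/0; overridable so this runs without the dep.
    subscribe = Keyword.get(opts, :subscribe, fn -> FlyDeploy.subscribe() end)
    subscribe.()
    {:ok, %{notify: Keyword.fetch!(opts, :notify)}}
  end

  @impl true
  def handle_info({:fly_deploy, :hot_upgrade_complete, meta}, state) do
    state.notify.("Hot upgrade complete: " <> inspect(meta))
    {:noreply, state}
  end

  # Ignore other lifecycle events and any stray messages.
  def handle_info(_msg, state), do: {:noreply, state}
end
```

You'd add something like {MyApp.DeployNotifier, notify: &MyApp.Slack.post/1} to your children after the FlyDeploy entry, where MyApp.Slack.post/1 is whatever (hypothetical) function sends the message.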

When to reach for it, and when not

The rule of thumb is simple: if your change can be expressed as "load a newer version of this module," hot-deploy it. If it can't, do a normal deploy.

Hot is great for templates and HEEx, LiveView and component code, context and business logic, and bug fixes in existing modules. Skip it and just fly deploy for anything that changes the shape of the system: new or removed deps, runtime config, the supervision tree (the library can't add/remove supervised children at runtime), background-job queues, database migrations, an Elixir or OTP bump, anything with NIFs/ports. And static assets, which I'll come back to.
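If you want that rule as a nudge rather than something to remember, here's a rough sketch that looks at changed file paths and answers cold if any of them falls into the "changes the shape of the system" bucket. The path patterns are assumptions for a standard Phoenix layout (migrations under priv/repo, Rustler-style native/ for NIFs) and HotOrCold is a hypothetical module. It can't see a structural change hidden inside an ordinary-looking file, so treat :hot as "probably" rather than a guarantee:

```elixir
defmodule HotOrCold do
  # Hypothetical helper; the patterns assume a standard Phoenix project layout.

  # :hot if nothing in `paths` looks structural, else {:cold, offending_paths}.
  def classify(paths) do
    case Enum.filter(paths, &cold?/1) do
      [] -> :hot
      cold -> {:cold, cold}
    end
  end

  defp cold?(path), do: Enum.any?(cold_patterns(), &Regex.match?(&1, path))

  defp cold_patterns do
    [
      # deps added, removed or bumped
      ~r{^mix\.(exs|lock)$},
      # config (runtime config, job queues, ...)
      ~r{^config/},
      # database migrations
      ~r{^priv/repo/migrations/},
      # static assets and their sources
      ~r{^(assets|priv/static)/},
      # the supervision tree
      ~r{(^|/)application\.ex$},
      # NIFs (Rustler-style layout)
      ~r{^native/},
      # base image, Elixir/OTP versions, Fly config
      ~r{^(Dockerfile|\.tool-versions|fly(\.[\w-]+)?\.toml)$}
    ]
  end
end
```

You could feed it the output of {out, 0} = System.cmd("git", ["diff", "--name-only", "HEAD~1"]) via String.split(out, "\n", trim: true), swapping HEAD~1 for whatever ref you last cold-deployed.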

When I'm not sure, I cold deploy. It's never the wrong answer, just the slower one.

The stuff that actually bit me

The setup is easy. These are the things I learned the annoying way.

The big one: your normal deploys need unique image tags. This cost me most of an afternoon and a genuinely confused debugging session, so let me spell it out.

On boot, fly_deploy re-applies the "current" hot upgrade recorded in the bucket — that's the restart-resilience feature, and it's good. To decide whether a booting machine is the same generation (re-apply the upgrade) or a fresh cold deploy (forget it and run the new image), it compares the machine's image ref (FLY_IMAGE_REF) against the one stored in the bucket metadata. Match → apply the upgrade; mismatch → reset.

If every cold deploy ships the same tag (the classic :latest), those refs always match. So a brand-new image looks exactly like a restart, and on boot the machine cheerfully re-applies an old hot tarball on top of your fresh deploy, silently reverting your code. The worst part is that your release version still reports the new build while the code actually running is old. I stared at "but I literally just deployed this" for far too long.
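To make the failure concrete, here's a toy model of the boot-time rule as described above. It is not fly_deploy's code; BootDecision, decide/2 and the :image_ref key are illustrative names. The interesting case is the last test: with :latest on both sides, the comparison can't tell a fresh image from a restart.

```elixir
defmodule BootDecision do
  # Illustrative model of the boot rule, not fly_deploy's implementation.
  # `machine_ref` is this machine's FLY_IMAGE_REF; `recorded` is the bucket's
  # "current" hot upgrade metadata (nil if none has been recorded).
  def decide(_machine_ref, nil), do: :boot_image

  # Same image ref as when the hot upgrade was recorded: treat as a restart.
  def decide(ref, %{image_ref: ref} = recorded), do: {:reapply, recorded}

  # Different image ref: a new cold deploy, so forget the old hot upgrade.
  def decide(_machine_ref, %{image_ref: _other}), do: :reset
end
```

Feed it two :latest refs and you get {:reapply, ...} even when the image underneath is brand new, which is exactly the silent revert.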

The fix is boring: tag every deploy image uniquely (the git SHA works fine) so each cold deploy has a distinct ref and fly_deploy can tell a new generation from a restart. If you only ever deploy with fly deploy from the CLI you're probably already getting a unique image. This mostly bites hand-rolled CI that pushes :latest.
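If you'd rather be told than find out, a small boot-time check can flag a mutable tag. A sketch, assuming FLY_IMAGE_REF looks like registry/app:tag with an optional @sha256:... digest; ImageTagCheck is a hypothetical module, and treating only :latest as mutable is a simplification:

```elixir
defmodule ImageTagCheck do
  # Hypothetical boot-time guard, not part of fly_deploy.
  require Logger

  # Only :latest counts as mutable here; extend for your own conventions.
  @mutable_tags ["latest"]

  # "registry.fly.io/my-app:deployment-01ABC" -> "deployment-01ABC"
  # "registry.fly.io/my-app:latest@sha256:..." -> "latest"
  # Only the last path segment is inspected, so a registry host:port
  # is never mistaken for a tag. Returns nil for untagged refs.
  def tag(ref) when is_binary(ref) do
    name = ref |> String.split("@", parts: 2) |> hd()
    last_segment = name |> String.split("/") |> List.last()

    case String.split(last_segment, ":", parts: 2) do
      [_repo, tag] -> tag
      [_repo] -> nil
    end
  end

  def mutable?(ref), do: tag(ref) in @mutable_tags

  # Defaults to this machine's FLY_IMAGE_REF; returns :warned or :ok.
  def warn_if_mutable(ref \\ System.get_env("FLY_IMAGE_REF")) do
    if is_binary(ref) and mutable?(ref) do
      Logger.warning(
        "FLY_IMAGE_REF #{ref} uses a mutable tag; a fresh cold deploy can look like a restart to fly_deploy"
      )

      :warned
    else
      :ok
    end
  end
end
```

Calling ImageTagCheck.warn_if_mutable() from Application.start/2 puts a warning in the logs of any machine booted from a :latest image.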

Two smaller ones:

Hot deploys ship .beam files, not your compiled assets. The tarball is built from your release's compiled modules; your priv/static bundle (the fingerprinted CSS/JS, images) isn't in it, so changed asset bytes don't ride along. The library does ship a FlyDeploy.Components.hot_reload_css component that nudges connected clients to re-fetch the stylesheet when the asset manifest changes, which is handy for the CSS case, but for an actual asset rebuild a cold deploy is the reliable path. Markup changes in your HEEx templates update fine.

Your error tracker's release version goes stale. A hot deploy doesn't rebuild the image, so anything stamped in at build time, like a Sentry release or an OTel tag, keeps pointing at the last cold deploy. Errors after a hot deploy get filed under the old version. Not a big deal once you know, but it'll confuse you in the dashboard otherwise.

Worth it?

For me, easily. Shipping a small fix went from a restart-and-watch ritual to something instant and invisible, and not blowing away every connected LiveView on each deploy is genuinely nice. The setup is a dependency, a bucket, a couple of secrets, and one child at the top of your tree. Keep the hot/cold split in your head, give your deploys unique tags, and it mostly disappears into the background, which is exactly what you want from infrastructure.