Docker: package it to run anywhere

Your FastAPI service runs on your laptop, with your Python version and your installed packages. Ship it to a server and it breaks — wrong Python, missing library, different OS. Docker ends "works on my machine" by packaging your code and its entire environment into a portable image that runs identically everywhere: your laptop, a teammate's, a server, the cloud.

Setup: install Docker Desktop (Mac/Windows) or Docker Engine (Linux). This chapter is follow-along — Docker needs a daemon, so commands and expected output are shown for you to run locally.

The three words you must know

Dockerfile — a recipe: the steps to build your environment (base OS, install deps, copy code, run command).
Image — the built, frozen result of that recipe. Immutable, shareable, tagged (sentiment-api:1.0).
Container — a running instance of an image. You can run many containers from one image.

Analogy: the Dockerfile is source code, the image is a compiled program, the container is a running process.

The Dockerfile, line by line

Here's code/Dockerfile, which packages our API. The numbered comments mark practices worth copying:

# 1. Small base image (fewer CVEs, less weight)
FROM python:3.11-slim

WORKDIR /app

# 2. Deps FIRST, so Docker caches this layer.
#    requirements.txt is assumed to list fastapi, uvicorn[standard], pydantic, numpy.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# 3. Then the code (changes often)
COPY sentiment/ ./sentiment/
COPY api/ ./api/

# 4. Don't run as root
RUN useradd --create-home appuser
USER appuser

EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"

CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]

The four ideas worth internalizing:

Slim base image — python:3.11-slim is ~5× smaller than the full image: faster pulls, smaller attack surface.
Copy requirements.txt before the code. Docker builds in cached layers; if you copy code first, every code change re-installs all dependencies. Deps-first means pip install is cached and rebuilds are seconds, not minutes.
Run as a non-root user. If the container is compromised, the attacker isn't root. Basic, essential hygiene.
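HEALTHCHECK polls our /health endpoint (Chapter 4) so Docker can tell whether the container is actually serving requests, not just started. Kubernetes ignores this instruction and relies on its own liveness and readiness probes, which can point at the same endpoint.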
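
The caching argument in the second point is easy to check with a toy model. The sketch below is not how Docker is implemented. It only assumes that each layer's cache key depends on the previous layer's key, the instruction text, and the contents of any files the instruction copies. The step lists and file contents are made up for illustration.

```python
import hashlib

def layer_keys(steps, files):
    """Return one cache key per step. Each key chains the parent key,
    the instruction text, and the contents of the files the step copies."""
    keys, parent = [], ""
    for instruction, copied in steps:
        content = "".join(files[name] for name in copied)
        parent = hashlib.sha256((parent + instruction + content).encode()).hexdigest()
        keys.append(parent)
    return keys

def rebuilt(steps, files_before, files_after):
    """List the instructions whose cache key changed between two builds."""
    before = layer_keys(steps, files_before)
    after = layer_keys(steps, files_after)
    return [step[0] for step, b, a in zip(steps, before, after) if b != a]

# Each step is (instruction, files it copies into the image).
DEPS_FIRST = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY api/ ./api/", ["api/main.py"]),
]
CODE_FIRST = [
    ("COPY . .", ["requirements.txt", "api/main.py"]),
    ("RUN pip install -r requirements.txt", []),
]

old = {"requirements.txt": "fastapi\n", "api/main.py": "v1"}
new = {"requirements.txt": "fastapi\n", "api/main.py": "v2"}  # code-only change

print(rebuilt(DEPS_FIRST, old, new))  # only the final COPY of the code
print(rebuilt(CODE_FIRST, old, new))  # the COPY and the pip install after it
```

With deps first, a code-only change invalidates just the last layer. With the code copied first, the changed layer sits above pip install, so the install reruns on every edit.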

Don't be confused: EXPOSE vs. -p. EXPOSE 8000 only documents that the app uses port 8000 — it doesn't open anything. You actually publish the port at run time with -p 8000:8000 (host port : container port). Forgetting -p is the #1 "why can't I reach my container?" gotcha.

Build and run

cd code
docker build -t sentiment-api .          # build the image from the Dockerfile
docker run -p 8000:8000 sentiment-api    # run a container, publish the port

Expected output:

[+] Building 12.3s (12/12) FINISHED
 => naming to docker.io/library/sentiment-api                          0.0s
...
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Now curl http://localhost:8000/health works exactly as in Chapter 4 — but it's running inside an isolated container, with its own Python and dependencies, that will behave identically on any machine with Docker.
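
If you'd rather script that check than run curl by hand, a small polling helper does the job. This is a sketch, not part of the chapter's code: the name wait_until_healthy and its default timings are made up here, and it assumes /health answers with HTTP 200 as in Chapter 4.

```python
import time
import urllib.error
import urllib.request

def wait_until_healthy(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll url until it answers with HTTP 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not ready yet: connection refused, reset, or an HTTP error
        time.sleep(interval)
    return False

if __name__ == "__main__":
    # Assumes the container was started with -p 8000:8000.
    print(wait_until_healthy("http://localhost:8000/health"))
```

Run it right after docker run: it returns True once the API answers, or False if the timeout passes without a 200, which usually means the port wasn't published or the app failed to start.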

The commands you'll use daily

docker images                    # list built images
docker ps                        # list running containers
docker logs <container>          # see a container's output
docker exec -it <container> bash # open a shell inside a running container
docker stop <container>          # stop it
docker build -t name:tag .       # build with a version tag
docker push registry/name:tag    # push to a registry (Docker Hub, ECR, Artifact Registry)

How images travel to production

You don't copy code to servers anymore — you build an image, push it to a container registry (Docker Hub, AWS ECR, Google Artifact Registry), and servers pull and run it:

build image  ─►  docker push  ─►  registry  ─►  docker pull  ─►  run on server / K8s

This is the foundation of modern deployment. Kubernetes (the full-stack chapter and the foundations book) orchestrates thousands of these containers; cloud "serverless container" services (AWS Fargate, Cloud Run) run them without you managing servers at all.

Don't be confused: image vs. container (again, because it matters). You build an image once and run many containers from it. Stopping a container doesn't delete the image. A container is ephemeral: anything written inside it (like a model.json created at runtime) lives in that container's writable layer and is lost when the container is removed or replaced, unless you mount a volume. Bake the model into the image, or load it from a model registry or object storage at startup. Never rely on files written inside a running container surviving.
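
In code, that advice comes down to reading the model from a location that exists before the container starts, and failing loudly if it doesn't. The sketch below is one way to do it under stated assumptions: the MODEL_PATH environment variable, the /app/models/model.json default, and the load_model name are all hypothetical, and the model is assumed to be a JSON file.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Hypothetical default: where a Dockerfile COPY step would have placed the model.
DEFAULT_MODEL_PATH = "/app/models/model.json"

def load_model(path: Optional[str] = None) -> dict:
    """Load model parameters from a file baked into the image or mounted as a volume.

    Resolution order: explicit argument, then MODEL_PATH, then the default path.
    """
    model_path = Path(path or os.environ.get("MODEL_PATH", DEFAULT_MODEL_PATH))
    if not model_path.is_file():
        # Fail fast at startup instead of serving requests without a model.
        raise FileNotFoundError(f"model not found at {model_path}")
    with model_path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Calling load_model() once at application startup means a missing model stops the container immediately, where the logs and the health check will show it, instead of surfacing on the first request.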

A note on size & GPUs

Keep images small: slim bases, .dockerignore, multi-stage builds (build in a fat image, copy only the result into a slim one).
For GPU inference, use NVIDIA's CUDA base images and the NVIDIA container runtime — the same Dockerfile idea, heavier base.

The takeaway

Docker packages your code and its environment into a portable image — killing "works on my machine." Write a Dockerfile (slim base, deps before code for caching, non-root, healthcheck), build an image, run containers from it, and push to a registry so any server can pull and run it identically. Containers are ephemeral; don't rely on files written inside them. Our service is now portable — next, let's handle work that's too slow to do inside a request, with a task queue. 👉

Production ML & AI Tools: A Hands-On Field Guide