The full stack with docker-compose
You've wrapped one model in a dozen tools, one chapter at a time. In production they run together: the API, the worker, Redis, and the tracking server, all at once, talking to each other. Starting four services by hand in four terminals is painful and fragile. docker-compose defines the whole system in one file and launches it with one command.
Setup: Docker with Compose (bundled in Docker Desktop). Follow-along.
The whole system, declared
docker-compose.yml describes every service and how they connect:
services:
  redis:  # broker + cache
    image: redis:7-alpine
    ports: ["6379:6379"]
  api:  # the FastAPI model service
    build: .
    ports: ["8000:8000"]
    environment: [REDIS_URL=redis://redis:6379/0]
    depends_on: [redis]
  worker:  # Celery background worker
    build: .
    command: celery -A tasks.celery_app worker --loglevel=info
    environment: [REDIS_URL=redis://redis:6379/0]
    depends_on: [redis]
  mlflow:  # experiment tracking / registry UI
    image: ghcr.io/mlflow/mlflow:latest
    command: mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri sqlite:////mlflow/mlflow.db
    ports: ["5000:5000"]
    volumes: [mlflow-data:/mlflow]
volumes:
  mlflow-data:
Four services, each from earlier chapters, now wired into one system. Read it top to bottom and you can see the whole architecture at a glance — which is itself a benefit.
One command to rule them all
cd code
docker compose up --build
Expected output:
[+] Running 5/5
✔ Network code_default Created
✔ Container code-redis-1 Started
✔ Container code-mlflow-1 Started
✔ Container code-api-1 Started
✔ Container code-worker-1 Started
api-1 | INFO: Uvicorn running on http://0.0.0.0:8000
worker-1 | celery@... ready.
mlflow-1 | Listening at: http://0.0.0.0:5000
The entire stack is live:
- http://localhost:8000/docs — the prediction API
- http://localhost:5000 — the MLflow UI
- the worker consuming background jobs from Redis
- Redis brokering and caching
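To confirm the two HTTP endpoints above without opening a browser, a small smoke check can help. This is a minimal sketch using only the standard library; the function name `is_up` and the `ENDPOINTS` table are illustrative, and the URLs assume the port mappings from the compose file.

```python
import urllib.error
import urllib.request

# URLs taken from the port mappings in docker-compose.yml (assumed unchanged).
ENDPOINTS = {
    "api docs": "http://localhost:8000/docs",
    "mlflow ui": "http://localhost:5000",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a 4xx/5xx response.
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        state = "up" if is_up(url) else "DOWN"
        print(f"{name:10s} {url:32s} {state}")
```

Run it in a second terminal once `docker compose up` has settled. A service that is still booting will report DOWN for a few seconds.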
Stop it all with one command:
docker compose down
How the services find each other
Notice REDIS_URL=redis://redis:6379/0: the API reaches Redis by the service name
redis, not an IP. Compose creates a private network with built-in DNS, where each
service is reachable by its name. This is the key idea: services address each other
by name, not by IP address, so the same compose file works on any machine without editing IPs.
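On the application side, the service name is just the host part of the URL. The sketch below shows one way the API or worker might read REDIS_URL and split it; the helper `redis_endpoint` is hypothetical and not taken from the book's code, and the default value simply mirrors the compose file.

```python
import os
from urllib.parse import urlparse

# Default mirrors the REDIS_URL value set in docker-compose.yml.
DEFAULT_REDIS_URL = "redis://redis:6379/0"


def redis_endpoint(url: str) -> tuple[str, int, int]:
    """Split a redis:// URL into (host, port, database index)."""
    parts = urlparse(url)
    host = parts.hostname or "localhost"
    port = parts.port or 6379
    db = int(parts.path.lstrip("/") or 0)
    return host, port, db


if __name__ == "__main__":
    url = os.environ.get("REDIS_URL", DEFAULT_REDIS_URL)
    host, port, db = redis_endpoint(url)
    # Inside the Compose network, host is the service name "redis".
    print(f"connecting to host={host!r} port={port} db={db}")
```

Because the host comes from the environment, the same image can run outside Compose by setting REDIS_URL to a different address, with no code change.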
Don't be confused:
depends_on waits for start, not ready. depends_on: [redis] makes Compose start Redis before the API container, but it doesn't wait for Redis to be accepting connections. A service that crashes because its dependency isn't ready yet needs a real healthcheck (like the one in our Dockerfile) plus retry-on-connect logic. "Started" ≠ "ready" is a classic compose gotcha.
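The retry-on-connect half of that advice can be sketched in a few lines. The helper below, `wait_for_port`, is a hypothetical name and uses only the standard library: it retries a TCP connection with a growing delay. An open port is only an approximation of "ready"; for Redis specifically, a successful PING would be a stronger signal.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 0.5) -> bool:
    """Return True once a TCP connection succeeds, False if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            # Covers refused connections, timeouts, and DNS lookups that
            # fail because the dependency's container is not up yet.
            if attempt < attempts:
                time.sleep(delay * attempt)  # linear backoff between tries
    return False


if __name__ == "__main__":
    # "redis" resolves only inside the Compose network.
    if not wait_for_port("redis", 6379):
        raise SystemExit("Redis never became reachable; giving up")
    print("Redis is accepting connections")
```

Calling something like this at startup, before the first real Redis call, turns a crash loop into a short wait.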
docker-compose vs. Kubernetes
Don't be confused: compose vs. Kubernetes. docker-compose runs multiple containers on one machine — perfect for local development, CI, and small deployments. Kubernetes (K8s) runs containers across a cluster of machines with auto-scaling, self-healing, rolling updates, and load balancing — the standard for production at scale. The good news: a compose file maps conceptually onto K8s manifests, so what you learn here transfers. Start with compose; graduate to K8s when one machine isn't enough.
The complete lifecycle, assembled
Step back and look at what you've built across the book — the entire production loop from Chapter 0, now real:
data ─► train ─► track (MLflow) ─► register ─► serve (FastAPI) ─► package (Docker)
     ─► scale (Celery+Redis) ─► version data (DVC) ─► orchestrate (Prefect)
     ─► optimize (ONNX) ─► demo (Streamlit) ─► monitor ─┐
  ▲                                                     │
  └───────────────── retrain on drift ◄─────────────────┘
GenAI stack: vector DB ─► RAG service ─► LLM serving ─► LLM observability
Engineering: testing & CI/CD · config, secrets & security
… and run it all with one command (docker-compose)
Every box is a tool you can now use. Swap our tiny model for a real one and nothing about the tooling changes — that was the whole point of keeping the model trivial.
A production-readiness checklist
Before any model goes live, walk this list (each item maps to a chapter):
- Experiments tracked and reproducible (MLflow, DVC)
- Model versioned in a registry with a @production alias
- Served behind a validated API with a /health check
- Containerized; image in a registry; runs as non-root
- Heavy work offloaded to a queue; hot paths cached
- Retraining orchestrated and gated on quality
- Inference optimized (ONNX/quantization) if latency matters
- Monitoring for drift and operational metrics, with alerts
- Tests passing in CI; deploys gated on green (Chapter 18)
- Secrets out of code; API authenticated & rate-limited (Chapter 19)
- For LLM features: cost/latency tracked, eval gate, grounded RAG (Chapter 17)
- A rollback plan (move the alias back)
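The last item, moving the alias back, might look like the sketch below. It assumes an MLflow version with registry aliases (2.3 or later) and uses the client methods `get_model_version_by_alias` and `set_registered_model_alias`. The function name `roll_back_alias`, the model name "demo-model", and the version number are placeholders, not values from the book.

```python
def roll_back_alias(client, model_name: str, alias: str, to_version: str) -> str:
    """Point `alias` at `to_version` and return the version it replaced."""
    # Look up where the alias points now, so the change can be logged or undone.
    current = client.get_model_version_by_alias(model_name, alias).version
    client.set_registered_model_alias(model_name, alias, to_version)
    return current


# Usage sketch (needs `pip install mlflow` and the tracking server from the
# compose file; "demo-model" and "3" are hypothetical):
#
#   from mlflow import MlflowClient
#   client = MlflowClient(tracking_uri="http://localhost:5000")
#   replaced = roll_back_alias(client, "demo-model", "production", "3")
#   print(f"production moved from v{replaced} back to v3")
```

Passing the client in as a parameter keeps the function easy to exercise with a stand-in object in tests, without a running registry.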
The takeaway
docker-compose declares your whole multi-service system — API, worker, Redis, MLflow —
in one file and launches it with docker compose up; services find each other by name
on a private network, and you graduate to Kubernetes when one machine isn't enough.
You've now assembled the complete production loop: track, register, serve, package,
scale, version, orchestrate, optimize, demo, monitor, and retrain. That's MLOps — and
you can do it. Go ship something.