The production ML lifecycle & the toolbox map

Before any tool, you need the map — the lifecycle every production model lives in, and which tool owns which stage. Once you see the whole loop, each chapter clicks into place as "the tool for this box."

The lifecycle loop

A production model isn't a one-shot script; it's a loop that runs forever:

  DATA ──► TRAIN ──► TRACK ──► REGISTER ──► SERVE ──► MONITOR ──► (drift?)
   ▲         │         │          │           │          │            │
   │      (Prefect) (MLflow)  (MLflow Reg) (FastAPI)  (drift check)   │
   │                                        (Docker)                  │
   └──────────────── retrain when it drifts ◄─────────────────────────┘

Read it as a sentence: pull data, train a model, track the experiment, register the winner, serve it behind an API, monitor it in production, and when it drifts, retrain. Then do it again, indefinitely. Ship ordinary software and it keeps behaving the same until someone changes the code; ML rots even when nobody touches it, because the world it learned from keeps changing. The loop is the whole point, and MLOps is the discipline of automating it.
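To make the loop concrete, here is a minimal sketch in plain Python, assuming a toy "model" that is nothing more than the mean and spread of its training data. Every function name, the simulated data, and the drift threshold are illustrative assumptions, not the book's project code. It covers DATA, TRAIN, REGISTER, MONITOR, and the retrain arrow; tracking and serving are left out to keep it short.

```python
import random
import statistics


def pull_data(shift=0.0, n=200, seed=0):
    """DATA: return a toy sample; `shift` simulates the world changing."""
    rng = random.Random(seed)
    return [rng.gauss(10.0 + shift, 1.0) for _ in range(n)]


def train(data):
    """TRAIN: the toy 'model' is just the mean and spread of its training data."""
    return {"mean": statistics.fmean(data), "stdev": statistics.stdev(data)}


def has_drifted(model, live_data, threshold=1.0):
    """MONITOR: flag drift when the live mean sits more than `threshold`
    training standard deviations away from the training mean (an assumed rule)."""
    gap = abs(statistics.fmean(live_data) - model["mean"])
    return gap > threshold * model["stdev"]


def run_loop(shifts):
    """Run one monitoring pass per entry in `shifts`, retraining on drift."""
    registry = []                      # REGISTER: every model version we promoted
    model = train(pull_data())         # DATA -> TRAIN
    registry.append(model)
    for step, shift in enumerate(shifts, start=1):
        live = pull_data(shift=shift, seed=step)  # what production sees right now
        if has_drifted(model, live):
            model = train(live)        # retrain on the fresh data
            registry.append(model)     # register the new version
    return registry


# Two quiet periods, then the world moves by 5 units and stays there.
registry = run_loop([0.0, 0.0, 5.0, 5.0])
print(f"model versions registered: {len(registry)}")
```

With these inputs the loop should register a second version only when the shift arrives, then stay quiet once the new model matches the new data. The real tools in later chapters replace each of these toy functions with something durable.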

Two words you'll see everywhere

MLOps — "DevOps for machine learning": the practices and tools that take a model from notebook to reliable, monitored production and keep it healthy.
The model artifact — the saved, trained model (a file). Everything downstream — registry, serving, Docker — moves this artifact around. In our project it's a model.json; in yours it might be a .pkl, .pt, or .onnx.
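As an illustration of "the artifact is just a file", here is a small sketch that saves a linear model's parameters to model.json and loads them back for prediction. The JSON layout (a "weights" list and a "bias" number) is an assumption made for this example; the project's actual model.json may be organized differently.

```python
import json
import tempfile
from pathlib import Path


def save_model(weights, bias, path):
    """Write the trained parameters to disk as a JSON artifact."""
    artifact = {"weights": list(weights), "bias": bias}  # hypothetical layout
    Path(path).write_text(json.dumps(artifact))


def load_model(path):
    """Read the artifact back; downstream code needs only this file."""
    return json.loads(Path(path).read_text())


def predict(model, features):
    """Linear prediction from the loaded parameters."""
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]


with tempfile.TemporaryDirectory() as tmp:
    artifact_path = Path(tmp) / "model.json"
    save_model([0.5, -2.0], 1.0, artifact_path)   # "training" side
    restored = load_model(artifact_path)          # "serving" side
    prediction = predict(restored, [4.0, 1.0])

print(prediction)
```

Notice that the serving side never sees the training code. That separation is what lets a registry, an API, and a container each handle the same file.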

Why ML needs more than DevOps

Regular software has one moving part: code. ML has three: code, data, and model, and each can change independently of the others. That's why ML gets its own tools: you must version the data (DVC), track which data made which model (MLflow), and watch the model's quality in production, not just its uptime (monitoring). Keep this "three moving parts" idea in mind; it explains why each tool in this book exists.
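One way to picture the three moving parts is to fingerprint each one and store the fingerprints together. The sketch below does that with SHA-256 hashes from the standard library. It is a stand-in for the idea only: DVC and MLflow do this bookkeeping far more thoroughly, and the function names and byte strings here are hypothetical.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """Short, stable identifier for a blob of bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


def lineage_record(code: bytes, data: bytes, model: bytes) -> dict:
    """Record which code and which data produced which model."""
    return {
        "code": fingerprint(code),
        "data": fingerprint(data),
        "model": fingerprint(model),
    }


def changed_parts(old: dict, new: dict) -> list:
    """List the moving parts whose fingerprints differ between two records."""
    return [part for part in ("code", "data", "model") if old[part] != new[part]]


# Same training script, new data, therefore a new model.
v1 = lineage_record(b"train.py v1", b"rows: 1,2,3", b"weights: 0.50")
v2 = lineage_record(b"train.py v1", b"rows: 1,2,3,4", b"weights: 0.47")
print(changed_parts(v1, v2))
```

Here the code fingerprint is unchanged while the data and model fingerprints differ, which is exactly the case plain code versioning cannot see.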

The toolbox, mapped to the loop

Lifecycle stage	Tool in this book	The category (alternatives)
Orchestrate the loop	Prefect	workflow orchestration (Airflow, Dagster)
Version the data	DVC	data versioning (lakeFS, Delta Lake)
Track experiments	MLflow	experiment tracking (Weights & Biases)
Register & version models	MLflow Registry	model registry (SageMaker Model Registry, Vertex AI Model Registry)
Serve predictions	FastAPI	model serving (BentoML, TorchServe)
Package to run anywhere	Docker	containerization (Podman)
Background / async work	Celery + Redis	task queues (RQ, Dramatiq, Arq)
Cache & fast lookups	Redis	in-memory store (Memcached)
Fast portable inference	ONNX	inference runtime (TensorRT, OpenVINO)
Demo UI	Streamlit	quick UIs (Gradio, Dash)
Monitor & detect drift	PSI / Evidently	observability (WhyLabs, Arize)
Store & search embeddings	vector store	vector DBs (Qdrant, Chroma, pgvector, Pinecone)
Ground an LLM in your docs	RAG	RAG frameworks (LangChain, LlamaIndex)
Serve / run an LLM	Claude API / vLLM / Ollama	LLM APIs and inference servers (OpenAI, TGI, Triton)
Watch LLM cost & quality	token/cost tracking + LLM-judge	LLM observability and evaluation (Langfuse, LangSmith, Ragas)
Test & ship safely	pytest + GitHub Actions	CI/CD (GitLab CI, Jenkins)
Config & secrets	Pydantic Settings + secrets mgr	secrets management (Vault, AWS/GCP Secrets Manager)
Run it all together	docker-compose	local orchestration (Kubernetes at scale)

You're learning one representative from each category. Swap in the alternative at your job and the concepts transfer directly — a task queue is a task queue.
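The "Monitor & detect drift" row names PSI, the Population Stability Index. As a preview, here is a minimal pure-Python sketch of the usual formula: bin both samples on the reference sample's range, then sum (actual - expected) * ln(actual / expected) over the bins. The bin count, the small floor `eps` for empty bins, and the sample data are assumptions for illustration; the book's own drift check may differ in its details.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor each share at eps so the logarithm is defined for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]      # spread evenly over 0.00 to 0.99
shifted = [0.5 + i / 200 for i in range(100)]  # squeezed into the upper half

print(f"same data:    {psi(reference, reference):.3f}")
print(f"shifted data: {psi(reference, shifted):.3f}")
```

Identical samples score zero, and the score grows as the live distribution moves away from the reference. A common rule of thumb treats values above roughly 0.25 as significant drift, but the right threshold depends on your data.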

How the chapters run

Live (real output): MLflow, FastAPI, ONNX, the drift check, the vector store, RAG retrieval, LLM cost math, the test suite, config loading, and API-key auth are executed in this book — you'll see genuine output, and your runs will match.
Follow-along: Docker, Celery, Redis, DVC, Prefect, Streamlit, and the LLM generation calls (which need a daemon, a background service, or an API key) give you the exact install command, the code, the commands to run, and the output to expect. Provide the service/key and they run identically.

Either way, all the code is in code/ and every file runs on its own.

A note on setup

You don't need to install everything up front. Each chapter names the packages it needs (pip install mlflow, pip install fastapi uvicorn, …), so install as you go. The only thing the core model needs is NumPy. The full list lives in code/requirements.txt, and code/Makefile has a shortcut for every step.
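If you want to see what you already have before starting a chapter, a quick check along these lines can help. It is a convenience sketch, not part of the book's code/ folder, and it assumes each package's import name matches its pip name, which holds for the four listed here.

```python
import importlib.util


def installed(modules):
    """Map each module name to True if Python can find it, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = installed(["numpy", "mlflow", "fastapi", "uvicorn"])
for name, ok in status.items():
    hint = "installed" if ok else f"missing: pip install {name}"
    print(f"{name:<10} {hint}")
```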

With the map in hand, let's build the one model we'll spend the rest of the book productionizing. 👉

Production ML & AI Tools: A Hands-On Field Guide