The production ML lifecycle & the toolbox map

Before any tool, you need the map — the lifecycle every production model lives in, and which tool owns which stage. Once you see the whole loop, each chapter clicks into place as "the tool for this box."

The lifecycle loop

A production model isn't a one-shot script; it's a loop that runs forever:

   ┌─────────────────────────────────────────────────────────────┐
   │                                                              ▼
  DATA ──► TRAIN ──► TRACK ──► REGISTER ──► SERVE ──► MONITOR ──► (drift?)
   ▲         │         │          │           │          │
   │      (Prefect) (MLflow)  (MLflow Reg) (FastAPI)  (drift check)
   │         │                              (Docker)
   └──── retrain when it drifts ◄───────────────────────────────────┘

Read it as a sentence: pull data, train a model, track the experiment, register the winner, serve it behind an API, monitor it in production, and when it drifts, retrain, indefinitely. Conventional software keeps behaving the same way until someone changes its code; a model degrades with no code change at all, because the world it learned from keeps changing. The loop is the whole point, and MLOps is the discipline of automating it.
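To make the loop concrete, here is a minimal sketch in plain Python. Everything in it is a stand-in invented for illustration: pull_data, train, has_drifted, and run_loop are hypothetical names, the "model" is just a stored mean, and the drift check is a crude mean comparison rather than the drift check used later in the book.

```python
import random
import statistics

def pull_data(shift, n=1000, seed=0):
    """Hypothetical DATA stage: n samples from a normal distribution centered on `shift`."""
    rng = random.Random(seed)
    return [rng.gauss(shift, 1.0) for _ in range(n)]

def train(data):
    """Toy TRAIN stage: the "model" is only the training mean."""
    return {"mean": statistics.fmean(data)}

def has_drifted(model, live_data, threshold=0.5):
    """Crude MONITOR stage: has the live mean moved away from the training mean?"""
    return abs(statistics.fmean(live_data) - model["mean"]) > threshold

def run_loop(shifts):
    """Walk through batches of live data and retrain whenever drift is detected."""
    model = train(pull_data(shift=0.0))
    retrain_count = 0
    for step, shift in enumerate(shifts, start=1):
        live = pull_data(shift, seed=step)  # production data for this step
        if has_drifted(model, live):        # the (drift?) box in the diagram
            model = train(live)             # close the loop: retrain on fresh data
            retrain_count += 1
    return model, retrain_count

# The world is stable for two steps, then the data jumps to a new center.
model, retrains = run_loop([0.0, 0.1, 2.0, 2.1])
print(retrains)
```

With these toy numbers, only the jump to 2.0 should trigger a retrain; the small wobble from 0.0 to 0.1 stays under the threshold. The real stages replace each function with a tool, but the control flow keeps this shape.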

Two words you'll see everywhere

  • MLOps — "DevOps for machine learning": the practices and tools that take a model from notebook to reliable, monitored production and keep it healthy.
  • The model artifact — the saved, trained model (a file). Everything downstream — registry, serving, Docker — moves this artifact around. In our project it's a model.json; in yours it might be a .pkl, .pt, or .onnx.
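As a rough illustration of what "the artifact is a file" means, the sketch below saves a tiny model to model.json and loads it back. The layout of the file (a "weights" list and a "bias") is an assumption made up for this example; the project's actual model.json may be structured differently.

```python
import json
import tempfile
from pathlib import Path

def save_model(params, path):
    """Write the trained parameters to disk as a JSON artifact."""
    Path(path).write_text(json.dumps(params, indent=2))

def load_model(path):
    """Read the artifact back into a plain dict."""
    return json.loads(Path(path).read_text())

def predict(params, features):
    """Linear score: bias + dot(weights, features)."""
    return params["bias"] + sum(w * x for w, x in zip(params["weights"], features))

# Hypothetical trained parameters, standing in for a real training run.
params = {"weights": [0.5, -1.0], "bias": 2.0}

with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.json"
    save_model(params, artifact)    # training side: produce the artifact
    restored = load_model(artifact) # serving side: consume the artifact

print(predict(restored, [2.0, 1.0]))  # 2.0 + 0.5*2.0 - 1.0*1.0 = 2.0
```

The training code and the serving code share nothing except the file, which is why the registry, the API, and the container can each be built around moving that one artifact.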

Why ML needs more than DevOps

Regular software has one moving part: code. ML has three: code + data + model, and all three drift independently. That's why ML gets its own tools — you must version the data (DVC), track which data made which model (MLflow), and watch the model's quality in production, not just its uptime (monitoring). Keep this "three moving parts" idea in mind; it explains why each tool in this book exists.
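One way to picture the three moving parts is to fingerprint each one separately, so a change in any of them is visible on its own. The sketch below uses only the standard library; fingerprint and lineage_record are hypothetical helpers for illustration, not the API of DVC or MLflow, which handle this far more thoroughly.

```python
import hashlib
import json

def fingerprint(payload: bytes) -> str:
    """Short, stable identifier for a blob of bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]

def lineage_record(code: str, data_rows: list, model_params: dict) -> dict:
    """Fingerprint code, data, and model independently."""
    return {
        "code": fingerprint(code.encode("utf-8")),
        "data": fingerprint(json.dumps(data_rows, sort_keys=True).encode("utf-8")),
        "model": fingerprint(json.dumps(model_params, sort_keys=True).encode("utf-8")),
    }

code = "def predict(x): return 2 * x"
rows_v1 = [[1, 2], [2, 4]]
rows_v2 = [[1, 2], [2, 4], [3, 7]]  # one new row arrives; the code is untouched
params = {"slope": 2.0}

before = lineage_record(code, rows_v1, params)
after = lineage_record(code, rows_v2, params)

# Which of the three parts moved between the two snapshots?
changed = [part for part in before if before[part] != after[part]]
print(changed)  # ['data']
```

A version-control system that only watches code would report "nothing changed" here, even though the model trained on rows_v2 could behave differently. That gap is what data versioning and experiment tracking fill.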

The toolbox, mapped to the loop

Lifecycle stage             Tool in this book                The category (alternatives)
--------------------------  -------------------------------  ------------------------------------------------
Orchestrate the loop        Prefect                          workflow orchestration (Airflow, Dagster)
Version the data            DVC                              data versioning (lakeFS, Delta Lake)
Track experiments           MLflow                           experiment tracking (Weights & Biases)
Register & version models   MLflow Registry                  model registry (SageMaker, Vertex)
Serve predictions           FastAPI                          model serving (BentoML, TorchServe)
Package to run anywhere     Docker                           containerization (Podman)
Background / async work     Celery + Redis                   task queues (RQ, Dramatiq, Arq)
Cache & fast lookups        Redis                            in-memory store (Memcached)
Fast portable inference     ONNX                             inference runtime (TensorRT, OpenVINO)
Demo UI                     Streamlit                        quick UIs (Gradio, Dash)
Monitor & detect drift      PSI / Evidently                  observability (WhyLabs, Arize)
Store & search embeddings   vector store                     vector DBs (Qdrant, Chroma, pgvector, Pinecone)
Ground an LLM in your docs  RAG                              LangChain, LlamaIndex
Serve / run an LLM          Claude API / vLLM / Ollama       OpenAI, TGI, Triton
Watch LLM cost & quality    token/cost tracking + LLM-judge  Langfuse, LangSmith, Ragas
Test & ship safely          pytest + GitHub Actions          CI/CD (GitLab CI, Jenkins)
Config & secrets            Pydantic Settings + secrets mgr  Vault, AWS/GCP Secrets Manager
Run it all together         docker-compose                   local orchestration (Kubernetes at scale)

You're learning one representative from each category. Swap in the alternative at your job and the concepts transfer directly — a task queue is a task queue.
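The "a task queue is a task queue" idea can be sketched as code written against a small interface rather than a specific product. Everything below is hypothetical: TaskQueue, InMemoryQueue, score, and submit_batch are invented for illustration and are not the API of Celery, RQ, or any other library. Real queues also run tasks in separate worker processes; here run_pending stands in for the worker.

```python
from collections import deque
from typing import Callable, Protocol

class TaskQueue(Protocol):
    """The concept: accept work now, run it later."""
    def enqueue(self, func: Callable[..., object], *args: object) -> None: ...
    def run_pending(self) -> list: ...

class InMemoryQueue:
    """Toy stand-in for a real task queue backend."""
    def __init__(self) -> None:
        self._tasks: deque = deque()

    def enqueue(self, func, *args):
        self._tasks.append((func, args))

    def run_pending(self):
        results = []
        while self._tasks:
            func, args = self._tasks.popleft()  # first in, first out
            results.append(func(*args))
        return results

def score(x: float) -> float:
    """Hypothetical model call to run in the background."""
    return 2.0 * x + 1.0

def submit_batch(queue: TaskQueue, inputs: list) -> list:
    """Application code depends only on the TaskQueue shape, not on a product."""
    for x in inputs:
        queue.enqueue(score, x)
    return queue.run_pending()

print(submit_batch(InMemoryQueue(), [1.0, 2.0]))  # [3.0, 5.0]
```

Swapping tools then means writing another class with the same two methods; submit_batch does not change. The same reasoning applies to the other rows of the table.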

How the chapters run

  • Live (real output): MLflow, FastAPI, ONNX, the drift check, the vector store, RAG retrieval, LLM cost math, the test suite, config loading, and API-key auth are executed in this book — you'll see genuine output, and your runs will match.
  • Follow-along: Docker, Celery, Redis, DVC, Prefect, Streamlit, and the LLM generation calls (which need a daemon, a background service, or an API key) give you the exact install command, the code, the commands to run, and the output to expect. Provide the service/key and they run identically.

Either way, all the code is in code/ and every file runs on its own.

A note on setup

You don't need to install everything up front. Each chapter names its own dependencies (pip install mlflow, pip install fastapi uvicorn, …), so install as you go. The only thing the core model needs is NumPy. The full list lives in code/requirements.txt, and code/Makefile has a shortcut for every step.

With the map in hand, let's build the one model we'll spend the rest of the book productionizing. 👉