Introduction
No prior tooling experience required. If you can train a model in a Python script, you're ready. This book teaches the production tools — the ones every job description lists and every ML team uses — by adding them, one at a time, around a single tiny model. Every chapter shows the exact commands to run and the output you'll see.
The gap this book fills
Most ML learning stops at "the model works in my notebook." But a model in a notebook helps no one. The leap from "it works on my machine" to "it serves a million requests, retrains nightly, and pages me when it drifts" is a different skill set — MLOps tooling — and it's rarely taught coherently. You end up googling "how do I deploy a model" and drowning in fifty disconnected tutorials.
This book is the coherent path. It picks the production-ready, industry-standard tools and teaches them in the order you'd actually adopt them, each solving a real problem the previous step created.
The one idea: one model, many wrappers
We train one tiny sentiment classifier in Chapter 1 — bag-of-words + logistic regression, pure NumPy, no downloads — and then never change it. Instead, every chapter wraps that same model in another production tool:
track it (MLflow) ─► serve it (FastAPI) ─► package it (Docker)
─► scale it (Celery+Redis) ─► version its data (DVC)
─► orchestrate retraining (Prefect) ─► optimize it (ONNX)
─► demo it (Streamlit) ─► watch it (monitoring)
─► run the whole stack (docker-compose)
Keeping the model trivial means all your attention goes to the tool — which is the thing you're actually here to learn.
The tools, and why each was chosen
| Tool | The problem it solves | Chapter |
|---|---|---|
| MLflow | "which experiment was best, and where's that model?" | 2–3 |
| FastAPI | "how do other systems call my model?" | 4 |
| Docker | "it works on my machine but nowhere else" | 5 |
| Celery + Redis | "this job is too slow to run in a request" | 6–7 |
| DVC | "which data trained this model?" | 8 |
| Prefect | "retrain every night, automatically" | 9 |
| ONNX | "make inference fast and framework-free" | 10 |
| Streamlit | "let a non-engineer try the model" | 11 |
| Monitoring | "is the model still working in production?" | 12 |
| Vector DBs + RAG | "answer from my own documents with an LLM" | 14–15 |
| LLM serving + observability | "call/run an LLM, and watch its cost & quality" | 16–17 |
| Testing + CI/CD | "prove it works on every change, automatically" | 18 |
| Config & security | "no leaked keys; lock the API" | 19 |
| docker-compose | "run the whole system with one command" | 20 |
These aren't arbitrary — they're the default, battle-tested choices at real companies. Learn these and you can read almost any ML team's stack.
How to read it
Chapters build on each other, so read in order the first time. Some tools (MLflow,
FastAPI, ONNX, monitoring) are installed and run live in this book — you'll see real
output. Others (Docker, Celery, Redis, DVC, Prefect, Streamlit) need background
services or a daemon, so those chapters are precise follow-along tutorials: the
exact install commands, the code, the commands to run, and the output to expect. All
the code lives in code/ and runs standalone.
What you'll be able to do by the end
Take any model and: track its experiments, register and version it, wrap it in a production API, containerize it, offload heavy work to a queue, cache results, version its data, schedule its retraining, export it for fast portable inference, build a demo UI, and monitor it for drift — then run all of it together with one command. That's the full production lifecycle, and it's exactly what "MLOps" means on a job description.
Ready? Let's meet the lifecycle and the toolbox map. 👉