The project: one model we'll productionize
Here's the model the whole book wraps in tools. It's deliberately tiny — a sentiment classifier in pure NumPy, no downloads, no GPU — because the star of this book is the tooling, not the model. Whatever you learn here applies unchanged to a giant transformer; the model is just the thing inside the box.
All code is in code/sentiment/ and runs with only NumPy.
What it does
Given a sentence, predict whether it's positive or negative. Under the hood
it's exactly the bag-of-words + logistic regression from the AI Foundations book:
count the words, take a weighted sum, squash with a sigmoid to get P(positive).
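The three steps above (count, weighted sum, sigmoid) can be sketched in a few lines of NumPy. The vocabulary, weights, and helper names below are hand-picked assumptions for illustration, not values or functions taken from the book's code:

```python
import numpy as np

# Hypothetical toy vocabulary and weights, chosen by hand for illustration.
# A trained model would learn these from data.
vocab = {"love": 0, "great": 1, "hate": 2, "terrible": 3}
weights = np.array([2.0, 1.5, -2.0, -1.5])
bias = 0.0


def featurize(text):
    """Step 1: count how often each vocabulary word appears in the text."""
    counts = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            counts[vocab[word]] += 1
    return counts


def p_positive(text):
    """Steps 2 and 3: weighted sum of the counts, squashed by a sigmoid."""
    z = featurize(text) @ weights + bias
    return float(1.0 / (1.0 + np.exp(-z)))


print(p_positive("i love this it is great"))     # well above 0.5
print(p_positive("i hate this it is terrible"))  # well below 0.5
```

A sentence with no known words produces an all-zero count vector, so with a zero bias it lands exactly on 0.5: the model has no evidence either way.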
The data
We keep a small labeled dataset inline (data.py) so the
project runs anywhere with zero setup — 25 positive and 25 negative sentences that
reuse strong polarity words so a simple model can generalize:
POSITIVE = ["i love this product it is great",
            "absolutely love it fantastic and great", ...]
NEGATIVE = ["i hate this product it is terrible",
            "absolutely hate it awful and terrible", ...]
In a real project this is the part that comes from a database, a warehouse, or a feature store — and the part you'll version with DVC in Chapter 8.
The model
model.py is a SentimentModel class with the four methods
every production tool in this book needs: fit, predict_proba,
save, load. That save/load pair produces the model artifact — the
file that the registry, the API, and Docker all pass around.
class SentimentModel:
    def fit(self, texts, labels): ...     # train by gradient descent
    def predict_proba(self, texts): ...   # -> P(positive) per text
    def save(self, path): ...             # write the artifact (model.json)

    @classmethod
    def load(cls, path): ...              # read it back
Run it directly to see it learn and predict:
$ python model.py
Output:
train accuracy: 1.0
P(positive)=0.595 <- 'this is wonderful'
P(positive)=0.024 <- 'this is terrible'
P(positive)=0.698 <- 'fast and reliable support'
It scores both positive sentences above 0.5 and the negative one near zero. (Modest confidence on novel phrasings is expected from a model trained on 50 examples. That's fine; it's our stand-in for a real model.)
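To make the four-method interface concrete, here is one way such a class could be written. Treat it as a sketch under assumptions, not the contents of model.py: the tokenizer (lowercase, split on whitespace), the learning rate, the epoch count, the `_featurize` helper, and the `bias` field in the saved JSON are all our own choices for illustration.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


class SentimentModel:
    """Bag-of-words logistic regression (illustrative sketch, not model.py)."""

    def __init__(self, lr=0.5, epochs=200):  # assumed hyperparameters
        self.lr = lr
        self.epochs = epochs
        self.vocab = {}
        self.weights = np.zeros(0)
        self.bias = 0.0

    def _featurize(self, texts):
        # One row per text, one column per vocabulary word, filled with counts.
        X = np.zeros((len(texts), len(self.vocab)))
        for row, text in enumerate(texts):
            for word in text.lower().split():
                col = self.vocab.get(word)
                if col is not None:  # words never seen in training are ignored
                    X[row, col] += 1.0
        return X

    def fit(self, texts, labels):
        # Give each new word the next free column index.
        self.vocab = {}
        for text in texts:
            for word in text.lower().split():
                self.vocab.setdefault(word, len(self.vocab))
        X = self._featurize(texts)
        y = np.asarray(labels, dtype=float)
        self.weights = np.zeros(X.shape[1])
        self.bias = 0.0
        for _ in range(self.epochs):
            p = 1.0 / (1.0 + np.exp(-(X @ self.weights + self.bias)))
            error = p - y  # gradient of the log loss with respect to the logit
            self.weights -= self.lr * (X.T @ error) / len(y)
            self.bias -= self.lr * float(error.mean())
        return self

    def predict_proba(self, texts):
        X = self._featurize(texts)
        return 1.0 / (1.0 + np.exp(-(X @ self.weights + self.bias)))

    def save(self, path):
        # "bias" is an assumed field; the real artifact's extra keys are not shown.
        payload = {"vocab": self.vocab,
                   "weights": self.weights.tolist(),
                   "bias": float(self.bias)}
        Path(path).write_text(json.dumps(payload), encoding="utf-8")

    @classmethod
    def load(cls, path):
        payload = json.loads(Path(path).read_text(encoding="utf-8"))
        model = cls()
        model.vocab = payload["vocab"]
        model.weights = np.array(payload["weights"])
        model.bias = payload["bias"]
        return model


# Round trip on the four sentences quoted in the data excerpt.
texts = ["i love this product it is great",
         "absolutely love it fantastic and great",
         "i hate this product it is terrible",
         "absolutely hate it awful and terrible"]
labels = [1, 1, 0, 0]

model = SentimentModel().fit(texts, labels)
probs = model.predict_proba(texts)

with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.json"
    model.save(artifact)
    restored = SentimentModel.load(artifact)

restored_probs = restored.predict_proba(texts)
print(np.allclose(probs, restored_probs))
```

The round trip at the end is the property the later chapters lean on: a model loaded from the artifact should predict the same probabilities as the one that wrote it.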
Training & evaluating
train.py does the production-shaped thing: split the data,
train, evaluate on a held-out test set (honest evaluation matters),
and save the artifact. It also has a --mlflow flag we'll use in the next chapter.
$ python train.py
Output:
examples: 50 train=38 test=12
vocab size: 68
train_accuracy: 1.000
test_accuracy: 1.000
saved model -> model.json
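The model scores 100% on the held-out set, which is expected because the data is
clean and separable by design. With only 12 test examples, treat that number as a
smoke test rather than a benchmark. The run also wrote model.json, the artifact
everything downstream consumes.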
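The split-then-evaluate step can be sketched as follows. The internals of train.py are not shown in this chapter, so the function names, the fixed seed, and the 0.24 test fraction (picked only because it reproduces the 38/12 counts for 50 examples) are assumptions:

```python
import numpy as np


def train_test_split(texts, labels, test_fraction=0.24, seed=0):
    """Shuffle with a fixed seed, then hold out the first slice as the test set."""
    rng = np.random.default_rng(seed)  # fixed seed keeps the split reproducible
    order = rng.permutation(len(texts))
    n_test = int(round(len(texts) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]

    def pick(items, idx):
        return [items[i] for i in idx]

    return (pick(texts, train_idx), pick(labels, train_idx),
            pick(texts, test_idx), pick(labels, test_idx))


def accuracy(probs, labels, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    preds = (np.asarray(probs) >= threshold).astype(int)
    return float((preds == np.asarray(labels)).mean())


# Placeholder data standing in for the 50 labeled sentences.
texts = [f"example sentence {i}" for i in range(50)]
labels = [i % 2 for i in range(50)]

train_x, train_y, test_x, test_y = train_test_split(texts, labels)
print(f"examples: {len(texts)} train={len(train_x)} test={len(test_x)}")
```

Shuffling before cutting matters here: if the data is stored as all positives followed by all negatives, an unshuffled split would put only one class in the test set.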
The artifact: the thing tools move around
That model.json is the heart of every chapter to come. Internally it's just the
learned vocabulary and weights as JSON:
{ "vocab": {"love": 0, "great": 1, ...}, "weights": [1.83, 2.41, ..., -0.12], ... }
Everything from here on is about that file's journey: tracked (MLflow), versioned (registry), wrapped in an API (FastAPI), baked into an image (Docker), produced by a scheduled pipeline (Prefect), converted for speed (ONNX), and watched for rot (monitoring). The model never changes — its operational maturity does.
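Because the artifact is plain JSON, any tool along that journey can sanity-check it without importing the model code. The sketch below assumes only the two keys shown above (`vocab` and `weights`); the `inspect_artifact` helper and the toy values are hypothetical:

```python
import json


def inspect_artifact(raw_json, top_k=2):
    """Check that vocab and weights line up, then list the strongest words."""
    artifact = json.loads(raw_json)
    vocab, weights = artifact["vocab"], artifact["weights"]

    # Every word must point at exactly one weight, with no gaps or duplicates.
    if sorted(vocab.values()) != list(range(len(weights))):
        raise ValueError("vocab indices do not line up with weights")

    ranked = sorted(vocab, key=lambda word: weights[vocab[word]])
    return {
        "n_words": len(vocab),
        "most_negative": ranked[:top_k],
        "most_positive": ranked[::-1][:top_k],
    }


# Toy artifact with made-up values, standing in for the real model.json.
toy = json.dumps({
    "vocab": {"love": 0, "great": 1, "hate": 2, "the": 3},
    "weights": [1.8, 2.4, -2.1, -0.1],
})

summary = inspect_artifact(toy)
print(summary)
```

A check like this is cheap enough to run wherever the file changes hands, for example before registering a model or at API startup.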
Why "keep the model trivial" is the right call
When a serving tutorial uses a 2 GB model, half of the tutorial is fighting downloads,
CUDA, and memory, and the actual tool gets two paragraphs. Because our model is a 5 KB
JSON file, every chapter can spend its energy on the tool you came to learn. And
because the interface (fit/predict_proba/save/load) has the same shape as a real
model's, nothing you learn is toy-specific.
The takeaway
We have a small, honest, fully-working model and its artifact (model.json),
exposing the universal fit/predict_proba/save/load interface. That's
everything the rest of the book builds on. First stop on the artifact's journey:
making experiments reproducible and comparable with MLflow. 👉