The project: one model we'll productionize

Here's the model the whole book wraps in tools. It's deliberately tiny (a sentiment classifier in pure NumPy, no downloads, no GPU) because the star of this book is the tooling, not the model. The workflow you learn here carries over to a giant transformer; the model is just the thing inside the box.

All code is in code/sentiment/ and runs with only NumPy.

What it does

Given a sentence, predict whether it's positive or negative. Under the hood it's exactly the bag-of-words + logistic regression from the AI Foundations book: count the words, take a weighted sum, squash with a sigmoid to get P(positive).
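To make "count the words, take a weighted sum, squash with a sigmoid" concrete, here is a minimal sketch in plain Python. The vocabulary, weights, and bias below are made up for illustration; they are not the values the book's model learns, and the helper names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical vocabulary and weights, invented for this illustration only.
VOCAB = {"love": 0, "great": 1, "hate": 2, "terrible": 3}
WEIGHTS = [1.5, 1.0, -1.5, -1.0]
BIAS = 0.0


def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs in the text."""
    counts = [0.0] * len(vocab)
    for word, n in Counter(text.lower().split()).items():
        if word in vocab:
            counts[vocab[word]] = float(n)
    return counts


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def p_positive(text):
    """Weighted sum of word counts, passed through the sigmoid."""
    x = bag_of_words(text, VOCAB)
    z = sum(w * c for w, c in zip(WEIGHTS, x)) + BIAS
    return sigmoid(z)


print(p_positive("i love this it is great"))     # above 0.5
print(p_positive("i hate this it is terrible"))  # below 0.5
```

Words outside the vocabulary contribute nothing, so a sentence made entirely of unknown words lands at exactly 0.5 when the bias is zero.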

The data

We keep a small labeled dataset inline (data.py) so the project runs anywhere with zero setup — 25 positive and 25 negative sentences that reuse strong polarity words so a simple model can generalize:

POSITIVE = ["i love this product it is great",
            "absolutely love it fantastic and great", ...]
NEGATIVE = ["i hate this product it is terrible",
            "absolutely hate it awful and terrible", ...]

In a real project this is the part that comes from a database, a warehouse, or a feature store — and the part you'll version with DVC in Chapter 8.
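One plausible way to turn the two lists into training inputs is sketched below. The `load_dataset` helper is a hypothetical name, not something confirmed to exist in data.py, and only the sentences shown above are included; the rest of the dataset is left out rather than guessed.

```python
# Only the sentences quoted in the text; the full lists are longer.
POSITIVE = ["i love this product it is great",
            "absolutely love it fantastic and great"]
NEGATIVE = ["i hate this product it is terrible",
            "absolutely hate it awful and terrible"]


def load_dataset():
    """Hypothetical helper: return parallel lists of texts and 0/1 labels."""
    texts = POSITIVE + NEGATIVE
    labels = [1] * len(POSITIVE) + [0] * len(NEGATIVE)  # 1 = positive
    return texts, labels


texts, labels = load_dataset()
for text, label in zip(texts, labels):
    print(label, text)
```

Keeping texts and labels as parallel lists matches the `fit(texts, labels)` signature used by the model.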

The model

model.py is a SentimentModel class with the four methods every production tool in this book needs: fit, predict_proba, save, load. That save/load pair produces the model artifact — the file that the registry, the API, and Docker all pass around.

class SentimentModel:
    def fit(self, texts, labels): ...          # train by gradient descent
    def predict_proba(self, texts): ...        # -> P(positive) per text
    def save(self, path): ...                  # write the artifact (model.json)
    @classmethod
    def load(cls, path): ...                   # read it back
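For readers who want to see what could sit behind those four signatures, here is a minimal sketch. It is an assumption-laden stand-in, not the book's model.py: it uses plain Python instead of NumPy, the learning rate and epoch count are arbitrary, and storing a separate "bias" field in the JSON is a guess about the artifact layout.

```python
import json
import math
from collections import Counter


class SentimentModel:
    """Bag-of-words logistic regression. A sketch, not the book's model.py."""

    def __init__(self, lr=0.5, epochs=200):  # arbitrary illustrative defaults
        self.lr, self.epochs = lr, epochs
        self.vocab, self.weights, self.bias = {}, [], 0.0

    def _featurize(self, text):
        """Turn a sentence into a vector of word counts."""
        x = [0.0] * len(self.vocab)
        for word, n in Counter(text.lower().split()).items():
            if word in self.vocab:
                x[self.vocab[word]] = float(n)
        return x

    def _score(self, x):
        """Weighted sum plus bias, squashed by a sigmoid."""
        z = sum(w * v for w, v in zip(self.weights, x)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, texts, labels):
        """Train by batch gradient descent on the log loss."""
        words = sorted({w for t in texts for w in t.lower().split()})
        self.vocab = {w: i for i, w in enumerate(words)}
        self.weights, self.bias = [0.0] * len(words), 0.0
        X = [self._featurize(t) for t in texts]
        n = len(X)
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * len(words), 0.0
            for x, y in zip(X, labels):
                err = self._score(x) - y  # gradient of log loss w.r.t. z
                grad_b += err
                for i, v in enumerate(x):
                    grad_w[i] += err * v
            self.weights = [w - self.lr * g / n
                            for w, g in zip(self.weights, grad_w)]
            self.bias -= self.lr * grad_b / n
        return self

    def predict_proba(self, texts):
        """Return P(positive) for each text."""
        return [self._score(self._featurize(t)) for t in texts]

    def save(self, path):
        """Write the artifact as JSON (the 'bias' field is an assumption)."""
        with open(path, "w") as f:
            json.dump({"vocab": self.vocab, "weights": self.weights,
                       "bias": self.bias}, f)

    @classmethod
    def load(cls, path):
        """Rebuild a model from a saved artifact."""
        with open(path) as f:
            blob = json.load(f)
        model = cls()
        model.vocab = blob["vocab"]
        model.weights = blob["weights"]
        model.bias = blob["bias"]
        return model


# Usage: train on the four sentences quoted earlier and score two new ones.
demo = SentimentModel().fit(
    ["i love this product it is great",
     "absolutely love it fantastic and great",
     "i hate this product it is terrible",
     "absolutely hate it awful and terrible"],
    [1, 1, 0, 0],
)
print(demo.predict_proba(["love it", "terrible service"]))
```

Because `save` writes everything `predict_proba` needs, a loaded model gives the same predictions as the one that was saved, which is the property every downstream tool relies on.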

Run it directly to see it learn and predict:

$ python model.py

Output:

train accuracy: 1.0
  P(positive)=0.595  <- 'this is wonderful'
  P(positive)=0.024  <- 'this is terrible'
  P(positive)=0.698  <- 'fast and reliable support'

It scores both positive sentences above 0.5 and the negative one near zero. (The modest confidence on the positive phrasings, 0.595 and 0.698, is expected from a 50-example model. That's fine; it's our stand-in for a real model.)

Training & evaluating

train.py does the production-shaped thing: split the data, train, evaluate on a held-out test set (honest evaluation matters), and save the artifact. It also has a --mlflow flag we'll use in the next chapter.
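The split-and-evaluate step might look roughly like the sketch below. The function names, the 25% test fraction, and the fixed seed are assumptions for illustration; the text does not show how train.py actually splits the data.

```python
import random


def train_test_split(texts, labels, test_fraction=0.25, seed=0):
    """Shuffle indices with a fixed seed, then hold out the first slice."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)  # seeded, so the split is repeatable
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([texts[i] for i in train_idx], [labels[i] for i in train_idx],
            [texts[i] for i in test_idx], [labels[i] for i in test_idx])


def accuracy(probs, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Usage with placeholder data: 50 examples split into train and test.
texts = [f"sentence {i}" for i in range(50)]
labels = [i % 2 for i in range(50)]
tr_x, tr_y, te_x, te_y = train_test_split(texts, labels)
print(f"examples: {len(texts)}  train={len(tr_x)}  test={len(te_x)}")
```

The point of the fixed seed is that the held-out set stays the same across runs, so accuracy numbers from different experiments remain comparable.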

$ python train.py

Output:

examples: 50  train=38  test=12
vocab size: 68
train_accuracy: 1.000
test_accuracy:  1.000
saved model -> model.json

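The model scores 100% on the held-out set, which is expected because the data is clean and separable by design (and with only 12 test examples, a perfect score is weak evidence on its own). It also wrote model.json, the artifact everything downstream consumes.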

The artifact: the thing tools move around

That model.json is the heart of every chapter to come. Internally it's just the learned vocabulary and weights as JSON:

{ "vocab": {"love": 0, "great": 1, ...}, "weights": [1.83, 2.41, ..., -0.12], ... }
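As a sketch of what "consuming the artifact" means, the snippet below parses a JSON document of that shape and scores a sentence from it, with no model class involved. The third vocabulary entry and its weight are invented, fields elided in the example above are simply ignored, and the assumption that weight i belongs to the word with index i is mine.

```python
import json
import math

# Illustrative artifact text. "terrible" and its weight are made up;
# any further fields in the real file are not reproduced here.
ARTIFACT_TEXT = """
{"vocab": {"love": 0, "great": 1, "terrible": 2},
 "weights": [1.83, 2.41, -2.0]}
"""


def load_artifact(text):
    """Parse the JSON and check every vocab index points at a weight."""
    artifact = json.loads(text)
    n = len(artifact["weights"])
    bad = [w for w, i in artifact["vocab"].items() if not 0 <= i < n]
    if bad:
        raise ValueError(f"vocab entries without a weight: {bad}")
    return artifact


def score(artifact, text):
    """P(positive): sum the weight of each known word, then apply a sigmoid."""
    vocab, weights = artifact["vocab"], artifact["weights"]
    z = sum(weights[vocab[w]] for w in text.lower().split() if w in vocab)
    return 1.0 / (1.0 + math.exp(-z))


artifact = load_artifact(ARTIFACT_TEXT)
print(score(artifact, "love it great"))
print(score(artifact, "terrible"))
```

Anything that can read JSON can serve this model, which is why the same file can travel through a registry, an API, and a container image.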

Everything from here on is about that file's journey: tracked (MLflow), versioned (registry), wrapped in an API (FastAPI), baked into an image (Docker), produced by a scheduled pipeline (Prefect), converted for speed (ONNX), and watched for rot (monitoring). The model never changes — its operational maturity does.

Why "keep the model trivial" is the right call

When a serving tutorial uses a 2 GB model, half of it goes to fighting downloads, CUDA, and memory, and the actual tool gets two paragraphs. By making the model a 5 KB JSON file, every chapter can spend its energy on the tool you came to learn. And because the interface (fit/predict_proba/save/load) has the same shape as a real model's, nothing you learn is toy-specific.
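One way to picture the "same interface" argument is with a structural type: tooling written against the interface does not care how large the model behind it is. The `Model` protocol, `count_positive`, and `ConstantModel` below are hypothetical names used only for this sketch.

```python
from typing import Protocol, Sequence


class Model(Protocol):
    """The interface the tooling depends on (hypothetical type, for illustration)."""

    def fit(self, texts: Sequence[str], labels: Sequence[int]): ...
    def predict_proba(self, texts: Sequence[str]) -> Sequence[float]: ...
    def save(self, path: str) -> None: ...


def count_positive(model: Model, texts: Sequence[str], threshold: float = 0.5) -> int:
    """Tool-side code: it only ever calls predict_proba."""
    return sum(p >= threshold for p in model.predict_proba(texts))


class ConstantModel:
    """Stand-in model that returns the same probability for every text."""

    def __init__(self, p: float):
        self.p = p

    def fit(self, texts, labels):
        return self

    def predict_proba(self, texts):
        return [self.p for _ in texts]

    def save(self, path):
        raise NotImplementedError("nothing to save in this stub")


print(count_positive(ConstantModel(0.9), ["a", "b", "c"]))
print(count_positive(ConstantModel(0.1), ["a", "b", "c"]))
```

Swapping `ConstantModel` for a 5 KB JSON model or a multi-gigabyte one leaves `count_positive` untouched, which is the sense in which the lessons are not toy-specific.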

The takeaway

We have a small, honest, fully-working model and its artifact (model.json), exposing the universal fit/predict_proba/save/load interface. That's everything the rest of the book builds on. First stop on the artifact's journey: making experiments reproducible and comparable with MLflow.