The interview playbook
You now hold the whole foundation. This final chapter turns it into interview success: the question types you'll face, a rapid-fire concept bank with crisp answers, the coding drills that recur, and a prep plan. Treat it as the checklist before you walk in.
The five rounds of an ML interview
| Round | What it tests | Where this book prepared you |
|---|---|---|
| Coding (DS&A) | general programming | (LeetCode — outside this book) |
| ML coding | implement an algorithm from scratch | Chapters 1, 8, 11, 18, 20–23 |
| ML concepts | breadth & depth of fundamentals | the whole book |
| ML system design | end-to-end system thinking | Chapter 28 |
| Behavioral | collaboration, impact, judgment | (your stories) |
Most candidates over-prepare DS&A and under-prepare ML concepts and system design — the rounds that actually differentiate. Invest where this book points.
Rapid-fire concept bank
Practice saying each answer out loud in 30–60 seconds. If any feels shaky, reread the linked chapter.
Fundamentals
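- Bias–variance tradeoff? Underfitting (high bias) vs. overfitting (high variance); for squared loss, expected error = bias² + variance + irreducible noise, and the right model complexity balances the first two. Ch 9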
- Overfitting — detect and fix? Train ≪ validation error; fix with more data, regularization, dropout, early stopping, simpler model. Ch 9
- L1 vs. L2? L1 → sparse (feature selection); L2 → smooth shrinkage (weight decay). Ch 9
- Generative vs. discriminative? Model P(x,y) vs. P(y|x). Naive Bayes vs. logistic regression.
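One way to see the L1-versus-L2 difference is the single proximal step that each penalty applies to a weight vector. This is the view used by proximal gradient methods such as ISTA; Chapter 9 may present it differently. In the sketch below, the helper names `l1_prox` and `l2_prox` are hypothetical and the strength `lam` is illustrative. It shows L1 soft-thresholding setting small weights to exactly zero, while L2 shrinks every weight by the same factor and never reaches zero.

```python
def l1_prox(weights, lam):
    """Proximal step for the L1 penalty lam * |w| (soft-thresholding)."""
    out = []
    for w in weights:
        if w > lam:
            out.append(w - lam)   # large positive weight: pulled toward 0 by lam
        elif w < -lam:
            out.append(w + lam)   # large negative weight: pulled toward 0 by lam
        else:
            out.append(0.0)       # |w| <= lam: set exactly to 0, hence sparsity
    return out


def l2_prox(weights, lam):
    """Proximal step for the L2 penalty (lam / 2) * w**2: uniform multiplicative shrinkage."""
    return [w / (1.0 + lam) for w in weights]


w = [3.0, 0.4, -0.2, -2.5]
print(l1_prox(w, 0.5))  # → [2.5, 0.0, 0.0, -2.0]
print(l2_prox(w, 0.5))  # every weight divided by 1.5; none becomes exactly 0
```

The exact zeros are why L1 doubles as feature selection. The uniform shrink factor reflects the same idea as "weight decay" for L2.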
Algorithms
- Bagging vs. boosting? Parallel independent trees averaged (↓variance) vs. sequential error-correcting trees (↓bias). Ch 20
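- Why do gradient-boosted trees often beat neural nets on tabular data? They handle mixed feature types, are insensitive to feature scaling and robust to outlying feature values, need little tuning, and capture feature interactions through successive splits. Ch 20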
- k-NN vs. k-means? Supervised classification (k voters) vs. unsupervised clustering (k groups). Ch 20, Ch 21
- How does a decision tree choose splits? Maximize impurity reduction (Gini/entropy). Ch 20
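The split rule in the last answer fits in a few lines of plain Python. Compute Gini impurity, try a threshold between each pair of adjacent sorted values of one numeric feature, and keep the threshold with the largest weighted impurity reduction. The midpoint-threshold convention and the function names `gini` and `best_split` are assumptions for illustration. A sketch like this also covers the decision-tree split drill in the prep plan.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, impurity reduction) for the best split of one numeric feature.

    Candidate thresholds are midpoints between adjacent distinct sorted values;
    returns (None, 0.0) if no split reduces impurity.
    """
    parent = gini(ys)
    n = len(ys)
    pairs = sorted(zip(xs, ys))
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # children's impurity, weighted by how many samples each receives
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # → (6.5, 0.5): a pure split removes all of the parent's 0.5 impurity
```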
Deep learning
- Why activation functions? Without non-linearity, stacked layers collapse to one linear layer. Ch 11
- Vanishing gradients — cause and fix? Sigmoid/tanh saturate; fix with ReLU, residual connections, normalization, good init. Ch 11
- What is attention? softmax(QKᵀ/√d)V, with d the key dimension: each token weights every token (itself included) by query–key similarity, then takes the weighted sum of their value vectors. Ch 13
- Adam vs. SGD? Adam adapts a per-parameter learning rate + momentum; robust default. Ch 8
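To show that the attention formula is more than notation, here is a minimal single-head sketch of softmax(QKᵀ/√d)V in plain Python over lists of row vectors. It leaves out masking, batching and the learned Q/K/V projections, which are all simplifying assumptions. The helper names `softmax` and `attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention softmax(Q K^T / sqrt(d)) V.

    Q: n_q x d, K: n_k x d, V: n_k x d_v (lists of lists).
    Returns (output as n_q x d_v, attention weights as n_q x n_k).
    """
    d = len(K[0])
    weights = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights.append(softmax(scores))
    # each output row is the weight-averaged combination of the value rows
    output = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights


# Identical keys give tied scores, so the weights are uniform and the output is the mean of V's rows.
out, w = attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(out)  # → [[2.0, 3.0]]
```

The uniform-weights case makes a quick sanity check when you whiteboard attention.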
Stats & evaluation
- Explain a p-value. P(data at least this extreme | null true); not P(null true | data), and not an effect size. Ch 22
- Precision vs. recall — which when? Costly false positives → precision (spam); costly false negatives → recall (cancer). Ch 10
- Why is accuracy bad for imbalanced data? "Always predict majority" scores high yet is useless; use F1/AUC. Ch 10
- What is ROC-AUC? Threshold-free ranking quality; P(score(pos) > score(neg)). Ch 10
- The base-rate fallacy? When the condition is rare, even a "99% accurate" test yields mostly false positives among its positive results. Ch 22
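The base-rate fallacy follows directly from Bayes' rule. The sketch below reads "99% accurate" as 99% sensitivity and 99% specificity and assumes a hypothetical prevalence of 0.1%; both are illustrative choices. `positive_predictive_value` is a hypothetical helper name.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(condition | positive test) by Bayes' rule."""
    true_pos = sensitivity * prevalence                  # P(positive and condition)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(positive and no condition)
    return true_pos / (true_pos + false_pos)


# Hypothetical numbers: 0.1% prevalence, 99% sensitivity, 99% specificity.
print(round(positive_predictive_value(0.001, 0.99, 0.99), 3))  # → 0.09
```

Only about 9% of positives are real. Roughly 91% are false alarms, because the 1% false-positive rate applies to the 99.9% of people who do not have the condition.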
LLMs & modern
- RAG vs. fine-tuning? RAG adds knowledge (retrieved at query time); fine-tuning adds behavior (baked into weights). Ch 15
- What is LoRA? Low-rank weight updates — fine-tune <1% of parameters cheaply. Ch 27
- What does temperature do? Divides the logits by T before the softmax: low T sharpens the distribution (focused, near-greedy), high T flattens it (more random, "creative"). Ch 15
- Why do LLMs hallucinate? They are trained to produce plausible next tokens, not verified truth; mitigate with RAG and verification. Ch 15
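Temperature is easiest to explain with the formula: divide each logit by T, then apply softmax. The sketch below uses hypothetical logits and a hypothetical helper `sample_probs`. Real decoders often combine this with top-k or top-p filtering, which is omitted here.

```python
import math


def sample_probs(logits, temperature):
    """Temperature-scaled softmax: divide logits by T, then normalize stably."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical next-token scores
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in sample_probs(logits, t)])
```

Lower T concentrates probability on the top token, and as T approaches 0 sampling approaches greedy decoding. Higher T spreads probability toward uniform. You could then draw a token with `random.choices(range(len(probs)), weights=probs)`.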
Coding drills (implement from scratch, no libraries)
These come up in ML-coding rounds. You've already built most of them in this book — redo them on a blank page until fluent:
- Gradient descent for linear/logistic regression. Ch 1, Ch 6
- k-means clustering. Ch 21
- k-NN classifier. Ch 20
- Backprop for a 2-layer net. Ch 11
- Softmax / sigmoid / cross-entropy (numerically stable). Ch 17, Ch 18
- Precision/recall/F1/AUC from predictions. Ch 10
- Cosine similarity / top-k retrieval. Ch 4, Ch 18
- Train/test split & a CV loop. Ch 18
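As one possible answer to the metrics drill, here is a plain-Python sketch of precision, recall and F1 computed from hard predictions. It also computes ROC-AUC directly from the pairwise definition in the concept bank, P(score(pos) > score(neg)), with ties counted as one half. The zero-division convention, the 0.5 threshold in the example and the toy labels are assumptions.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels in {0, 1}; returns 0.0 where undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as P(score(pos) > score(neg)) over all positive/negative pairs; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
print(precision_recall_f1(y_true, y_pred), roc_auc(y_true, scores))
```

This pairwise AUC costs O(n_pos · n_neg). A common follow-up question asks for the O(n log n) version that sorts the scores and uses ranks.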
The recipe book is your cheat sheet — but practice writing them without it.
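This sketch combines the gradient-descent drill and the numerically stable loss drill: 1-D logistic regression trained by batch gradient descent in plain Python. The learning rate, step count, toy data and function names are illustrative assumptions, not values taken from the book's chapters.

```python
import math


def sigmoid(z):
    """Numerically stable sigmoid: never exponentiates a large positive number."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_loss(w, b, xs, ys):
    """Mean binary cross-entropy, computed as log(1 + e^z) - y*z to stay stable for large |z|."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = w * x + b
        # stable softplus: log(1 + e^z) = max(z, 0) + log1p(e^-|z|)
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - y * z
    return total / len(xs)


def fit_logistic(xs, ys, lr=0.5, steps=500):
    """Batch gradient descent on (w, b) for 1-D logistic regression."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # gradient of mean BCE: mean((sigmoid(z) - y) * x) for w, mean(sigmoid(z) - y) for b
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]  # hypothetical, linearly separable toy data
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
preds = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(preds, round(bce_loss(w, b, xs, ys), 4))
```

The stable sigmoid branches on the sign of z, so `math.exp` never receives a large positive argument. The loss uses the identity BCE = log(1 + e^z) − y·z, which avoids taking log(0) when a prediction saturates.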
How to answer well (meta-skills)
- Think out loud. Interviewers grade your reasoning, not just the answer. Narrate trade-offs.
- Start simple, then iterate. Baseline first ("I'd start with logistic regression to establish a number"), then add complexity with justification.
- Say "it depends" — then say on what. Almost every real answer is conditional; naming the conditions is the signal.
- Admit unknowns gracefully. "I haven't used X, but it's like Y because…" beats bluffing. Reasoning from fundamentals is the whole point of this book.
- Tie back to business impact. "This raises NDCG, which should lift engagement, which we'd confirm with an A/B test."
A 4-week prep plan
- Week 1 — Fundamentals. Re-read Parts I–II and the concept bank; explain each aloud. Redo the from-scratch gradient descent and metrics.
- Week 2 — Algorithms & math. Chapters 20–24; implement k-means, k-NN, a decision-tree split; drill probability and the p-value/Bayes questions.
- Week 3 — Deep learning & modern. Chapters 11–15, 27; be able to whiteboard a training loop and explain attention, RAG, and LoRA.
- Week 4 — System design & mocks. Chapter 28; practice 5–6 design prompts out loud under time; do mock interviews; prepare behavioral stories (impact, conflict, failure).
The takeaway
Interviews reward structured fundamentals over memorized trivia. The differentiating rounds are ML concepts and system design, exactly what this book built. Rehearse the concept bank aloud, re-implement the core algorithms on a blank page, walk the system-design framework, think out loud, start simple, and tie everything to impact. You can now define and code every term an ML engineer will throw at you, and reason from first principles when you meet a new one.
That was the goal of this whole book. Go get the job — and to prove you can do it (to yourself and to them), the final part is five complete projects you build end to end: a GPT, a LoRA fine-tune, an agent, a CNN, and a diffusion model. Let's go build. 👉