A 5-minute primer: Python, NumPy & the mental model

This page gives you just enough to read every example in the book. Skip it if you're already comfortable with NumPy arrays.

Reading the code boxes

Grey boxes contain Python; the box right after shows what it prints:

print("hello")
print(2 + 3)

Output:

hello
5

Variables, lists, functions

x = 10                  # x now refers to the integer 10
words = ["cat", "dog"]  # a list: an ordered collection

def add(a, b):          # define a reusable function
    return a + b        # "return" hands a result back

print(add(2, 3))

Output:

5

NumPy: the array is everything

NumPy is the standard library for fast number-crunching in Python; much of AI's math runs on it or on array libraries built around the same idea, such as its GPU cousin PyTorch. By convention we import it under the nickname np. Its one big idea is the array: a grid of numbers you operate on all at once, instead of looping.

import numpy as np

v = np.array([2.0, 0.5, 1.0])   # a 1-D array = a vector
print(v)
print("shape:", v.shape)        # how big it is, per dimension
print("v * 2:", v * 2)          # operations apply to every element

Output:

[2.  0.5 1. ]
shape: (3,)
v * 2: [4. 1. 2.]

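That last line is the whole point: v * 2 multiplied every element without a for loop in your code. This is called vectorization. The loop still happens, but inside NumPy's compiled code rather than in the much slower Python interpreter, and that is why NumPy is fast.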
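To make it concrete that vectorization replaces a loop rather than computing something different, here is a small sketch that doubles the same vector both ways. The helper name double_with_loop is ours, for illustration only; it is not part of NumPy.

```python
import numpy as np

v = np.array([2.0, 0.5, 1.0])

def double_with_loop(values):
    """Double each element one at a time (illustrative helper, not a NumPy function)."""
    out = np.empty_like(values)        # same shape and dtype as the input
    for i in range(len(values)):
        out[i] = values[i] * 2
    return out

looped = double_with_loop(v)           # explicit Python loop
vectorized = v * 2                     # one vectorized expression

print(looped)                               # [4. 1. 2.]
print(vectorized)                           # [4. 1. 2.]
print(np.array_equal(looped, vectorized))   # True
```

Both versions give the same numbers; the vectorized one is shorter and, on large arrays, typically much faster.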

Vectors, matrices, and the word "shape"

  • A vector is a 1-D array — a single row of numbers, a point in space.
  • A matrix is a 2-D array — a grid with rows and columns.
  • .shape tells you the size along each dimension. (3,) is a length-3 vector; (2, 3) is a 2-row, 3-column matrix.

M = np.array([[1, 2, 3],
              [4, 5, 6]])
print("shape:", M.shape)   # (rows, columns)
print("M.T:\n", M.T)       # transpose: rows become columns

Output:

shape: (2, 3)
M.T:
 [[1 4]
 [2 5]
 [3 6]]

The two operations you'll see constantly

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print("dot product:", np.dot(a, b))   # 1*4 + 2*5 + 3*6 = 32
print("elementwise :", a * b)         # [4, 10, 18] — NOT a dot product

Output:

dot product: 32.0
elementwise : [ 4. 10. 18.]

Don't be confused: * vs @/np.dot. a * b multiplies element by element and keeps the same shape. a @ b (matrix multiply), like np.dot(a, b), sums those products into a single number when both inputs are vectors. Mixing the two up is one of the most common NumPy bugs. The dot product is the engine of nearly every similarity measure in this book.
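As a quick check on the difference, the sketch below compares the shapes that * and @ produce for the same two vectors, then uses the dot product to build one common similarity measure, cosine similarity. The function name cosine_similarity is our own for this example, and picking cosine similarity is an assumption about which measure is worth illustrating, not a statement of what later chapters use.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

elementwise = a * b     # still a length-3 vector
dot = a @ b             # a single number, same value as np.dot(a, b)

print(elementwise.shape)      # (3,)
print(np.ndim(dot))           # 0, i.e. a scalar
print(dot == np.dot(a, b))    # True

def cosine_similarity(u, w):
    """Dot product divided by both lengths (illustrative helper)."""
    return np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))

print(round(float(cosine_similarity(a, b)), 4))   # 0.9746
```

A value near 1 means the two vectors point in almost the same direction.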

The mental model of "learning"

Here is the entire field in five words: adjust numbers to reduce error.

A model is a box of adjustable numbers (called parameters or weights). You show it examples, measure how wrong it is (the loss), and nudge the numbers in the direction that makes the loss smaller. Repeat millions of times. That's it: linear regression and a 100-billion-parameter language model differ in scale and architecture, not in this core loop. We make each of those words concrete in Chapter 1.
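The loop described above can be sketched with a single adjustable number. This is a minimal illustration, not the Chapter 1 code: the toy data (generated by y = 3x), the learning rate, the step count, and the choice of plain gradient descent on mean squared error are all assumptions made for this sketch.

```python
import numpy as np

# Toy examples generated by the rule y = 3 * x (an assumption for this sketch).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0                 # the model's single parameter, starting from a bad guess
learning_rate = 0.01    # size of each nudge (hypothetical choice)

for step in range(200):
    error = w * x - y                    # how wrong each prediction is
    loss = np.mean(error ** 2)           # one number summarizing the error
    gradient = np.mean(2 * error * x)    # slope of the loss with respect to w
    w -= learning_rate * gradient        # nudge w in the direction that lowers the loss

print(round(float(w), 3))   # 3.0
```

The parameter starts at 0 and ends at the value that generated the data, found only by repeatedly measuring the error and nudging.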

NumPy bits we use throughout

You'll see              Meaning
np.array([...])         make a vector / matrix
a * b                   elementwise multiply (same shape out)
a @ b, np.dot(a, b)     dot / matrix product (contracts a dimension)
np.linalg.norm(v)       length of a vector
X.shape, X.T            size per dimension; transpose
X.mean(axis=0)          average down each column
np.exp, np.log          $e^x$ and natural log, elementwise
np.argsort(d)           indices that would sort d
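For reference, here is a short sketch that exercises the entries above that were not shown earlier on this page. The arrays are made-up values chosen so the results are easy to verify by hand.

```python
import numpy as np

v = np.array([3.0, 4.0])
print(np.linalg.norm(v))           # 5.0, since sqrt(3*3 + 4*4) = 5

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
print(X.mean(axis=0))              # [3. 4.], one average per column

d = np.array([0.9, 0.1, 0.5])
order = np.argsort(d)
print(order)                       # [1 2 0], positions of smallest to largest
print(d[order])                    # [0.1 0.5 0.9], d in sorted order

print(np.exp(0.0), np.log(1.0))    # 1.0 0.0

M = np.array([[1, 2, 3],
              [4, 5, 6]])
a = np.array([1.0, 2.0, 3.0])
print((M @ a).shape)               # (2,), the length-3 dimension was contracted
```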

That's everything you need. Next: what a "model" actually is. 👉