Tensors, shapes & broadcasting
Every AI computation is a flow of tensors through operations. If you're fluent
in shapes and broadcasting, you can read any model's code and debug the error
message that eats a large share of beginners' time: shapes (a,b) and (c,d) not aligned.
What is a tensor?
A tensor is just an n-dimensional array of numbers. The number of dimensions
is its rank (NumPy calls it ndim). That's the whole definition — in deep
learning, "tensor" is simply the word for the multi-dimensional arrays that flow
through a model.
| Rank | Name | Example | Shape |
|---|---|---|---|
| 0 | scalar | 5.0 | () |
| 1 | vector | [1, 2, 3] | (3,) |
| 2 | matrix | a table / image channel | (rows, cols) |
| 3 | 3-tensor | an RGB image | (height, width, 3) |
| 4 | 4-tensor | a batch of images | (batch, H, W, 3) |
import numpy as np
s = np.array(5.0)
v = np.array([1., 2., 3.])
M = np.array([[1., 2.], [3., 4.]])
T = np.zeros((2, 3, 4))
print("ranks:", s.ndim, v.ndim, M.ndim, T.ndim)
print("shapes:", s.shape, v.shape, M.shape, T.shape)
Output:
ranks: 0 1 2 3
shapes: () (3,) (2, 2) (2, 3, 4)
Don't be confused: "tensor" (ML) vs "tensor" (physics/math). In physics a tensor is an object with strict transformation rules. In ML, "tensor" just means "n-dimensional array." When a PyTorch person says tensor, they mean the array. Don't overthink it.
Shape is the thing you reason about
Almost every bug is a shape bug. Two habits save you:
- Say the shape out loud at each line: "X is (batch, features)."
- Know what each axis means: by deep convention, axis 0 is the batch / sample axis (one row per example). X[i] is the i-th example.
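Both habits can be written straight into code. Here is a minimal sketch, assuming a made-up batch of 4 examples with 3 features each (the sizes and the names `batch`, `features` and `first_example` are illustrative only):

```python
import numpy as np

# Hypothetical data: 4 examples, 3 features each (sizes chosen for illustration).
batch, features = 4, 3
X = np.arange(batch * features, dtype=np.float64).reshape(batch, features)

# Habit 1: state the expected shape, and let an assert check it for you.
assert X.shape == (batch, features), f"expected (batch, features), got {X.shape}"

# Habit 2: axis 0 is the batch axis, so X[0] is one example with shape (features,).
first_example = X[0]
print(X.shape)              # (4, 3)
print(first_example.shape)  # (3,)
print(first_example)        # [0. 1. 2.]
```

A shape assert costs one line and turns a confusing downstream error into a message that names the array that went wrong.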
Reshaping moves the same numbers into a new shape
a = np.arange(12) # [0 1 2 ... 11], shape (12,)
print(a.reshape(3, 4)) # same 12 numbers, now 3 rows of 4
print(a.reshape(3, 4).reshape(-1).shape) # -1 = "infer this axis" -> flat again
Output:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
(12,)
-1 means "you figure out this dimension so the total count matches." You'll see
x.reshape(batch, -1) constantly to flatten everything-but-the-batch.
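To make the flatten pattern concrete, here is a small sketch on an assumed batch of 8 channels-last images of 32x32 pixels (the sizes are hypothetical, picked only so the arithmetic is easy to follow):

```python
import numpy as np

# Hypothetical batch: 8 RGB images of 32x32 pixels, channels-last.
images = np.zeros((8, 32, 32, 3))
batch = images.shape[0]

# Keep axis 0 (the batch) and let -1 absorb every remaining axis.
flat = images.reshape(batch, -1)
print(flat.shape)  # (8, 3072), because 32 * 32 * 3 = 3072

# The total element count must still match, otherwise reshape raises.
try:
    images.reshape(batch, 1000)
except ValueError as err:
    print("reshape failed:", err)
```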
Broadcasting: the rule that removes loops
Broadcasting lets NumPy combine arrays of different shapes by virtually stretching the smaller one. It's how you add a bias to every row, or scale every column, without a loop. The rule: compare shapes from the right; two dimensions are compatible if they're equal or one of them is 1, and a dimension that is missing on the left counts as 1.
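You can test the rule on bare shapes before touching any data. The sketch below uses `np.broadcast_shapes`, which should be available in NumPy 1.20 and newer; the variable names are illustrative.

```python
import numpy as np

# np.broadcast_shapes applies the rule to shapes alone (NumPy 1.20 or newer).
row_case = np.broadcast_shapes((2, 3), (3,))    # missing leading axis acts like 1
col_case = np.broadcast_shapes((2, 3), (2, 1))  # the size-1 axis is stretched
both_case = np.broadcast_shapes((4, 1), (1, 5)) # both sides get stretched
print(row_case, col_case, both_case)            # (2, 3) (2, 3) (4, 5)

# (2, 3) and (2,) are aligned from the right, so 3 is compared with 2 and fails.
failed = False
try:
    np.broadcast_shapes((2, 3), (2,))
except ValueError as err:
    failed = True
    print("not broadcastable:", err)
```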
Add a scalar to everything
X = np.array([[1., 2., 3.],
[4., 5., 6.]])
print(X + 10) # the 10 is stretched to every element
Output:
[[11. 12. 13.]
[14. 15. 16.]]
Add a per-column vector (shape (3,)) to every row
print(X + np.array([100., 200., 300.])) # (2,3) + (3,) -> stretched down rows
Output:
[[101. 202. 303.]
[104. 205. 306.]]
Add a per-row vector — you must make it a column (2,1)
print(X + np.array([[10.], [20.]])) # (2,3) + (2,1) -> stretched across cols
Output:
[[11. 12. 13.]
[24. 25. 26.]]
That (2,1) trick, turning a vector into an explicit column (v[:, None] does the same for an existing 1-D array v), is
how you control which axis broadcasts. We used it in the recipe book to compute
all-pairs distances.
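Here is a short sketch of both uses. The all-pairs part is written from scratch on two tiny made-up point sets (`P` and `Q` are hypothetical names), so treat it as one possible way to do it rather than the recipe book's exact code.

```python
import numpy as np

v = np.array([10., 20.])
print(v.shape, v[:, None].shape)   # (2,) (2, 1)

X = np.array([[1., 2., 3.],
              [4., 5., 6.]])
shifted = X + v[:, None]           # same result as adding [[10.], [20.]]
print(shifted)

# All-pairs Euclidean distances between two hypothetical point sets.
P = np.array([[0., 0.], [3., 4.]])             # (2, 2): 2 points in 2-D
Q = np.array([[0., 0.], [0., 4.], [3., 0.]])   # (3, 2): 3 points in 2-D
diff = P[:, None, :] - Q[None, :, :]           # (2,1,2) - (1,3,2) -> (2,3,2)
dist = np.sqrt((diff ** 2).sum(axis=-1))       # (2, 3): dist[i, j] = |P[i] - Q[j]|
print(dist)                                    # rows: [0, 4, 3] and [5, 3, 4]
```

The inserted size-1 axes decide which array is stretched along which direction; no loop over pairs is needed.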
axis: the other thing everyone trips on
Reductions like mean, sum, max take an axis. The mental model:
axis=k is the axis that disappears.
X = np.array([[1., 2., 3.],
[4., 5., 6.]])
print("axis=0 (down columns):", X.mean(axis=0)) # collapses the 2 rows -> 3 numbers
print("axis=1 (across rows) :", X.mean(axis=1)) # collapses the 3 cols -> 2 numbers
Output:
axis=0 (down columns): [2.5 3.5 4.5]
axis=1 (across rows) : [2. 5.]
Don't be confused: axis=0 ≠ "rows." axis=0 is the row axis, so reducing over it collapses the rows and leaves one number per column. "Per-column statistic" (like feature means for standardization) = axis=0. "Per-row statistic" (like normalizing each sample) = axis=1. Read it as "the axis I sum over is the axis that vanishes."
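The two statistics look like this in practice. This is a minimal sketch on the same X; `keepdims=True` is a standard NumPy argument that keeps the reduced axis as size 1 so the result broadcasts back, and the names `mu`, `sigma`, `row_sum` are illustrative.

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.]])   # (2 samples, 3 features)

# Per-column (per-feature) statistics: reduce over axis 0.
mu = X.mean(axis=0)            # shape (3,)
sigma = X.std(axis=0)          # shape (3,)
X_std = (X - mu) / sigma       # (2,3) with (3,) broadcasts down the rows
print(X_std)                   # first row is all -1, second row is all +1

# Per-row (per-sample) statistics: reduce over axis 1.
# keepdims=True leaves a size-1 axis, so the (2,1) result broadcasts back.
row_sum = X.sum(axis=1, keepdims=True)   # shape (2, 1)
X_norm = X / row_sum
print(X_norm.sum(axis=1))                # each row now sums to 1 (up to rounding)
```

Without `keepdims=True` the row sums would have shape (2,), which does not broadcast against (2,3).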
Matrix multiplication: the shape contract
A @ B requires the inner dimensions to match: (m, k) @ (k, n) -> (m, n).
The shared k is summed away.
A = np.ones((2, 3))
B = np.ones((3, 5))
print((A @ B).shape) # (2,3) @ (3,5) -> (2,5)
Output:
(2, 5)
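To see what "summed away" means, the sketch below computes the same product three ways on small random matrices: explicit loops, broadcasting plus a reduction, and `np.einsum`. The seed and sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # seed chosen arbitrarily
A = rng.normal(size=(2, 3))         # (m, k)
B = rng.normal(size=(3, 5))         # (k, n)

# Explicit version: multiply matching k entries, then sum over k.
manual = np.zeros((2, 5))
for i in range(2):
    for j in range(5):
        manual[i, j] = np.sum(A[i, :] * B[:, j])

# Same contraction as broadcast + reduction: (2,3,1) * (1,3,5) -> sum over axis 1.
via_broadcast = (A[:, :, None] * B[None, :, :]).sum(axis=1)

# Same contraction spelled with index letters: k appears on the left only.
via_einsum = np.einsum("mk,kn->mn", A, B)

print(np.allclose(manual, A @ B))         # True
print(np.allclose(via_broadcast, A @ B))  # True
print(np.allclose(via_einsum, A @ B))     # True
```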
A fully connected neural network layer is exactly this: outputs = inputs @ weights + bias, where
inputs is (batch, in_features), weights is (in_features, out_features), and bias is
(out_features,), broadcast across the batch. When you see the dreaded "shapes not aligned"
error, write the two shapes side by side and compare the inner dimensions: they will not match.
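A minimal sketch of that layer, with hypothetical sizes, plus one common way the inner dimensions end up mismatched (weights stored transposed). The exact wording of the error depends on the NumPy version, so the code prints whatever message it gets.

```python
import numpy as np

batch, in_features, out_features = 4, 3, 2   # hypothetical sizes

inputs = np.ones((batch, in_features))            # (4, 3)
weights = np.ones((in_features, out_features))    # (3, 2)
bias = np.zeros(out_features)                     # (2,), broadcast over the batch

outputs = inputs @ weights + bias                 # (4,3) @ (3,2) -> (4,2)
print(outputs.shape)                              # (4, 2)

# A common mistake: weights stored as (out_features, in_features).
wrong = np.ones((out_features, in_features))      # (2, 3)
try:
    inputs @ wrong                                # inner dims are 3 and 2
except ValueError as err:
    print("matmul failed:", err)

fixed = inputs @ wrong.T                          # transposing restores (3, 2)
print(fixed.shape)                                # (4, 2)
```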
Dtype: the silent gotcha
Tensors have a dtype (float32, float64, int64…). Deep learning runs in float32 by default (half the memory of float64, plenty of precision), and modern training often uses float16/bfloat16 for speed. Integer arrays can't hold gradients, and they silently truncate any float you store into them, which is a frequent surprise. Plain division with / is safe, because NumPy promotes the result to float:
print(np.array([1, 2, 3]) / 2) # NumPy promotes to float -> fine
print((np.array([1, 2, 3]) * 1.0).dtype)
Output:
[0.5 1. 1.5]
float64
Don't be confused: float32 vs float64. NumPy defaults to float64; PyTorch defaults to float32. If you move data between them and get a dtype error, cast explicitly with .astype(np.float32) on the NumPy side or .float() on the PyTorch side. Models almost never need float64.
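The NumPy side of these gotchas fits in a few lines. This sketch sticks to NumPy only (no PyTorch calls), and the variable names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])
print(ints.dtype.kind)          # 'i' = integer (exact width depends on platform)
halves = ints / 2               # true division promotes to float64
floors = ints // 2              # floor division stays integer
print(halves.dtype, floors.dtype.kind)

ints[0] = 1.9                   # storing a float in an integer array truncates
print(ints[0])                  # 1

x64 = np.array([1., 2., 3.])    # NumPy default: float64
x32 = x64.astype(np.float32)    # explicit cast before handing data to a model
print(x64.dtype, x32.dtype)     # float64 float32
print(x64.nbytes, x32.nbytes)   # 24 12
```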
The takeaway
Tensors are n-D arrays; rank counts the dimensions; shape is what you reason about; axis 0 is the batch; broadcasting stretches size-1 dimensions so you never loop; matmul contracts the shared inner dimension. With this, you can read model code. Next up is features: how raw data becomes tensors in the first place.