Step 2 — The scatter (cost) matrix
Recall the per-segment cost from the math chapter: the scatter $v(i,j)$ of frames $i$ through $j$ is
$$ v(i,j) = \sum_{t=i}^{j} K_{tt} \;-\; \frac{1}{L} \sum_{s=i}^{j}\sum_{t=i}^{j} K_{st}, \qquad L = j - i + 1. $$
We want this number for every possible segment, and we want each one to be instant to look up. The trick that makes it instant is cumulative sums.
What is a cumulative sum?
A cumulative (running) sum replaces each number with the total of everything up to
it: [5, 2, 4] becomes [5, 7, 11]. Why bother? Because once you have running
totals, the sum of any stretch is a single subtraction. The sum of items 1–2
above is 11 − 5 = 6 — no re-adding. We use the same idea in two dimensions.
The code
import numpy as np
def calc_scatters(K):
"""
Return J, an n x n table where J[i, j] is the scatter v(i, j) of frames
i..j. Building it costs O(n^2); each entry is then an O(1) lookup.
"""
n = K.shape[0]
# running total of the diagonal: dcum[t] = sum of K[0,0] .. K[t-1,t-1]
dcum = np.concatenate(([0.0], np.cumsum(np.diag(K))))
# 2-D running total: Kcum[a, b] = sum of the top-left a x b block of K
Kcum = np.zeros((n + 1, n + 1))
Kcum[1:, 1:] = np.cumsum(np.cumsum(K, axis=0), axis=1)
J = np.zeros((n, n))
for i in range(n):
for j in range(i, n):
L = j - i + 1
diag_sum = dcum[j + 1] - dcum[i] # diagonal part
block = (Kcum[j + 1, j + 1] - Kcum[i, j + 1] # block sum via
- Kcum[j + 1, i] + Kcum[i, i]) # 4 corner lookups
J[i, j] = diag_sum - block / L
return J
Reading the code
np.diag(K)pulls out the diagonal (each frame's self-similarity).np.cumsum(...)makes its running total, sodcum[j+1] - dcum[i]instantly gives the diagonal sum for frames $i..j$.np.cumsum(np.cumsum(K, axis=0), axis=1)does the running total down then across, giving a 2-D running totalKcum. After that, the sum of any rectangular block ofKis four corner lookups added/subtracted (see below).- The two
forloops fill inJ[i, j]for every segment using those instant lookups.
Why the four corners work
For a 2-D running total, the sum inside a rectangle equals (big rectangle) − (strip above) − (strip left) + (overlap added back):
sum = Kcum[j+1, j+1] # everything from the origin down to (j, j)
- Kcum[i, j+1] # subtract the strip above the block
- Kcum[j+1, i ] # subtract the strip to the left
+ Kcum[i, i ] # the top-left corner got subtracted twice; add it back
This is the standard "inclusion–exclusion" pattern for rectangle sums.
Try it: input and output
Four frames that are single numbers — two near 0, two near 5 (so clearly two
groups). We use the linear kernel:
import numpy as np
X = np.array([[0.0], [0.2], [5.0], [5.1]])
K = X @ X.T # linear kernel
J = calc_scatters(K)
print(np.round(J, 3))
Output:
[[ 0. 0.02 16.027 24.527]
[ 0. 0. 11.52 15.687]
[ 0. 0. 0. 0.005]
[ 0. 0. 0. 0. ]]
How to read J[i, j] (only the upper triangle is filled; j ≥ i):
J[0,1] = 0.02— frames 0–1 (both near0) form a tight segment. Tiny scatter = good segment. (This is the exact number we computed by hand in Chapter 2! ✔)J[2,3] = 0.005— frames 2–3 (both near5) are also tight.J[0,3] = 24.527— lumping all four frames into one segment mixes the two groups, so the scatter is huge. Bad segment.
The table already tells us the right split: keep 0–1 together, keep 2–3
together, and don't merge across the gap. Dynamic programming (next chapter)
makes that decision automatically.
Speed
Both the running totals and filling J cost on the order of $n^2$ operations —
the same as just storing $K$. So we get every segment's cost cheaply.
Sanity check against the definition
We can confirm calc_scatters matches the original "distance to the average"
definition by brute force:
def brute_scatter(X, i, j):
seg = X[i:j + 1]
mu = seg.mean(axis=0) # the segment average
return float(((seg - mu) ** 2).sum())
X = np.random.randn(8, 3) # 8 random frames, 3 features
J = calc_scatters(X @ X.T)
ok = all(abs(J[i, j] - brute_scatter(X, i, j)) < 1e-8
for i in range(8) for j in range(i, 8))
print("scatter matrix matches the definition:", ok)
Output:
scatter matrix matches the definition: True
Now every segment has a cost. Time to find the best combination of segments. 👉