Step 2 — The scatter (cost) matrix

Recall the per-segment cost from the math chapter: the scatter $v(i,j)$ of frames $i$ through $j$ is

$$ v(i,j) = \sum_{t=i}^{j} K_{tt} \;-\; \frac{1}{L} \sum_{s=i}^{j}\sum_{t=i}^{j} K_{st}, \qquad L = j - i + 1. $$

We want this number for every possible segment, and we want each one to be instant to look up. The trick that makes it instant is cumulative sums.

What is a cumulative sum?

A cumulative (running) sum replaces each number with the total of everything up to it: [5, 2, 4] becomes [5, 7, 11]. Why bother? Because once you have running totals, the sum of any stretch is a single subtraction. The sum of items 1–2 above is 11 − 5 = 6 — no re-adding. We use the same idea in two dimensions.

The code

import numpy as np


def calc_scatters(K):
    """
    Return J, an n x n table where J[i, j] is the scatter v(i, j) of frames
    i..j. Building it costs O(n^2); each entry is then an O(1) lookup.
    """
    n = K.shape[0]

    # running total of the diagonal: dcum[t] = sum of K[0,0] .. K[t-1,t-1]
    dcum = np.concatenate(([0.0], np.cumsum(np.diag(K))))

    # 2-D running total: Kcum[a, b] = sum of the top-left a x b block of K
    Kcum = np.zeros((n + 1, n + 1))
    Kcum[1:, 1:] = np.cumsum(np.cumsum(K, axis=0), axis=1)

    J = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            L = j - i + 1
            diag_sum = dcum[j + 1] - dcum[i]                 # diagonal part
            block = (Kcum[j + 1, j + 1] - Kcum[i, j + 1]     # block sum via
                     - Kcum[j + 1, i] + Kcum[i, i])          # 4 corner lookups
            J[i, j] = diag_sum - block / L
    return J

Reading the code

np.diag(K) pulls out the diagonal (each frame's self-similarity). np.cumsum(...) makes its running total, so dcum[j+1] - dcum[i] instantly gives the diagonal sum for frames $i..j$.
np.cumsum(np.cumsum(K, axis=0), axis=1) does the running total down then across, giving a 2-D running total Kcum. After that, the sum of any rectangular block of K is four corner lookups added/subtracted (see below).
The two for loops fill in J[i, j] for every segment using those instant lookups.

Why the four corners work

For a 2-D running total, the sum inside a rectangle equals (big rectangle) − (strip above) − (strip left) + (overlap added back):

sum = Kcum[j+1, j+1]   # everything from the origin down to (j, j)
    - Kcum[i,   j+1]   # subtract the strip above the block
    - Kcum[j+1, i  ]   # subtract the strip to the left
    + Kcum[i,   i  ]   # the top-left corner got subtracted twice; add it back

This is the standard "inclusion–exclusion" pattern for rectangle sums.

Try it: input and output

Four frames that are single numbers — two near 0, two near 5 (so clearly two groups). We use the linear kernel:

import numpy as np

X = np.array([[0.0], [0.2], [5.0], [5.1]])
K = X @ X.T               # linear kernel
J = calc_scatters(K)
print(np.round(J, 3))

Output:

[[ 0.     0.02  16.027 24.527]
 [ 0.     0.    11.52  15.687]
 [ 0.     0.     0.     0.005]
 [ 0.     0.     0.     0.   ]]

How to read J[i, j] (only the upper triangle is filled; j ≥ i):

J[0,1] = 0.02 — frames 0–1 (both near 0) form a tight segment. Tiny scatter = good segment. (This is the exact number we computed by hand in Chapter 2! ✔)
J[2,3] = 0.005 — frames 2–3 (both near 5) are also tight.
J[0,3] = 24.527 — lumping all four frames into one segment mixes the two groups, so the scatter is huge. Bad segment.

The table already tells us the right split: keep 0–1 together, keep 2–3 together, and don't merge across the gap. Dynamic programming (next chapter) makes that decision automatically.

Speed

Both the running totals and filling J cost on the order of $n^2$ operations — the same as just storing $K$. So we get every segment's cost cheaply.

Sanity check against the definition

We can confirm calc_scatters matches the original "distance to the average" definition by brute force:

def brute_scatter(X, i, j):
    seg = X[i:j + 1]
    mu = seg.mean(axis=0)              # the segment average
    return float(((seg - mu) ** 2).sum())

X = np.random.randn(8, 3)             # 8 random frames, 3 features
J = calc_scatters(X @ X.T)
ok = all(abs(J[i, j] - brute_scatter(X, i, j)) < 1e-8
         for i in range(8) for j in range(i, 8))
print("scatter matrix matches the definition:", ok)

Output:

scatter matrix matches the definition: True

Now every segment has a cost. Time to find the best combination of segments. 👉

Kernel Temporal Segmentation from Scratch