The math behind KTS
This is the only heavily mathematical chapter. We'll go slowly and explain every symbol. The payoff is one simple formula that scores how "tight" a segment is, using only the kernel matrix from the previous chapter.
Reading the notation
A few symbols appear a lot:
- $\sum_{t=i}^{j}$ means "add up, for every $t$ from $i$ to $j$." For example $\sum_{t=1}^{3} t = 1 + 2 + 3 = 6$. The big $\Sigma$ is just shorthand for "sum".
- $\phi_t$ (the Greek letter phi) is frame $t$'s fingerprint after the kernel's similarity transformation. You never compute it directly — it's a thinking tool.
- $\mu$ (mu) is the average (mean) of a group of fingerprints.
- $\lVert a \rVert^2$ is the squared length of vector $a$ — a measure of "how big" it is. $\lVert a - b \rVert^2$ is then the squared distance between $a$ and $b$: small when they're close, large when far apart.
- $\langle a, b \rangle$ is the dot product of $a$ and $b$ (the similarity measure from before). Crucially, $\langle \phi_s, \phi_t \rangle = K_{st}$ — the dot product of two fingerprints is exactly the kernel-matrix entry.
Step 1: what "spread out" means
Take one segment covering frames $i, i+1, \dots, j$. Call its length $L = j - i + 1$ (the number of frames in it). Its average fingerprint is
$$ \mu = \frac{1}{L} \sum_{t=i}^{j} \phi_t. $$
(Add up the fingerprints in the segment, divide by how many there are — an ordinary average.)
The scatter of the segment is the total squared distance from each frame to that average:
$$ v(i, j) = \sum_{t=i}^{j} \lVert \phi_t - \mu \rVert^2. $$
In words: how far is each frame from the segment's "typical" frame, added up. A consistent segment → every frame near the average → small $v$. A jumbled segment → frames far from the average → large $v$. This single number is what KTS uses to grade a segment.
Step 2: rewrite it using only the kernel matrix
The problem: $\mu$ and $\phi_t$ live in an abstract space we can't compute in directly. The fix: expand the formula until only dot products $\langle \phi_s, \phi_t \rangle = K_{st}$ remain.
Expanding the squared distance (using $\lVert a-b\rVert^2 = \langle a,a\rangle - 2\langle a,b\rangle + \langle b,b\rangle$):
$$ v(i,j) = \sum_{t=i}^{j} \Big( \langle \phi_t, \phi_t \rangle - 2\langle \phi_t, \mu \rangle + \langle \mu, \mu \rangle \Big). $$
Now substitute $\mu = \frac{1}{L}\sum_s \phi_s$ and replace every $\langle \phi_s, \phi_t \rangle$ with $K_{st}$. The three pieces become:
- $\displaystyle \sum_{t=i}^{j} \langle \phi_t, \phi_t \rangle = \sum_{t=i}^{j} K_{tt}$ — the diagonal entries summed.
- $\displaystyle \sum_{t=i}^{j} 2\langle \phi_t, \mu \rangle = \frac{2}{L}\sum_{t=i}^{j}\sum_{s=i}^{j} K_{st}$ — twice the sum of the whole block, divided by $L$.
- $\displaystyle \sum_{t=i}^{j} \langle \mu, \mu \rangle = \frac{1}{L}\sum_{s,t} K_{st}$ — the block sum divided by $L$.
The last two combine, leaving the key formula:
$$ \boxed{\,v(i,j) = \sum_{t=i}^{j} K_{tt} \;-\; \frac{1}{L} \sum_{s=i}^{j}\sum_{t=i}^{j} K_{st}\,} $$
In plain words:
scatter = (sum of the diagonal entries for the segment) − (sum of the whole square block of $K$ for the segment, divided by its length).
No averages, no abstract fingerprints — just sums of numbers we already have in the grid $K$. That's the whole trick.
A worked example you can check by hand
Take two frames that are each a single number: $x_1 = 0$ and $x_2 = 0.2$, with the linear kernel (so this is just ordinary distance). The segment containing both has length $L = 2$ and average $\mu = (0 + 0.2)/2 = 0.1$.
Directly from the definition:
$$ v = (0 - 0.1)^2 + (0.2 - 0.1)^2 = 0.01 + 0.01 = 0.02. $$
In Chapter 4 we compute this segment's scatter with code and get the matching
value 0.02. Two completely different routes — by hand and by computer — agree.
The total objective
A full segmentation with change points $t_1 < \dots < t_m$ (giving $m+1$ segments) is scored by adding up every segment's scatter:
$$ J = \sum_{r=0}^{m} v\big(t_r,\ t_{r+1}-1\big). $$
KTS chooses the cut positions that make $J$ as small as possible. Two cases:
- You know how many segments you want → minimize $J$ directly with dynamic programming (Chapter 5).
- You don't → add a penalty for extra cuts and let the computer pick (Chapter 6).
Why we'll precompute sums
Each $v(i,j)$ needs a sum over a block of $K$. Done naively for every possible segment, that's far too slow. The next chapter shows a trick (cumulative sums) that makes each scatter an instant lookup. First, let's write the code to build $K$. 👉