Step 3 — Dynamic programming
We can now score any single segment instantly. The remaining problem: out of the enormous number of ways to place the cuts, find the one combination with the smallest total scatter
$$ J = \sum_{r=0}^{m} v\big(t_r,\ t_{r+1}-1\big). $$
Trying every possibility is hopeless — for $n$ frames and $m$ cuts there are $\binom{n-1}{m}$ combinations, which explodes. Dynamic programming (DP) finds the guaranteed-best answer without trying them all.
The idea, in plain words
DP is "solve small pieces, reuse the answers." Here's the intuition:
Suppose you already knew the best way to split the first part of the video into a few segments. Then to split a slightly longer part, you only need to decide where the last segment starts — everything before it is a smaller problem you've already solved.
So we build up the answer for longer and longer prefixes of the video, reusing the best answers to the shorter prefixes. Because every sub-answer is optimal, the final answer is optimal too.
The recurrence
Let $C[k, \ell]$ be the smallest possible total scatter when splitting the first $\ell$ frames into exactly $k+1$ segments (that's $k$ cuts). The last segment must start somewhere — call it $t$ — and run to frame $\ell-1$. Everything before $t$ is the best $k$-segment split of the first $t$ frames. So we just try every possible start $t$ and keep the cheapest:
$$ C[k, \ell] \;=\; \min_{t} \Big( C[k-1,\ t] \;+\; v(t,\ \ell-1) \Big). $$
- Base case: $C[0, \ell] = v(0, \ell-1)$ — with zero cuts there's one segment covering everything.
- Answer: $C[m, n]$ is the best total scatter for the whole video with $m$ cuts.
To recover where the cuts are (not just the cost), we remember, for each cell, which start $t$ won. Walking those choices backward is called backtracking.
The code
import numpy as np
def cpd_nonlin(K, ncp, lmin=1, lmax=None):
"""
Globally optimal segmentation given the kernel matrix K.
K : (n, n) kernel matrix.
ncp : number of change points m (=> m + 1 segments).
lmin : smallest allowed segment length.
lmax : largest allowed segment length (defaults to n).
Returns (cps, cost):
cps : the m change-point positions (start frame of each new segment).
cost : the total scatter of that optimal segmentation.
"""
n = K.shape[0]
m = int(ncp)
if lmax is None:
lmax = n
if (m + 1) * lmin > n:
raise ValueError("sequence too short for this many change points")
J = calc_scatters(K) # from Step 2: every segment's cost
INF = 1e18
C = np.full((m + 1, n + 1), INF) # C[k, l]: best cost, first l frames, k+1 segs
back = np.zeros((m + 1, n + 1), dtype=int) # remembers the winning start t
# base row: a single segment [0 .. l-1]
for l in range(lmin, min(lmax, n) + 1):
C[0, l] = J[0, l - 1]
# fill rows k = 1 .. m
for k in range(1, m + 1):
for l in range((k + 1) * lmin, n + 1):
t_lo = max(k * lmin, l - lmax)
t_hi = l - lmin # last segment must be at least lmin long
if t_lo > t_hi:
continue
for t in range(t_lo, t_hi + 1):
cost = C[k - 1, t] + J[t, l - 1]
if cost < C[k, l]:
C[k, l] = cost
back[k, l] = t
# backtrack from the full problem to read off the cuts
cps = np.zeros(m, dtype=int)
l = n
for k in range(m, 0, -1):
t = back[k, l]
cps[k - 1] = t
l = t
return cps, C[m, n]
Reading the code
Cis the table of best costs;C[k, l]is "firstlframes,k+1segments."np.full((m+1, n+1), INF)fills the table with a huge number (INF) meaning "not computed / impossible yet"; real costs will be smaller and overwrite it.- The three nested loops implement the recurrence: for each number of cuts
kand each prefix lengthl, try every starttfor the last segment and keep the cheapest, recording the winner inback. lmin/lmax(min/max segment length) shrink the search and prevent silly one-frame segments.- The final loop reads
backfrom the whole-video cell backward to list the cuts.
Try it: input and output
Six frames: the first three point "right", the last three point "up" — two obvious shots, so the boundary should be at frame 3.
import numpy as np
from kts import build_kernel, cpd_nonlin
X = np.array([[1.0, 0.0], [0.95, 0.10], [0.90, 0.05], # shot A (frames 0,1,2)
[0.10, 0.90], [0.05, 0.95], [0.00, 1.00]]) # shot B (frames 3,4,5)
K = build_kernel(X, "cosine")
cps, cost = cpd_nonlin(K, ncp=1, lmin=1) # ask for exactly 1 cut
print("1 cut :", cps.tolist(), " cost:", round(cost, 4))
cps2, cost2 = cpd_nonlin(K, ncp=2, lmin=1) # ask for exactly 2 cuts
print("2 cuts:", cps2.tolist(), " cost:", round(cost2, 4))
Output:
1 cut : [3] cost: 0.0116
2 cuts: [3, 4] cost: 0.0069
- With 1 cut, KTS puts it at frame
3— exactly the true boundary between the two shots. 🎯 - With 2 cuts, it keeps the real boundary at
3and adds a second cut at4. Notice the cost barely drops (0.0116 → 0.0069): the first cut explains almost all the structure; the second is just splitting hairs. That tiny extra gain is the clue we'll use in the next chapter to decide the right number of cuts automatically.
How backtracking recovers the cuts
back[k, l] stored the best start of the last segment for each sub-problem.
Starting from the whole video (m, n), we read that start, jump to the smaller
sub-problem, and repeat — collecting one cut each step. Those starts are the
change points.
Speed
The loops run on the order of $m \times n \times n$ steps in the worst case —
polynomial and fast, versus the impossible "try everything" count. (The full
implementation in Chapter 7 replaces the inner
for t loop with a single NumPy operation for extra speed; the logic is
identical.)
This solves the problem when you already know how many cuts you want. Usually you don't — so next we let KTS choose. 👉