Step 3 — Dynamic programming

We can now score any single segment instantly. The remaining problem: out of the enormous number of ways to place the cuts, find the one combination with the smallest total scatter

$$ J = \sum_{r=0}^{m} v\big(t_r,\ t_{r+1}-1\big). $$

Trying every possibility is hopeless — for $n$ frames and $m$ cuts there are $\binom{n-1}{m}$ combinations, which explodes. Dynamic programming (DP) finds the guaranteed-best answer without trying them all.

The idea, in plain words

DP is "solve small pieces, reuse the answers." Here's the intuition:

Suppose you already knew the best way to split the first part of the video into a few segments. Then to split a slightly longer part, you only need to decide where the last segment starts — everything before it is a smaller problem you've already solved.

So we build up the answer for longer and longer prefixes of the video, reusing the best answers to the shorter prefixes. Because every sub-answer is optimal, the final answer is optimal too.

The recurrence

Let $C[k, \ell]$ be the smallest possible total scatter when splitting the first $\ell$ frames into exactly $k+1$ segments (that's $k$ cuts). The last segment must start somewhere — call it $t$ — and run to frame $\ell-1$. Everything before $t$ is the best $k$-segment split of the first $t$ frames. So we just try every possible start $t$ and keep the cheapest:

$$ C[k, \ell] \;=\; \min_{t} \Big( C[k-1,\ t] \;+\; v(t,\ \ell-1) \Big). $$

Base case: $C[0, \ell] = v(0, \ell-1)$ — with zero cuts there's one segment covering everything.
Answer: $C[m, n]$ is the best total scatter for the whole video with $m$ cuts.

To recover where the cuts are (not just the cost), we remember, for each cell, which start $t$ won. Walking those choices backward is called backtracking.

The code

import numpy as np


def cpd_nonlin(K, ncp, lmin=1, lmax=None):
    """
    Globally optimal segmentation given the kernel matrix K.

    K    : (n, n) kernel matrix.
    ncp  : number of change points m  (=> m + 1 segments).
    lmin : smallest allowed segment length.
    lmax : largest allowed segment length (defaults to n).

    Returns (cps, cost):
      cps  : the m change-point positions (start frame of each new segment).
      cost : the total scatter of that optimal segmentation.
    """
    n = K.shape[0]
    m = int(ncp)
    if lmax is None:
        lmax = n
    if (m + 1) * lmin > n:
        raise ValueError("sequence too short for this many change points")

    J = calc_scatters(K)                 # from Step 2: every segment's cost

    INF = 1e18
    C = np.full((m + 1, n + 1), INF)     # C[k, l]: best cost, first l frames, k+1 segs
    back = np.zeros((m + 1, n + 1), dtype=int)   # remembers the winning start t

    # base row: a single segment [0 .. l-1]
    for l in range(lmin, min(lmax, n) + 1):
        C[0, l] = J[0, l - 1]

    # fill rows k = 1 .. m
    for k in range(1, m + 1):
        for l in range((k + 1) * lmin, n + 1):
            t_lo = max(k * lmin, l - lmax)
            t_hi = l - lmin              # last segment must be at least lmin long
            if t_lo > t_hi:
                continue
            for t in range(t_lo, t_hi + 1):
                cost = C[k - 1, t] + J[t, l - 1]
                if cost < C[k, l]:
                    C[k, l] = cost
                    back[k, l] = t

    # backtrack from the full problem to read off the cuts
    cps = np.zeros(m, dtype=int)
    l = n
    for k in range(m, 0, -1):
        t = back[k, l]
        cps[k - 1] = t
        l = t
    return cps, C[m, n]

Reading the code

C is the table of best costs; C[k, l] is "first l frames, k+1 segments."
np.full((m+1, n+1), INF) fills the table with a huge number (INF) meaning "not computed / impossible yet"; real costs will be smaller and overwrite it.
The three nested loops implement the recurrence: for each number of cuts k and each prefix length l, try every start t for the last segment and keep the cheapest, recording the winner in back.
lmin/lmax (min/max segment length) shrink the search and prevent silly one-frame segments.
The final loop reads back from the whole-video cell backward to list the cuts.

Try it: input and output

Six frames: the first three point "right", the last three point "up" — two obvious shots, so the boundary should be at frame 3.

import numpy as np
from kts import build_kernel, cpd_nonlin

X = np.array([[1.0, 0.0], [0.95, 0.10], [0.90, 0.05],   # shot A (frames 0,1,2)
              [0.10, 0.90], [0.05, 0.95], [0.00, 1.00]]) # shot B (frames 3,4,5)
K = build_kernel(X, "cosine")

cps, cost = cpd_nonlin(K, ncp=1, lmin=1)    # ask for exactly 1 cut
print("1 cut :", cps.tolist(), " cost:", round(cost, 4))

cps2, cost2 = cpd_nonlin(K, ncp=2, lmin=1)  # ask for exactly 2 cuts
print("2 cuts:", cps2.tolist(), " cost:", round(cost2, 4))

Output:

1 cut : [3]  cost: 0.0116
2 cuts: [3, 4]  cost: 0.0069

With 1 cut, KTS puts it at frame 3 — exactly the true boundary between the two shots. 🎯
With 2 cuts, it keeps the real boundary at 3 and adds a second cut at 4. Notice the cost barely drops (0.0116 → 0.0069): the first cut explains almost all the structure; the second is just splitting hairs. That tiny extra gain is the clue we'll use in the next chapter to decide the right number of cuts automatically.

How backtracking recovers the cuts

back[k, l] stored the best start of the last segment for each sub-problem. Starting from the whole video (m, n), we read that start, jump to the smaller sub-problem, and repeat — collecting one cut each step. Those starts are the change points.

Speed

The loops run on the order of $m \times n \times n$ steps in the worst case — polynomial and fast, versus the impossible "try everything" count. (The full implementation in Chapter 7 replaces the inner for t loop with a single NumPy operation for extra speed; the logic is identical.)

This solves the problem when you already know how many cuts you want. Usually you don't — so next we let KTS choose. 👉

Kernel Temporal Segmentation from Scratch