Step 4 — Choosing the number of segments

cpd_nonlin makes you specify $m$, the number of cuts. But in real life you don't know how many shots a video has. This chapter makes KTS decide for you.

The trade-off

More cuts always lower the total scatter. (In the extreme, one frame per segment gives a scatter of zero — perfectly "tight" but completely useless.) So we can't just minimize scatter; that would always choose the maximum number of cuts.

We need to charge a fee for every cut, then pick the number of cuts that minimizes scatter + fees:

$$ m^\star = \arg\min_m \Big( J_m + \text{penalty}(m) \Big). $$

($\arg\min$ means "the $m$ that gives the smallest value".) This is the same spirit as not over-fitting in statistics: a new cut is only worth it if it lowers the scatter by more than its fee.

The penalty (and a bug to avoid)

A natural-but-wrong idea is to make the fee proportional to the scatter. That fails: as scatter shrinks toward zero, the fee shrinks too, so cuts become free and KTS never stops splitting. (We actually hit this and fixed it.)

The correct penalty grows with the number of cuts, independent of the scatter. For $m$ cuts over $n$ frames:

$$ \text{penalty}(m) = \frac{m}{2n},\Big( \log\!\big(\tfrac{n}{m}\big) + 1 \Big), \qquad \text{score}(m) = \frac{J_m}{n} + v_{\max}\cdot\text{penalty}(m). $$

The fee rises steadily with each extra cut, so the score goes down while cuts are explaining real structure, then up once they're just splitting hairs. The lowest point is the answer. The weight $v_{\max}$ is your one knob: larger → fewer segments.

The code

import numpy as np


def cpd_auto(K, ncp_max, vmax=1.0, lmin=1, lmax=None):
    """
    Automatically choose the number of change points.

    K       : (n, n) kernel matrix.
    ncp_max : largest number of cuts to consider.
    vmax    : penalty weight. Larger -> fewer segments. (try 0.5 .. 2.0)
    lmin    : minimum segment length.

    Returns (cps, scores):
      cps    : change points of the chosen segmentation.
      scores : dict {m: score} so you can see/plot the trade-off.
    """
    n = K.shape[0]
    ncp_max = min(ncp_max, n - 1)
    J = calc_scatters(K)                     # compute once, reuse for every m

    best_score = np.inf
    best_cps = np.array([], dtype=int)
    scores = {}

    for m in range(0, ncp_max + 1):
        if (m + 1) * lmin > n:
            break
        cps, J_m = cpd_nonlin(K, m, lmin=lmin, lmax=lmax, J=J)
        penalty = 0.0 if m == 0 else (m / (2.0 * n)) * (np.log(n / m) + 1.0)
        score = J_m / n + vmax * penalty
        scores[m] = score
        if score < best_score:
            best_score = score
            best_cps = cps
    return best_cps, scores

Reading the code

  • It runs the Chapter-5 solver once for each candidate number of cuts m, from 0 up to ncp_max.
  • For each, it adds the penalty fee and keeps the m with the lowest total score.
  • best_cps ends up holding the cuts of the winning segmentation.

Try it: input and output

A small but realistic test: 30 frames, the first 15 around one center and the next 15 around another (with random noise). The true boundary is at frame 15.

import numpy as np
from kts import build_kernel, cpd_auto

rng = np.random.default_rng(0)
A = np.array([3, 3, 0, 0, 0, 0, 0, 0.0])
B = np.array([0, 0, 0, 0, 3, 3, 0, 0.0])
X = np.vstack([A + rng.normal(scale=0.5, size=(15, 8)),    # frames 0..14
               B + rng.normal(scale=0.5, size=(15, 8))])   # frames 15..29
K = build_kernel(X, "cosine")

cps, scores = cpd_auto(K, ncp_max=8, vmax=1.0, lmin=2)
print("chosen change points:", cps.tolist(), "=>", len(cps) + 1, "segments")
for m, s in scores.items():
    print(f"   m={m}: score={s:.3f}")

Output:

chosen change points: [15] => 2 segments
   m=0: score=0.538
   m=1: score=0.159
   m=2: score=0.202
   m=3: score=0.237
   m=4: score=0.269
   m=5: score=0.298
   m=6: score=0.323
   m=7: score=0.345
   m=8: score=0.366

KTS picks one cut at frame 15 — exactly the real boundary — without being told there were two groups. 🎯 Look at the scores: they plunge from m=0 (0.538) to m=1 (0.159), then climb steadily. That "valley" at m=1 is the penalty doing its job — the first cut pays for itself; further cuts don't. Plotting score vs m shows the same valley as a classic elbow.

Tuning, in practice

  • vmax is the main dial. Too many tiny segments → raise it. Too few → lower it. (1.0 is a sensible start.)
  • ncp_max caps how many cuts to even consider — set it comfortably above what you expect.
  • lmin (minimum segment length) is a cheap way to forbid spurious one- or two-frame segments — e.g. "at least 1 second of frames".

We now have all four pieces. Next: the polished, all-in-one implementation. 👉