A worked demo

Let's confirm KTS actually works. We build a synthetic "video" whose true structure we know, run KTS, and check it recovers the planted boundaries.

The full script is code/demo.py; here's how it's built.

A fake video with known change points

To test KTS we need data where we already know the right answer. So we manufacture a fake video: we pick a few random "centers" (each stands for a shot), and for each shot we generate a block of frames that are that center plus a little random noise. Frames in the same block look alike; frames in different blocks look different. The boundaries between blocks are the true change points we hope KTS will rediscover.

def make_synthetic_sequence(seed=0):
    """Return (X, true_change_points)."""
    rng = np.random.default_rng(seed)
    d = 8
    block_lengths = [40, 25, 35, 20, 30]   # 5 segments
    centers = rng.normal(scale=3.0, size=(len(block_lengths), d))

    chunks, true_cps, cursor = [], [], 0
    for i, L in enumerate(block_lengths):
        chunks.append(centers[i] + rng.normal(scale=0.4, size=(L, d)))
        cursor += L
        if i < len(block_lengths) - 1:
            true_cps.append(cursor)        # boundary = start of next block
    X = np.vstack(chunks)
    return X, np.array(true_cps)

A couple of unfamiliar bits:

np.random.default_rng(seed) creates a reproducible source of random numbers — using the same seed gives the same "random" data every run, so the results in this book are repeatable.
rng.normal(scale=0.4, size=(L, d)) draws L × d small random numbers (the per-frame noise). A bigger scale = noisier frames = harder problem.
np.vstack([...]) stacks the blocks on top of each other into one big (n_frames, d) array.

We use 5 blocks of lengths [40, 25, 35, 20, 30], so the true change points are at frames [40, 65, 100, 120] (4 cuts → 5 segments).

Running KTS

from kts import build_kernel, cpd_auto, cpd_nonlin

K = build_kernel(X, kind="cosine")

# automatic number of segments
cps_auto, scores = cpd_auto(K, ncp_max=12, vmax=1.0, lmin=5)

# or, if we already knew there were 5 segments:
cps_fixed, cost = cpd_nonlin(K, ncp=4, lmin=5)

What you get

Running python demo.py produces:

sequence length n = 150
true change points = [40, 65, 100, 120]  (5 segments)

[cpd_auto]   change points = [40, 65, 100, 120]  (5 segments)
[cpd_nonlin] change points = [40, 65, 100, 120]  (fixed m=4, total scatter=4.04)

fixed-m boundary errors (frames): [0, 0, 0, 0]  (mean 0.00)

🎯 Both modes recover all four true change points exactly — and cpd_auto independently figures out that there are 5 segments, without being told.

If Matplotlib is installed, the demo also saves kts_demo.png showing the feature heatmap with true boundaries (white dashed) and detected boundaries (red) overlaid.

Things to try next

Make it harder. Increase the per-cluster noise (scale=0.4) until segmentation starts to fail — that's the signal-to-noise limit.
Change the kernel. Swap "cosine" for "rbf" and watch how the bandwidth $\sigma$ changes sensitivity.
Tune vmax. Lower it toward 0 to over-segment; raise it to merge.
Plot the scores. scores from cpd_auto vs $m$ reveals the elbow that the penalty is exploiting.
Real features. Replace X with CNN features extracted one-per-second from a real video, then map change points back to timestamps.

That's KTS, from the math all the way to a working segmenter. 🎉

Finally, let's step back and look at the research paper this all came from — how to read it, how to reproduce a paper like it, and where KTS is used. 👉

Kernel Temporal Segmentation from Scratch

A worked demo

A fake video with known change points

Running KTS

What you get

Things to try next