Introduction
No background required. This book assumes you know nothing about programming, AI, or math beyond high-school basics. Every concept, line of code, and symbol is explained, and every code example is followed by the exact output it produces. If you've never written Python, read the 5-minute primer first.
The one-sentence idea
Kernel Temporal Segmentation (KTS) automatically chops a video into a small number of consistent pieces ("shots" or "scenes") by finding the moments where the picture changes a lot.
An everyday analogy
Imagine flipping through a photo album. Pages 1–10 are a beach trip, pages 11–18 are a birthday party, pages 19–25 are a hike. You can instantly spot the boundaries — the points where the photos suddenly start looking different. KTS does exactly this, automatically, for the frames of a video. The boundaries it finds are called change points.
Where it's used
KTS was introduced by Potapov, Douze, Harchaoui & Schmid in "Category-Specific Video Summarization" (ECCV 2014). It became a standard first step in video summarization: before a computer decides which parts of a video are worth keeping, it first splits the video into shots, and KTS is the tool that does the splitting. It is commonly used to segment the videos in well-known summarization benchmarks such as TVSum and SumMe.
How a video becomes numbers
A computer can't "see" a picture the way we do — it works with numbers. So each video frame is turned into a list of numbers called a feature vector: a numeric fingerprint of that frame. Similar-looking frames get similar fingerprints.
So our input is a sequence of feature vectors, one per frame:
$$ x_1, x_2, \dots, x_n \in \mathbb{R}^d $$
Read that line as: "there are $n$ frames; each frame $x_t$ is a list of $d$ numbers." (The symbol $\mathbb{R}^d$ just means "a list of $d$ real numbers.") Frames within the same shot have similar fingerprints; at a shot boundary the fingerprint changes sharply. KTS finds those sharp changes.
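To make the notation concrete, here is a minimal sketch of what such an input could look like in NumPy. Everything in it is made up for illustration: the three "shots", the sizes `n_per_shot` and `d`, and the noise level are toy assumptions, and real feature vectors would come from an image model rather than random numbers. The last lines use a naive trick (looking for the biggest jump between neighbouring frames) only to show that the boundaries are visible in the numbers; this is not the KTS algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

n_per_shot = 5  # frames per shot (toy value, an assumption)
d = 3           # numbers in each frame's fingerprint (toy value, an assumption)

# Each made-up "shot" has its own typical fingerprint.
shot_centers = np.array([
    [0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0],
    [0.0, 5.0, 0.0],
])

# Frames in a shot are small random variations of that shot's fingerprint.
X = np.vstack([
    center + 0.1 * rng.standard_normal((n_per_shot, d))
    for center in shot_centers
])

print(X.shape)  # (15, 3): n = 15 frames, d = 3 numbers per frame

# Distance between each frame and the next one:
# small inside a shot, large where one shot ends and the next begins.
jumps = np.linalg.norm(np.diff(X, axis=0), axis=1)

# Indices of the first frame of each new shot (counting frames from 0).
boundaries = (np.sort(np.argsort(jumps)[-2:]) + 1).tolist()
print(boundaries)  # [5, 10]
```

Each row of `X` is one frame $x_t$, so `X.shape` is $(n, d)$. The two largest jumps fall exactly where the toy data switches from one shot to the next.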
What KTS actually does
Given those fingerprints, KTS finds the change points so that:
- frames inside a segment are as similar to each other as possible, and
- the number of segments stays small (we don't want one segment per frame).
It does this with two ideas, which are the two halves of this book:
- Measure how "spread out" a segment is using a kernel — a flexible similarity measure (Chapters 1–2).
- Find the cut points that minimize the total spread using dynamic programming, a technique that is guaranteed to find the lowest-spread cut points for a given number of segments, not just a good guess (Chapters 3–6).
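The two ideas can be previewed with a tiny sketch. It makes two simplifying assumptions that the later chapters replace: the "spread" of a segment is taken to be the sum of squared distances from each frame to the segment's average (roughly what you get with the simplest possible kernel, a plain dot product), and the best cut is found by trying every position rather than by dynamic programming. The function names `segment_spread` and `total_spread` are hypothetical and are not taken from `kts.py`.

```python
import numpy as np

def segment_spread(X):
    """Sum of squared distances from each frame to the segment's mean."""
    return float(((X - X.mean(axis=0)) ** 2).sum())

def total_spread(X, cuts):
    """Total spread when X is split at the given cut indices."""
    edges = [0, *cuts, len(X)]
    return sum(segment_spread(X[a:b]) for a, b in zip(edges[:-1], edges[1:]))

# Toy 1-D "video": frames 0-3 belong to one shot, frames 4-7 to another.
X = np.array([[0.0], [0.1], [0.0], [0.1], [5.0], [5.1], [5.0], [5.1]])

good = total_spread(X, [4])  # cut at the true boundary
bad = total_spread(X, [2])   # cut in the wrong place
print(good < bad)  # True

# Brute force: try every single cut position and keep the cheapest one.
best_cut = min(range(1, len(X)), key=lambda c: total_spread(X, [c]))
print(best_cut)  # 4
```

Cutting at the true boundary leaves two tight segments, so the total spread is tiny; cutting anywhere else mixes frames from both shots into one segment and the spread jumps. Trying every position works here because there is only one cut to place, but the number of combinations explodes as cuts are added, which is the problem dynamic programming addresses.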
What you'll build
By the end you'll have one short, fully understood Python file, kts.py, and you'll watch it recover hidden boundaries in test data exactly.
What you need
- Python 3.8+ (the primer shows how code/output is presented)
- NumPy (a number-crunching library — also covered in the primer)
- (optional) Matplotlib, only for one picture in the final demo
Let's start with the two background ideas: kernels and change points.