Step 1 — The kernel matrix
Everything in KTS reads from the kernel matrix $K$: a grid where the entry in row $i$, column $j$ is the similarity between frame $i$ and frame $j$. Step one is to turn our table of frame fingerprints into this similarity grid.
Our input is an array X with one row per frame and one column per feature
number. So X has shape (n, d): n frames, each described by d numbers.
The output K has shape (n, n): a similarity for every pair of frames.
The code
import numpy as np
def linear_kernel(X):
"""K[i, j] = dot product of frame i and frame j. Plain variance."""
return X @ X.T
def cosine_kernel(X, eps=1e-8):
"""Similarity by direction only (length-normalized). A good default."""
norms = np.linalg.norm(X, axis=1, keepdims=True) # length of each frame vector
Xn = X / np.maximum(norms, eps) # rescale each to length 1
return Xn @ Xn.T
def rbf_kernel(X, sigma=None):
"""Gaussian similarity: 1 when identical, fading to 0 as frames differ."""
sq = np.sum(X**2, axis=1) # squared length of each frame
# squared distance between every pair: ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
d2 = np.maximum(d2, 0.0) # clip tiny negatives (rounding)
if sigma is None:
med = np.median(d2[d2 > 0]) if np.any(d2 > 0) else 1.0
sigma = np.sqrt(0.5 * med) if med > 0 else 1.0 # auto bandwidth
return np.exp(-d2 / (2.0 * sigma**2))
def build_kernel(X, kind="cosine", **kwargs):
"""Pick a kernel by name and build the matrix."""
X = np.asarray(X, dtype=np.float64)
kernels = {"linear": linear_kernel, "cosine": cosine_kernel, "rbf": rbf_kernel}
if kind not in kernels:
raise ValueError(f"unknown kernel {kind!r}; pick one of {list(kernels)}")
return kernels[kind](X, **kwargs) if kind == "rbf" else kernels[kind](X)
Reading the code, line by line
X @ X.T—@is matrix multiplication and.Tis the transpose (rows ↔ columns). MultiplyingXby its own transpose computes the dot product of every pair of rows at once. That grid of dot products is the linear kernel.np.linalg.norm(X, axis=1, keepdims=True)— the length of each frame vector.axis=1means "do it per row" (one length per frame).X / norms— divide each frame by its own length, so every frame becomes length 1. After this, the dot product measures direction only — that's the cosine kernel.np.maximum(norms, eps)just avoids dividing by zero.- In
rbf_kernel,sq[:, None] + sq[None, :] - 2 * (X @ X.T)is the algebra identity for squared distance between every pair, computed in one shot. np.exp(...)raises $e$ to a power elementwise; here it turns "distance" into "similarity" (distance 0 → similarity 1; large distance → similarity ~0).
Try it: input and output
import numpy as np
X = np.array([[1.0, 0.0], # frame 0: points right
[0.9, 0.1], # frame 1: nearly the same as frame 0
[0.1, 1.0]]) # frame 2: points up (different)
K = build_kernel(X, kind="cosine")
print(np.round(K, 3)) # round to 3 decimals for readability
Output:
[[1. 0.994 0.1 ]
[0.994 1. 0.209]
[0.1 0.209 1. ]]
How to read this 3×3 grid:
- Diagonal is all
1.0— each frame is identical to itself. K[0,1] = 0.994— frames 0 and 1 are nearly identical (same shot).K[0,2] = 0.1— frames 0 and 2 are very different (a boundary sits between them).
The same three frames with the other two kernels:
print("linear:\n", np.round(build_kernel(X, "linear"), 3))
print("rbf:\n", np.round(build_kernel(X, "rbf"), 3))
Output:
linear:
[[1. 0.9 0.1 ]
[0.9 0.82 0.19]
[0.1 0.19 1.01]]
rbf:
[[1. 0.986 0.287]
[0.986 1. 0.368]
[0.287 0.368 1. ]]
All three agree on the story — frames 0 and 1 are similar, frame 2 is the odd one out — they just put the numbers on different scales.
Which kernel should I use?
cosine— the safe default for AI "deep" features; it ignores brightness/ scale and compares content.linear— reproduces ordinary variance; simplest.rbf— adds non-linear sensitivity, at the cost of asigmaknob (how fast similarity fades with distance). The code picks a reasonablesigmaautomatically.
Two practical notes
- Memory.
Khasn × nentries. A 1-hour video has too many frames, so in practice you keep one frame per second, run KTS, then map the change points back to real time. - Sanity check. A valid kernel matrix is symmetric (
K[i,j] == K[j,i]), and for the cosine kernel the diagonal is exactly1:
Xr = np.random.randn(6, 4) # 6 random frames, 4 features each
Kc = build_kernel(Xr, "cosine")
print("symmetric:", np.allclose(Kc, Kc.T))
print("diagonal all 1:", np.allclose(np.diag(Kc), 1.0))
Output:
symmetric: True
diagonal all 1: True
With $K$ in hand we can score any segment. Let's make that fast. 👉