Similarity & distance (the recipe chapter)

Once everything is a vector, "are these two things alike?" becomes "are these two vectors close?" This single question powers search, recommendation, clustering, deduplication, retrieval (RAG), face matching, and the attention inside transformers. This chapter is your recipe card for every common similarity and distance — with sample input, output, and the traps that bite people.

All recipes live in code/recipes.py and print exactly what's shown here.

Similarity vs. distance: opposite directions

Don't be confused. A similarity is bigger when things are more alike (cosine = 1 means identical direction). A distance is smaller when things are more alike (Euclidean = 0 means identical point). They point in opposite directions. You convert between them with tricks like distance = 1 - similarity (for cosine) or similarity = exp(-gamma · distance²) (the RBF kernel, Recipe 5). Always know which one a function returns before you sort by it: sorting the wrong way silently returns the least similar items.
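
To make the sort-direction trap concrete, here is a minimal sketch. The data and the names `query` and `items` are invented for illustration; the point is only that similarities sort descending and distances sort ascending.

```python
import numpy as np

# Hypothetical data: one query and three candidate items (one per row).
query = np.array([1.0, 0.0])
items = np.array([[0.9, 0.1],     # item 0: almost the same as the query
                  [0.0, 1.0],     # item 1: perpendicular to the query
                  [-1.0, 0.0]])   # item 2: opposite to the query

# Cosine similarity: bigger means more alike.
sims = items @ query / (np.linalg.norm(items, axis=1) * np.linalg.norm(query))
# Euclidean distance: smaller means more alike.
dists = np.linalg.norm(items - query, axis=1)
# Cosine distance via the conversion distance = 1 - similarity.
cos_dists = 1.0 - sims

best_by_sim = np.argsort(-sims)        # descending: most similar first
best_by_dist = np.argsort(dists)       # ascending: closest first
best_by_cos_dist = np.argsort(cos_dists)
wrong_way = np.argsort(sims)           # ascending similarity: least similar first

print("by similarity:     ", best_by_sim)
print("by distance:       ", best_by_dist)
print("by cosine distance:", best_by_cos_dist)
print("wrong way:         ", wrong_way)
```

The three correct orderings all put item 0 first. The wrong-way sort puts item 2 (the opposite vector) first, and nothing in the code complains.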

Our cast of three example vectors:

import numpy as np
a = np.array([1.0, 0.0])   # points right
b = np.array([2.0, 0.0])   # same direction, twice as long
c = np.array([0.0, 1.0])   # perpendicular to a

Recipe 1 — Dot product (linear similarity)

The raw inner product: multiply elementwise, sum. It grows with both alignment and magnitude.

$$ a \cdot b = \sum_i a_i b_i $$

print(np.dot(a, b))    # 1*2 + 0*0

Output:

2.0

Use it when magnitude is meaningful, or when your vectors are already normalized (then dot equals cosine). It's the cheapest similarity, and the one transformers use internally (Q·K). Watch out: a longer vector scores higher just for being long, even if it's off-topic.

Recipe 2 — Cosine similarity (direction only)

Normalize away length, keep only the angle. Range $[-1, 1]$: 1 = same direction, 0 = perpendicular, −1 = opposite.

$$ \cos(a, b) = \frac{a \cdot b}{\lVert a\rVert \, \lVert b\rVert} $$

def cosine_sim(a, b, eps=1e-12):
    a = np.asarray(a, float); b = np.asarray(b, float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

print("cos(a, b):", round(cosine_sim(a, b), 3))   # same direction
print("cos(a, c):", round(cosine_sim(a, c), 3))   # perpendicular

Output:

cos(a, b): 1.0
cos(a, c): 0.0

a and b point the same way → cosine 1.0, even though b is twice as long. a and c are perpendicular → 0.0. Cosine is the default for text/semantic embeddings, because there a document's topic (direction) matters, not its length.

Don't be confused: cosine vs. dot product. They agree only when vectors are unit length. When they're not, they can disagree badly. Here a generic-but-long vector beats a short-but-on-topic one tenfold under dot, purely because of its length. Cosine strips out that length advantage and scores by angle alone, which narrows the gap to 0.707 vs. 0.816:

doc   = np.array([1., 1., 0.])
short = np.array([1., 0., 0.])   # same topic, short
longd = np.array([5., 5., 5.])   # generic, long
print("dot:", doc @ short, doc @ longd)            # 1.0  10.0  -> long wins
print("cos:", round(cosine_sim(doc, short),3), round(cosine_sim(doc, longd),3))

Output:

dot: 1.0 10.0
cos: 0.707 0.816

The fix used everywhere: L2-normalize your vectors once up front, then a plain dot product equals cosine, fast and unambiguous. That's why many vector databases can serve cosine search as "inner-product" search over normalized vectors.
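
A minimal sketch of the normalize-once pattern, reusing the three vectors above. The helper name `l2_normalize` is an assumption made for this example, not something taken from code/recipes.py.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale v to unit length; eps guards against division by zero."""
    v = np.asarray(v, dtype=float)
    return v / (np.linalg.norm(v) + eps)

doc   = np.array([1., 1., 0.])
short = np.array([1., 0., 0.])
longd = np.array([5., 5., 5.])

# Normalize once, up front.
doc_n, short_n, longd_n = (l2_normalize(v) for v in (doc, short, longd))

# After normalization a plain dot product is the cosine similarity.
dot_short = float(doc_n @ short_n)
dot_long = float(doc_n @ longd_n)
print("dot on normalized:", round(dot_short, 3), round(dot_long, 3))
```

The printed values match the cosine scores for the same pairs (0.707 and 0.816), with no division at query time.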

Recipe 3 — Euclidean (L2) distance

Ordinary straight-line distance. A distance (0 = identical).

$$ \lVert a - b\rVert_2 = \sqrt{\sum_i (a_i - b_i)^2} $$

print(np.sqrt(np.sum((a - b) ** 2)))

Output:

1.0

Use it for dense numeric features on a comparable scale (after standardizing!). It's the metric behind k-means and most geometric clustering.
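
Why standardize first? A sketch with made-up rows of [height in metres, income in dollars]. The data and the helper `euclid` are assumptions for illustration; z-scoring is one common way to standardize, not the only one.

```python
import numpy as np

# Hypothetical rows: [height in metres, income in dollars].
X = np.array([[1.60, 50_000.0],
              [1.95, 50_500.0],    # very different height, similar income
              [1.61, 58_000.0]])   # similar height, different income

def euclid(p, q):
    """Straight-line (L2) distance between two vectors."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

# Raw features: income dominates because its numbers are thousands of times larger.
raw_01 = euclid(X[0], X[1])
raw_02 = euclid(X[0], X[2])

# Z-score each column: subtract the column mean, divide by the column std.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
std_01 = euclid(Z[0], Z[1])
std_02 = euclid(Z[0], Z[2])

print(f"raw:          d(0,1)={raw_01:.1f}  d(0,2)={raw_02:.1f}")
print(f"standardized: d(0,1)={std_01:.2f}  d(0,2)={std_02:.2f}")
```

On the raw data, row 1 looks far closer to row 0 than row 2 does, because the height difference is invisible next to the dollar amounts. After standardizing, both features contribute on a comparable scale and the two distances come out similar.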

Don't be confused: cosine vs. Euclidean. They can rank things differently. Two vectors pointing the same way but at different lengths are cosine-identical yet Euclidean-far:

u = np.array([1., 2.]); v = np.array([10., 20.])    # identical direction
print("cosine:", round(cosine_sim(u, v), 3))         # 1.0
print("euclid:", round(float(np.sqrt(np.sum((u-v)**2))), 3))  # 20.125

Output:

cosine: 1.0
euclid: 20.125

Choose by what "similar" means for your data: same proportions → cosine; same location → Euclidean. Fun fact: for unit-length vectors the two are equivalent, since $\lVert a-b\rVert^2 = 2(1 - \cos(a,b))$, so normalizing collapses the distinction.
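
The unit-vector identity follows from expanding the square: $\lVert a-b\rVert^2 = \lVert a\rVert^2 + \lVert b\rVert^2 - 2\,a\cdot b$, and with both norms equal to 1 this is $2 - 2\cos(a,b)$. A quick numerical check, as a sketch on random vectors (the seed and dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# One query and five candidates, all scaled to unit length.
q = rng.normal(size=8)
q /= np.linalg.norm(q)
C = rng.normal(size=(5, 8))
C /= np.linalg.norm(C, axis=1, keepdims=True)

cos = C @ q                               # cosine, since everything is unit length
sq_dist = np.sum((C - q) ** 2, axis=1)    # squared Euclidean distance

# The identity: ||a - b||^2 = 2 * (1 - cos(a, b)) for unit vectors.
identity_holds = bool(np.allclose(sq_dist, 2.0 * (1.0 - cos)))

# Consequence: closest-first by distance equals most-similar-first by cosine.
same_ranking = bool(np.array_equal(np.argsort(sq_dist), np.argsort(-cos)))

print(identity_holds, same_ranking)
```

Both checks print True: on normalized vectors, choosing between cosine and Euclidean changes the numbers but not the ranking.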

Recipe 4 — Manhattan (L1) distance

Sum of absolute differences — "city-block" distance, as if you can only move along grid streets.

$$ \lVert a - b\rVert_1 = \sum_i |a_i - b_i| $$

print(np.sum(np.abs(a - c)))     # |1-0| + |0-1|

Output:

2.0

Use it when you want robustness to outliers (L1 punishes a single huge coordinate difference less than L2's squaring does), or in high dimensions where it sometimes behaves better.
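
To see the outlier claim in numbers, here is a small sketch with invented vectors: one candidate differs a little in every coordinate, the other differs hugely in just one.

```python
import numpy as np

base   = np.zeros(4)
spread = np.array([2., 2., 2., 2.])   # moderate difference in every coordinate
spike  = np.array([0., 0., 0., 6.])   # one huge difference, rest identical

def l1(p, q):
    """Manhattan distance: sum of absolute differences."""
    return float(np.sum(np.abs(p - q)))

def l2(p, q):
    """Euclidean distance: square root of summed squared differences."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

print("L1:", l1(base, spread), l1(base, spike))   # 8.0 6.0 -> spike is closer
print("L2:", l2(base, spread), l2(base, spike))   # 4.0 6.0 -> spread is closer
```

The two metrics disagree about which candidate is nearer. L2 squares the single large gap (6² = 36 outweighs 4 × 2² = 16), so the spike dominates; L1 counts it linearly, so it hurts less.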

Recipe 5 — RBF / Gaussian kernel (distance → similarity)

The Radial Basis Function kernel turns a distance into a similarity that decays smoothly: 1 when identical, fading toward 0 as points separate.

$$ k(a, b) = \exp\!\left(-\gamma \, \lVert a - b\rVert^2\right) $$

gamma controls how fast similarity fades — it's the kernel's "reach."

def rbf(a, b, gamma=1.0):
    d2 = np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return float(np.exp(-gamma * d2))

for g in [0.1, 1.0, 10.0]:
    near = rbf([0,0], [0.5, 0], gamma=g)   # squared dist 0.25
    far  = rbf([0,0], [2.0, 0], gamma=g)   # squared dist 4.0
    print(f"gamma={g:<4}  near={near:.4f}  far={far:.4f}")

Output:

gamma=0.1   near=0.9753  far=0.6703
gamma=1.0   near=0.7788  far=0.0183
gamma=10.0  near=0.0821  far=0.0000

Small gamma = wide, gentle reach (far points still count). Large gamma = narrow, picky reach (only near-identical points count). The RBF kernel is the heart of the RBF-SVM and a building block in many "kernel methods" — and gamma is the single most important knob to tune. We meet kernels properly in Chapter 5.
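
Kernel methods usually need the kernel value for every pair of points at once. A minimal sketch of that all-pairs matrix, assuming a hypothetical helper named `rbf_matrix` (not taken from code/recipes.py); it uses broadcasting, which is simple but needs memory proportional to n × n × d, so it suits small inputs only.

```python
import numpy as np

def rbf_matrix(X, gamma=1.0):
    """All-pairs RBF similarities for the rows of X, shape (n, n)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]     # (n, n, d) pairwise differences
    sq_dists = np.sum(diff ** 2, axis=-1)    # (n, n) squared distances
    return np.exp(-gamma * sq_dists)

# The origin plus the "near" and "far" points from the loop above.
points = np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 0.0]])
K = rbf_matrix(points, gamma=1.0)
print(np.round(K, 4))
```

The diagonal is all ones (every point is identical to itself), the matrix is symmetric, and the first row reproduces the gamma=1.0 values from the table above: 0.7788 for the near point and 0.0183 for the far one.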

Recipe 6 — Jaccard (set overlap)

For sets or binary vectors (tags, shingles, sparse boolean features): the size of the intersection over the size of the union.

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

print(jaccard([1, 2, 3], [2, 3, 4]))    # overlap {2,3}, union {1,2,3,4}

Output:

0.5

Use it for deduplication, plagiarism/near-duplicate detection (with MinHash), and any "how much do these two tag sets overlap?" question.
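
For text, the sets are typically "shingles": overlapping windows of k consecutive words. A minimal sketch follows; the `shingles` helper, the choice k=3, and the sample sentences are assumptions for illustration. It computes exact Jaccard, whereas MinHash would approximate the same quantity at scale.

```python
def shingles(text, k=3):
    """Set of all k-word windows in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Intersection over union; two empty sets count as identical."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

s1 = "the quick brown fox jumps over the lazy dog"
s2 = "the quick brown fox leaps over the lazy dog"    # one word changed
s3 = "vector databases store normalized embeddings for fast search"

near_dup = jaccard(shingles(s1), shingles(s2))
unrelated = jaccard(shingles(s1), shingles(s3))
print(near_dup, unrelated)   # 0.4 0.0
```

Changing one word out of nine breaks three of the seven 3-word shingles, so the near-duplicate scores 4 shared out of 10 total. Larger k makes the measure stricter; smaller k makes it more forgiving.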

The all-pairs versions (what you actually ship)

In production you rarely compare two vectors — you compare a query against thousands. Do it in one matrix operation, not a Python loop:

def normalize_rows(X, eps=1e-12):
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)

def cosine_matrix(X):           # all-pairs cosine, shape (n, n)
    Xn = normalize_rows(X)
    return Xn @ Xn.T

X = np.array([[1., 0.], [0.9, 0.1], [0., 1.]])
print(np.round(cosine_matrix(X), 3))

Output:

[[1.    0.994 0.   ]
 [0.994 1.    0.11 ]
 [0.    0.11  1.   ]]

Row 0 vs row 1 = 0.994 (nearly identical), row 0 vs row 2 = 0.0 (perpendicular). Normalize once, multiply, done. This Xn @ Xn.T is the same all-pairs matrix product that scores attention in a transformer (Q @ K.T), although attention uses scaled, unnormalized dot products. For squared Euclidean all-pairs, use the identity $\lVert a-b\rVert^2 = \lVert a\rVert^2 + \lVert b\rVert^2 - 2a\cdot b$ (the pairwise_sq_dists recipe), never a double loop.
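
The squared-Euclidean identity can be sketched as follows. This is one plausible way to write it, not necessarily the version in code/recipes.py, so the function is named `pairwise_sq_dists_sketch` to keep the two apart.

```python
import numpy as np

def pairwise_sq_dists_sketch(X, Y):
    """Squared Euclidean distance between every row of X and every row of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x2 = np.sum(X ** 2, axis=1)[:, None]    # (n, 1) squared norms of X rows
    y2 = np.sum(Y ** 2, axis=1)[None, :]    # (1, m) squared norms of Y rows
    d2 = x2 + y2 - 2.0 * (X @ Y.T)          # ||a||^2 + ||b||^2 - 2 a.b
    return np.maximum(d2, 0.0)              # clip tiny negatives from rounding

X = np.array([[1., 0.], [0.9, 0.1], [0., 1.]])
D2 = pairwise_sq_dists_sketch(X, X)
print(np.round(D2, 3))

# Cross-check against the slow double loop the identity replaces.
brute = np.array([[np.sum((a - b) ** 2) for b in X] for a in X])
print(np.allclose(D2, brute))
```

The clip to zero matters in practice: floating-point cancellation can produce values like -1e-16 on the diagonal, which turn into NaN if you later take a square root.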

Quick chooser

You want…	Use	Returns
Text / semantic similarity	cosine (or dot on normalized vectors)	similarity ↑
Geometric closeness, clustering	Euclidean (after standardizing)	distance ↓
Robustness to outliers	Manhattan (L1)	distance ↓
Smooth, tunable similarity / SVM	RBF kernel	similarity ↑
Overlap of tags / sets	Jaccard	similarity ↑
Fast many-vs-many	normalize, then `X @ Q.T`	similarity ↑
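
The last row of the chooser, sketched as a small top-k search. The `top_k` helper, the corpus, and the queries are all assumptions for illustration; a production system would typically pre-normalize and cache the corpus rather than normalizing on every call.

```python
import numpy as np

def normalize_rows(X, eps=1e-12):
    """L2-normalize each row; eps guards against all-zero rows."""
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)

def top_k(corpus, queries, k=2):
    """Indices and cosine scores of the k best corpus rows for each query."""
    scores = normalize_rows(corpus) @ normalize_rows(queries).T   # (n_docs, n_queries)
    order = np.argsort(-scores, axis=0)[:k]                       # descending similarity
    best = np.take_along_axis(scores, order, axis=0)
    return order.T, best.T                                        # one row per query

corpus = np.array([[1., 0.], [0.9, 0.1], [0., 1.]])
queries = np.array([[1., 0.], [0.1, 1.]])

idx, sims = top_k(corpus, queries, k=2)
print(idx)
print(np.round(sims, 3))
```

Query 0 retrieves rows 0 and 1, and query 1 retrieves rows 2 and 1, all from a single matrix multiply. Note the negated argsort: these are similarities, so the largest scores come first.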

The takeaway

Know whether you hold a similarity or a distance; cosine compares direction, dot adds magnitude, Euclidean compares position, RBF turns distance into tunable similarity, Jaccard compares sets. Normalize first and dot = cosine. These six recipes recur in every system in this series. The word "kernel" snuck in twice already — let's pin down all five things it means. 👉

AI Foundations in Depth