A worked demo: recall vs. speed
Let's measure the two things that matter for an approximate method: how accurate
is it (recall), and how much work does it save (distance computations). The
full script is code/demo.py.
The setup
We build an index over 2000 random 32-dimensional vectors, then run 200 queries and compare HNSW's results to the exact answers from brute force.
import time
import numpy as np
from hnsw import HNSW, brute_force_knn
rng = np.random.default_rng(0)
N, D, Q, K = 2000, 32, 200, 10
data = rng.normal(size=(N, D))
queries = rng.normal(size=(Q, D))
index = HNSW(dim=D, M=16, ef_construction=200, distance="euclidean", seed=1)
for v in data:
index.add(v)
- Recall@10 = of the true 10 nearest neighbors, what fraction did HNSW return?
- Distances/query = how many distance computations HNSW used. Brute force
always uses
N = 2000(it checks everything).
Measuring the trade-off
We sweep the search-breadth knob ef and, for each value, record recall and the
number of distance computations (using index.dist_count):
exact = [set(i for i, _ in brute_force_knn(data, q, k=K)) for q in queries]
for ef in [10, 20, 50, 100, 200]:
hits, total_dists = 0, 0
for q, ex in zip(queries, exact):
index.dist_count = 0
approx = set(i for i, _ in index.search(q, k=K, ef=ef))
total_dists += index.dist_count
hits += len(approx & ex)
print(ef, hits / (Q * K), total_dists / Q)
The result
Running python demo.py produces:
dataset: 2000 points in 32 dimensions; 200 queries; k=10
built HNSW index in 8.37s (top layer = 2)
ef recall@10 dists/query vs brute
--------------------------------------------
10 0.758 278 7.2x
20 0.898 418 4.8x
50 0.986 756 2.6x
100 0.999 1129 1.8x
200 1.000 1533 1.3x
Brute force examines all 2000 points per query.
How to read this
efis the dial. Turning it up trades effort for accuracy: atef=10we examine ~278 points (7.2× fewer than brute force) for 76% recall; atef=50we hit 98.6% recall while still examining ~⅓ of the data; atef=200we get perfect recall. You choose the point on this curve your application needs — without rebuilding the index.- High recall is cheap. ~99% recall (
ef=100) is the sweet spot for most search/RAG systems, and it's reached well before examining everything.
"But the speedup looks small!"
At only 2000 points the vs brute column is modest — and a wall-clock timer would
even show this pure-Python HNSW losing to NumPy's vectorized brute force. That's
expected and important to understand:
- NumPy brute force is vectorized C; our HNSW is a pure-Python graph walk with
per-node interpreter overhead. At small
N, that overhead dominates. - The win is algorithmic and grows with
N. Brute force is $O(N)$ — its work doubles when the data doubles, forever. HNSW's search cost grows roughly $O(\log N)$. The "distances/query" gap you see atN=2000becomes a chasm atN=1,000,000: brute force does a million distance computations per query while HNSW still does only a few thousand.
That's why production systems with millions or billions of vectors use HNSW: it's the difference between a query taking milliseconds versus seconds. Our distance-count comparison shows the real algorithmic advantage that wall-clock timing on tiny data would hide.
Things to try
- Raise
Nto 20,000 and watch thevs brutecolumn widen (the build is pure Python, so be patient). - Increase
M(e.g. 32): higher recall at the sameef, more memory. - Increase
efConstruction: a better graph (higher recall) but slower build. - Switch to
distance="cosine"for embedding-style data.
Now that it works, let's see where HNSW is actually used. 👉