The problem: nearest-neighbor search

Before the clever data structure, let's be precise about the problem it solves and why the obvious solution isn't good enough.

The setup

We have $n$ stored vectors (the "database"), each with $d$ numbers (dimensions), and a query vector $q$. We want the $k$ stored vectors closest to $q$ under some distance (Euclidean or cosine).

import numpy as np

data = np.random.randn(1000, 16)   # 1000 vectors, 16 dimensions each
query = np.random.randn(16)        # one query vector

The obvious solution: brute force

Compute the distance from the query to every stored vector, then take the smallest few:

def brute_force_knn(data, query, k=5):
    d = np.sum((data - query) ** 2, axis=1)   # squared distance to each vector
    idx = np.argsort(d)[:k]                    # indices of the k smallest
    return idx, d[idx]

idx, dist = brute_force_knn(data, query, k=3)
print("nearest 3 ids:", idx.tolist())

Output (ids depend on the random data):

nearest 3 ids: [742, 88, 915]

Brute force is exact and simple. So what's wrong with it?

Why brute force doesn't scale

It does $n$ distance computations per query. That's fine for 1,000 vectors, but real systems have very different numbers:

Vectors $n$Distances per queryAt 1M queries/day
1,0001,000manageable
1,000,0001,000,000painful
100,000,000100,000,000impossible in real time

Cost grows linearly with $n$ — written $O(n)$. Double the database, double the work, forever. A web-scale search or recommendation system can't afford to scan hundreds of millions of vectors for every single query.

We want something closer to $O(\log n)$: when the database grows 1000×, the work should grow by a small additive amount, not 1000×.

"Just build a tree" — and the curse of dimensionality

In low dimensions (2-D, 3-D) classic structures like k-d trees give that logarithmic speedup by recursively splitting space. The trouble is the curse of dimensionality: as $d$ grows into the hundreds (where embeddings live), space becomes so vast and counterintuitive that these partitioning trees degrade until they're no faster than brute force. Distances between points become eerily similar, and the "split space in half" idea stops pruning anything.

Intuition for the curse. In high dimensions, almost all the volume of a region sits near its boundary, and nearly every pair of random points is about the same distance apart. Methods that rely on cleanly carving space apart lose their grip.

The escape hatch: accept "approximate"

Here's the key insight that makes web-scale search possible:

We almost never need the mathematically perfect nearest neighbor. Returning the true top-10 about 95–99% of the time is more than enough for search, recommendations, and RAG — and allowing that tiny slack unlocks enormous speedups.

This is Approximate Nearest Neighbor (ANN) search. We trade a sliver of accuracy for orders-of-magnitude speed, and we measure the trade-off with two numbers:

  • Recall — of the true nearest neighbors, what fraction did we return? (1.0 = perfect.)
  • Speed / work — how many distance computations (or milliseconds) per query?

A good ANN method lets you dial the balance: spend more effort for higher recall, or less for more speed. HNSW does this with a single knob (ef), as we'll see.

Where HNSW fits

HNSW is the most popular ANN method today because it offers an excellent recall-vs-speed curve, handles high-dimensional embeddings well, and supports adding vectors incrementally (no full rebuild). It gets there by combining two ideas — small-world graphs and a layered hierarchy — which we cover next. 👉