How HNSW works (the big picture)

Before building each piece, let's see the whole thing at a glance. HNSW is a stack of graph layers; you search top-down and insert top-down.

The structure

   layer 2     (•)                         <- a few nodes, long-range links
                |  \
   layer 1   (•)–(•)–––––(•)               <- more nodes
              |   |       |
   layer 0  (•)-(•)-(•)-(•)-(•)-(•)-(•)     <- EVERY node, dense links
  • Every vector lives in layer 0. Each vector is also present in some number of higher layers, chosen randomly at insertion time.
  • Higher layers are exponentially sparser, so they act as express lanes.
  • There is a single entry point: a node in the top layer where every search begins.

The layers really are a pyramid

Each node is assigned a maximum layer drawn so that the chance of reaching layer $\ell$ falls off geometrically (probability $\sim 1/M$ per extra layer, where $M$ is the links-per-node setting). Building an index of 2000 random vectors with M=16 gives:

layer 0: 2000 nodes
layer 1: 136 nodes
layer 2: 9 nodes
entry point id: 13   top layer: 2

Layer 0 has everything; layer 1 has ~1/16 as many; layer 2 just a handful. That steep pyramid is what keeps the top-down search short.

Searching (find the nearest neighbors)

1. Start at the entry point in the TOP layer.
2. Greedily hop toward the query within the current layer (single best path).
3. Drop down one layer, using where you landed as the new start.
4. Repeat until layer 0.
5. At layer 0, do a WIDER search (keep the best `ef` candidates), and
   return the closest `k`.

The top layers get you into the right neighborhood with a few big jumps; the wide search at the bottom nails the precise answer.

Inserting a new vector

1. Pick a random maximum layer L for the new node (mostly 0; rarely high).
2. From the global entry point, greedily descend the layers ABOVE L
   (just like searching) to find a good entry near the new node.
3. For each layer from L down to 0:
     a. Search that layer to collect nearby candidates.
     b. Choose M of them as neighbors (using a diversity heuristic).
     c. Add links both ways (new node <-> neighbors).
     d. If a neighbor now has too many links, prune it back.
4. If L is higher than the current top, the new node becomes the entry point.

Insertion reuses the search routine — that's why we build search first.

The knobs

Three parameters control the whole structure; we'll meet each in context:

ParameterMeaningEffect
Mlinks kept per node per layerbigger = better recall, more memory
efConstructionsearch breadth while buildingbigger = better graph, slower build
efsearch breadth at query timebigger = higher recall, slower query

ef is the runtime dial: turn it up for accuracy, down for speed — without rebuilding the index.

The plan

We'll build HNSW bottom-up over the next four chapters:

  1. Distances & the data structure — what we store.
  2. Greedy search in one layer — the core routine.
  3. The hierarchy — searching across layers.
  4. Insertion — building the graph, with the neighbor heuristic.

Let's start with what an HNSW index actually holds in memory. 👉