Step 3 — The hierarchy

One-layer greedy search is the engine. The hierarchy is the gearbox: it lets a few coarse layers carry the query most of the way before the fine layer does the precise work.

Why layers help

In a single flat graph, greedy search from a random start can take many hops to cross the whole dataset. With layers:

Top layers are sparse, so their links span long distances — each hop covers a lot of ground. A few hops get you roughly to the query's region.
Lower layers are denser, so hops are short and precise — perfect for refining once you're already close.

It's the highway → regional road → street progression from the introduction. The result is roughly logarithmic search cost instead of linear.

Search across layers

The multi-layer search is short, because it just calls _search_layer repeatedly:

def search(self, query, k=5, ef=None):
    """Approximate k nearest neighbors as [(id, distance), ...]."""
    if self.entry_point is None:
        return []
    ef = ef or max(self.ef, k)
    q = self._prepare(query)                 # normalize if cosine

    ep = [self.entry_point]
    for l in range(self.max_level, 0, -1):   # top layers: narrow, fast descent
        W = self._search_layer(q, ep, 1, l)  # ef=1: just the single best path
        ep = [min(W)[1]]                     # carry the closest node downward
    W = self._search_layer(q, ep, ef, 0)     # base layer: WIDE search (ef)
    W.sort()
    return [(n, d) for d, n in W[:k]]

The two phases

Descend (layers max_level … 1). Here we search with ef = 1 — pure greedy, single best path. We don't need breadth yet; we just want to arrive near the query quickly. After each layer we take the closest node found and use it as the entry point for the layer below.
Refine (layer 0). Now we search with the full ef beam. The base layer has every vector and dense links, so this wide search produces the accurate top-k. We sort and return the closest k.

_prepare simply normalizes the query to unit length when using cosine (so the dot-product distance is correct).

A mental trace

top layer:   start at entry point ── greedy ──► node A          (1 big hop)
                                         │ drop down
mid layer:   start at A ── greedy ──► node B                    (a few hops)
                                         │ drop down
layer 0:     start at B ── WIDE beam search (ef) ──► best k near the query

By the time we reach layer 0 we're already in the right neighborhood, so the expensive wide search only explores a small local region — not the whole dataset.

It's a deliberate efficiency choice. The upper layers exist only to approach the target; precision there is wasted effort because the real answers all live at layer 0 anyway. Spending the beam budget (ef) only at the bottom is what keeps HNSW fast while staying accurate.

We can now search a fully built hierarchy. The last missing piece is how that hierarchy gets built — insertion. 👉

HNSW from Scratch

Step 3 — The hierarchy

Why layers help

Search across layers

The two phases

A mental trace

Why descend with `ef = 1`?

HNSW from Scratch

Step 3 — The hierarchy

Why layers help

Search across layers

The two phases

A mental trace

Why descend with ef = 1?

Why descend with `ef = 1`?