Step 3 — The hierarchy
One-layer greedy search is the engine. The hierarchy is the gearbox: it lets a few coarse layers carry the query most of the way before the fine layer does the precise work.
Why layers help
In a single flat graph, greedy search from a random start can take many hops to cross the whole dataset. With layers:
- Top layers are sparse, so their links span long distances — each hop covers a lot of ground. A few hops get you roughly to the query's region.
- Lower layers are denser, so hops are short and precise — perfect for refining once you're already close.
It's the highway → regional road → street progression from the introduction. The result is roughly logarithmic search cost instead of linear.
Search across layers
The multi-layer search is short, because it just calls _search_layer repeatedly:
def search(self, query, k=5, ef=None):
"""Approximate k nearest neighbors as [(id, distance), ...]."""
if self.entry_point is None:
return []
ef = ef or max(self.ef, k)
q = self._prepare(query) # normalize if cosine
ep = [self.entry_point]
for l in range(self.max_level, 0, -1): # top layers: narrow, fast descent
W = self._search_layer(q, ep, 1, l) # ef=1: just the single best path
ep = [min(W)[1]] # carry the closest node downward
W = self._search_layer(q, ep, ef, 0) # base layer: WIDE search (ef)
W.sort()
return [(n, d) for d, n in W[:k]]
The two phases
- Descend (layers
max_level… 1). Here we search withef = 1— pure greedy, single best path. We don't need breadth yet; we just want to arrive near the query quickly. After each layer we take the closest node found and use it as the entry point for the layer below. - Refine (layer 0). Now we search with the full
efbeam. The base layer has every vector and dense links, so this wide search produces the accurate top-k. We sort and return the closestk.
_prepare simply normalizes the query to unit length when using cosine (so the
dot-product distance is correct).
A mental trace
top layer: start at entry point ── greedy ──► node A (1 big hop)
│ drop down
mid layer: start at A ── greedy ──► node B (a few hops)
│ drop down
layer 0: start at B ── WIDE beam search (ef) ──► best k near the query
By the time we reach layer 0 we're already in the right neighborhood, so the expensive wide search only explores a small local region — not the whole dataset.
Why descend with ef = 1?
It's a deliberate efficiency choice. The upper layers exist only to approach
the target; precision there is wasted effort because the real answers all live at
layer 0 anyway. Spending the beam budget (ef) only at the bottom is what keeps
HNSW fast while staying accurate.
We can now search a fully built hierarchy. The last missing piece is how that hierarchy gets built — insertion. 👉