Step 4 — Insertion & neighbor selection

Building the index means inserting vectors one at a time. Insertion reuses the search routine, then wires the new node into each layer using a clever neighbor-selection heuristic. This is the subtlest part of HNSW — we'll take it slowly.

Choosing the new node's top layer

Each new node gets a random maximum layer. The distribution is designed so layers thin out geometrically (most nodes only at layer 0, very few up high). The formula:

$$ \ell = \big\lfloor -\ln(U)\cdot m_L \big\rfloor, \qquad m_L = \frac{1}{\ln M}. $$

Here $U$ is a uniform random number in $(0,1]$, $\ln$ is the natural logarithm, and $\lfloor\cdot\rfloor$ rounds down. The $-\ln(U)$ part produces mostly small numbers and occasionally large ones; multiplying by $m_L = 1/\ln M$ tunes how fast the layers thin. With this rule, the probability of reaching each higher layer drops by a factor of about $M$.

import math, random

def random_level(M, rng):
    mL = 1.0 / math.log(M)
    return int(-math.log(rng.random()) * mL)

Running this many times for M=16 gives mostly 0, sometimes 1, rarely 2+ — exactly the pyramid we saw earlier (2000 → 136 → 9 nodes).

The insertion algorithm

def add(self, vector):
    v = self._prepare(vector)                       # normalize if cosine
    node = len(self.data)
    self.data.append(v)

    level = int(-math.log(self.rng.random()) * self.mL)   # the new node's top layer
    while len(self.graph) <= level:                 # grow the layer stack if needed
        self.graph.append({})
    for l in range(level + 1):
        self.graph[l].setdefault(node, [])

    if self.entry_point is None:                    # the very first node
        self.entry_point = node
        self.max_level = level
        return node

    ep = [self.entry_point]
    L = self.max_level

    # Phase 1: greedily descend the layers ABOVE the new node's level (ef=1),
    # exactly like a search, to arrive near where the node belongs.
    for l in range(L, level, -1):
        W = self._search_layer(v, ep, 1, l)
        ep = [min(W)[1]]

    # Phase 2: from the node's level down to 0, connect it into each layer.
    for l in range(min(L, level), -1, -1):
        W = self._search_layer(v, ep, self.ef_construction, l)   # gather candidates
        neighbors = self._select_neighbors(v, W, self.M)         # pick diverse M
        self.graph[l][node] = list(neighbors)

        Mmax = self.Mmax0 if l == 0 else self.Mmax
        for nb in neighbors:                        # links go both ways
            self.graph[l].setdefault(nb, []).append(node)
            if len(self.graph[l][nb]) > Mmax:       # neighbor now over capacity?
                nb_vec = self.data[nb]
                cand = [(self._dist(nb_vec, self.data[x]), x)
                        for x in self.graph[l][nb]]
                self.graph[l][nb] = self._select_neighbors(nb_vec, cand, Mmax)
        ep = [n for _, n in W]                       # carry candidates downward

    if level > self.max_level:                       # new tallest node -> new entry
        self.max_level = level
        self.entry_point = node
    return node

Walking through it

Pick a level and make sure the graph has that many layers.
First node? It just becomes the entry point — nothing to link to.
Phase 1 (approach): descend the layers above the new node's top level with ef=1, identical to searching, to find a good starting point near the new node.
Phase 2 (connect): for each layer the node belongs to, from its top level down to 0:
- search that layer with breadth efConstruction to collect candidate neighbors,
- choose M of them with the heuristic (below),
- add bidirectional links,
- if a neighbor now exceeds its cap (Mmax, or Mmax0=2M at layer 0), prune it back down by re-running the heuristic on its links.
New top? If the node's level beats the current maximum, it becomes the new global entry point.

efConstruction is the build-time beam width: larger means we consider more candidates per insertion, producing a higher-quality graph (better recall later) at the cost of slower building.

The neighbor-selection heuristic

The obvious choice — "just keep the M closest candidates" — builds a worse graph. It tends to link a node only to others in the same tight cluster, leaving few bridges between regions, so greedy search gets stuck.

HNSW uses a diversity heuristic (Algorithm 4 in the paper): keep a candidate only if it's closer to the new node than to any neighbor already chosen. This spreads links out in different directions and keeps the graph well-connected.

def _select_neighbors(self, base_vec, candidates, M):
    """Keep diverse neighbors, not just the closest ones."""
    result = []
    for d_e, e in sorted(candidates):              # consider nearest first
        keep = True
        for r in result:
            # is e closer to an already-picked neighbor than to base? -> redundant
            if self._dist(self.data[e], self.data[r]) < d_e:
                keep = False
                break
        if keep:
            result.append(e)
        if len(result) >= M:
            break
    return result

Why diversity matters (a picture)

keep-closest:                 diversity heuristic:
  new                            new
  /|\                           / | \
 o o o   (all in one clump)    o  |  o   (links fan out in different directions)
                                  o

With links fanning out, a greedy walk arriving from any direction can find its way through the node — that's the connectivity that makes search reliable.

The build-time knobs, recapped

Knob	When	Bigger means
`M`	structure	more links/node → better recall, more memory
`efConstruction`	building	better-quality graph, slower build

That completes all four pieces. Next we assemble them into the full, runnable implementation. 👉

HNSW from Scratch