The original paper: understand & reproduce

Like the KTS book, we built HNSW as if from thin air. It actually comes from a research paper, and this chapter tells that story: what the paper says, its lineage, how to read a paper like it, and an honest reflection on what we reproduced versus what production systems add.

The paper

Yu. A. Malkov, D. A. Yashunin. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. IEEE TPAMI, 2018 (arXiv:1603.09320).

The paper's contribution is the hierarchy. Malkov and co-authors had earlier proposed flat Navigable Small World (NSW) graphs (2014); those worked but could get trapped and scaled worse on hard datasets. HNSW adds the skip-list-style layered structure and a neighbor-selection heuristic, which together give state-of-the-art recall/speed and robustness. The four algorithms in the paper map directly onto our code:

Paper	Our code
Algorithm 1 — `INSERT`	`HNSW.add`
Algorithm 2 — `SEARCH-LAYER`	`HNSW._search_layer`
Algorithm 3 — `K-NN-SEARCH`	`HNSW.search`
Algorithm 4 — `SELECT-NEIGHBORS-HEURISTIC`	`HNSW._select_neighbors`

Lineage

Small-world / navigable graphs — Kleinberg's work on navigable small-world networks, and the NSW papers (Malkov et al., 2012–2014).
Skip lists — Pugh, 1990: the probabilistic layered structure HNSW borrows for its hierarchy.
Probabilistic routing — the idea that greedy walks on the right graph reach targets in $O(\log n)$ hops.

Recognizing this lineage helps: when the HNSW paper is terse (e.g. on why the level distribution is exponential), the skip-list literature explains it.

How to read a paper like this

The same three-pass approach as the KTS book:

Skim (5 min). Abstract, figures (Figure 1 — the layered graph — is the whole idea), algorithm headers. Decide what you actually need.
Method, carefully. Read the four algorithms and the parameter discussion. Write down, in your own words: the data structure, the search routine, the insert routine, the heuristic, and the parameters (M, efConstruction, ef, mL).
Reproduce. Implement the algorithms and measure recall vs. a brute-force baseline — the only true test of understanding (our demo).

Tip: pseudocode in papers hides real decisions — heap orderings, the visited-set, tie-breaking, when to use ef=1 vs ef. Implementing forces you to resolve every one, which is exactly why reproduction teaches more than reading.

From paper to code: the process

Build the primitive first. SEARCH-LAYER (Algorithm 2) is called by everything else, so we implemented and tested it first.
Translate the data structures. "A graph per layer" → a list of dicts graph[level][node] = [neighbors]; the priority queues → Python heapq (with negated distances for the max-heap).
Handle the details the pseudocode glosses over — the visited set, the stopping rule (nearest candidate worse than the worst kept result), descending upper layers with ef=1 but the base layer with full ef.
Get the heuristic right. The keep-diverse rule (Algorithm 4) is what makes recall good; a naive "keep closest" version measurably underperforms.
Validate against ground truth. Recall vs. brute force across ef — and sanity that recall → 1.0 as ef grows confirms correctness.

Reflection: what we built vs. production HNSW

Our implementation is algorithmically faithful — same four routines, same parameters, correct recall curve. What we deliberately left out is engineering for speed and scale, which is most of what a production library is:

Aspect	This book	Production (FAISS, hnswlib, Lucene)
Language	pure Python	C/C++ with SIMD distance kernels
Distance	one pair at a time	batched, hardware-accelerated
Graph storage	dicts of lists	packed contiguous arrays
Build	single-threaded	multi-threaded
Memory	full float vectors	optional quantization (PQ/SQ)
Deletes / filters	not handled	soft-delete, filtered search

This is the same lesson as the KTS book: the algorithm is compact and learnable; the performance comes from careful systems work layered on top. A faithful from-scratch version is the right way to understand it, and a great baseline to validate an optimized version against.

A reflection on the parameters

The paper's biggest practical gift is which knobs exist and what they trade:

M — graph degree. Higher = better recall, more memory and slower build. Typical: 12–48.
efConstruction — build-time breadth. Higher = better graph, slower build. Typical: 100–500.
ef — query-time breadth. The runtime dial for recall vs. speed, tunable per query without rebuilding. Typical: 50–400.
mL — level normalizer, set to $1/\ln M$ in the paper; rarely changed.

Knowing these turns "HNSW is a black box" into "HNSW is three dials I understand".

See the References for the paper and its roots.

HNSW from Scratch