Introduction

No background required. This book assumes you know nothing about programming, AI, or advanced math. Every concept, line of code, and symbol is explained, and every code example is followed by the exact output it produces. If you've never written Python, read the 5-minute primer first.

The one-sentence idea

HNSW (Hierarchical Navigable Small World) is a clever data structure that finds the items most similar to a query — out of millions — in a tiny fraction of the time it would take to compare against every item.

Why you've already used it

Every time you see "similar products", "related articles", semantic search, a chatbot that "remembers" documents (RAG), or face/image search, there's a good chance HNSW (or something like it) is doing the heavy lifting behind the scenes. It's the default index in FAISS, OpenSearch / Elasticsearch, Lucene, and most vector databases (Pinecone, Weaviate, Qdrant, Milvus…).

An everyday analogy

Imagine finding the house nearest to you in a huge city. The slow way: measure the distance to every house. The HNSW way is like using a layered map:

  • A highway map (top layer) has only a few major cities — you hop between them to get roughly close, fast.
  • A regional map (middle layer) has more towns — you refine your position.
  • A street map (bottom layer) has every house — you do the final fine search only in the small area you've zoomed into.

By zooming from coarse to fine, you reach the right neighborhood in a few big jumps, then look carefully only nearby — instead of checking the whole city. That layered zoom is exactly what HNSW does, and the layers are what the "Hierarchical" in its name refers to.

What "similar" means: vectors & embeddings

Computers compare things by turning them into lists of numbers called vectors (or, for text/images/audio, embeddings). Similar items get nearby vectors. So "find similar items" becomes a geometry problem: find the vectors closest to the query vector. That's the nearest-neighbor problem, and HNSW solves it approximately but extremely fast. (The primer explains vectors and embeddings gently.)

What you'll build

A complete, working HNSW index in one readable file, hnsw.py, plus:

The two ingredients

HNSW combines two older ideas, which are the heart of this book:

  1. Small-world graphs — connect each item to a handful of neighbors so that any item is reachable from any other in just a few hops (like "six degrees of separation"). You can then walk the graph greedily toward a query.
  2. A hierarchy of layers (inspired by skip lists) — sparse at the top for big jumps, dense at the bottom for precision.

We'll meet both in Chapter 2. First, let's understand the problem they solve. 👉