A worked demo: recall vs. memory

This demo puts all four index types side by side on the same data and measures the three quantities that matter: recall (accuracy), memory, and work (fraction of data scanned). The full script is code/demo.py.

The setup

10,000 vectors in 64 dimensions, drawn from 40 clusters (so the data has real structure, like embeddings do). 100 queries. We compute exact answers with flat_knn as ground truth, then score each method's recall against them.

import numpy as np
from ivfpq import flat_knn, IVF, PQ, IVFPQ
# ... build clustered data, pick 100 queries, compute exact top-10 ...

The result

Running python demo.py (takes ~40s — pure-Python k-means training) prints:

dataset: 10000 vectors x 64 dims; 100 queries; k=10

IVF  (partition into 128 cells, scan a few):
   nprobe= 1: recall@10=0.657   ~78 vectors scanned (0.8% of data)
   nprobe= 4: recall@10=0.994   ~312 vectors scanned (3.1% of data)
   nprobe= 8: recall@10=1.000   ~625 vectors scanned (6.2% of data)
   nprobe=16: recall@10=1.000   ~1250 vectors scanned (12.5% of data)

PQ  (compress each vector to m bytes):
   m= 8 (8B/vec, 64x smaller): recall@10 raw=0.292  +rerank(top100)=0.843
   m=16 (16B/vec, 32x smaller): recall@10 raw=0.386  +rerank(top100)=0.938

IVFPQ  (IVF + PQ on residuals — partitioned and compressed):
   nprobe= 8: recall@10 raw=0.741  +rerank(top100)=1.000
   nprobe=16: recall@10 raw=0.741  +rerank(top100)=1.000

flat (exact) memory: 5.1 MB for 10000 vectors
IVF keeps full vectors (no compression); PQ/IVFPQ shrink them 16-32x.

How to read it

IVF — near-exact, a fraction of the work. Scanning just 6% of the data (nprobe=8) gives perfect recall. IVF's only error is a true neighbor sitting in an un-probed cell; raising nprobe fixes it. IVF saves time but keeps full vectors (no memory savings).

PQ — huge memory savings, recovered by re-ranking. Raw PQ recall looks low (0.29–0.39) because the codes are lossy — but that's not how PQ is used. Shortlist the top-100 by cheap PQ distance, re-rank those exactly, and recall jumps to 0.84–0.94 at 32–64× less memory. The m knob trades bytes for accuracy.

IVFPQ — the best of both. Partitioned (scans ~6% of cells) and compressed (16–32× smaller), and with re-ranking it reaches recall 1.0. This is why IVFPQ scales to billions of vectors on a single machine.

A note on the compression factor. The demo's arrays are float64 (8 bytes), so it reports 32–64×. Real systems store float32 (4 bytes), so the same m=8 code is 32× smaller and m=16 is 16× — still enormous.

The mental model

            saves TIME?     saves MEMORY?     recall (with re-rank)
  flat          no              no                 1.000 (exact)
  IVF          YES              no                 ~1.0
  PQ            no             YES (16-64x)        high
  IVFPQ        YES             YES (16-64x)        high

Things to try

Raise nprobe and watch IVF/IVFPQ recall climb toward 1.0.
Increase m (8 → 16 → 32) for higher PQ recall at the cost of more bytes.
Increase the re-rank shortlist (100 → 200) to squeeze out more recall.
Grow N to 100,000 to feel how IVF's "% scanned" advantage compounds.

Now, where all this is actually used — including FAISS. 👉