A worked demo: recall vs. memory
This demo puts all four index types side by side on the same data and measures the
three quantities that matter: recall (accuracy), memory, and work
(fraction of data scanned). The full script is code/demo.py.
The setup
10,000 vectors in 64 dimensions, drawn from 40 clusters (so the data has real
structure, like embeddings do). 100 queries. We compute exact answers with
flat_knn as ground truth, then score each method's recall against them.
import numpy as np
from ivfpq import flat_knn, IVF, PQ, IVFPQ
# ... build clustered data, pick 100 queries, compute exact top-10 ...
The result
Running python demo.py (takes ~40s — pure-Python k-means training) prints:
dataset: 10000 vectors x 64 dims; 100 queries; k=10
IVF (partition into 128 cells, scan a few):
nprobe= 1: recall@10=0.657 ~78 vectors scanned (0.8% of data)
nprobe= 4: recall@10=0.994 ~312 vectors scanned (3.1% of data)
nprobe= 8: recall@10=1.000 ~625 vectors scanned (6.2% of data)
nprobe=16: recall@10=1.000 ~1250 vectors scanned (12.5% of data)
PQ (compress each vector to m bytes):
m= 8 (8B/vec, 64x smaller): recall@10 raw=0.292 +rerank(top100)=0.843
m=16 (16B/vec, 32x smaller): recall@10 raw=0.386 +rerank(top100)=0.938
IVFPQ (IVF + PQ on residuals — partitioned and compressed):
nprobe= 8: recall@10 raw=0.741 +rerank(top100)=1.000
nprobe=16: recall@10 raw=0.741 +rerank(top100)=1.000
flat (exact) memory: 5.1 MB for 10000 vectors
IVF keeps full vectors (no compression); PQ/IVFPQ shrink them 16-32x.
How to read it
IVF — near-exact, a fraction of the work. Scanning just 6% of the data
(nprobe=8) gives perfect recall. IVF's only error is a true neighbor sitting
in an un-probed cell; raising nprobe fixes it. IVF saves time but keeps full
vectors (no memory savings).
PQ — huge memory savings, recovered by re-ranking. Raw PQ recall looks low
(0.29–0.39) because the codes are lossy — but that's not how PQ is used. Shortlist
the top-100 by cheap PQ distance, re-rank those exactly, and recall jumps to
0.84–0.94 at 32–64× less memory. The m knob trades bytes for accuracy.
IVFPQ — the best of both. Partitioned (scans ~6% of cells) and compressed (16–32× smaller), and with re-ranking it reaches recall 1.0. This is why IVFPQ scales to billions of vectors on a single machine.
A note on the compression factor. The demo's arrays are float64 (8 bytes), so it reports 32–64×. Real systems store float32 (4 bytes), so the same
m=8code is 32× smaller andm=16is 16× — still enormous.
The mental model
saves TIME? saves MEMORY? recall (with re-rank)
flat no no 1.000 (exact)
IVF YES no ~1.0
PQ no YES (16-64x) high
IVFPQ YES YES (16-64x) high
Things to try
- Raise
nprobeand watch IVF/IVFPQ recall climb toward 1.0. - Increase
m(8 → 16 → 32) for higher PQ recall at the cost of more bytes. - Increase the re-rank shortlist (100 → 200) to squeeze out more recall.
- Grow
Nto 100,000 to feel how IVF's "% scanned" advantage compounds.
Now, where all this is actually used — including FAISS. 👉