IVFPQ — partition + compress
IVF saves time (scan few regions); PQ saves memory (bytes per vector).
IVFPQ does both at once, and it's the index that made billion-scale vector
search practical. This is the FAISS workhorse (IndexIVFPQ).
The one new idea: encode residuals
Combining them naively (PQ-compress everything, then bucket by IVF) wastes accuracy. IVFPQ adds a refinement: instead of PQ-encoding each vector directly, it encodes the vector's residual — what's left after subtracting its region's centroid.
$$ \text{residual} = x - \text{centroid}(x). $$
Why this helps: within a region, all vectors are close to the centroid, so their
residuals are small and similar. PQ quantizes small, concentrated residuals far
more accurately than the spread-out raw vectors — better recall for the same
bytes. The full reconstruction is centroid + decoded_residual.
build: x ─► nearest centroid c ─► residual (x - c) ─► PQ code (m bytes)
store (id, code) in cell c's inverted list
search: q ─► nearest nprobe cells
for each cell c: build PQ table for residual (q - c)
score that cell's codes by ADC table lookups
merge cells, keep top-k (then optionally re-rank exactly)
The code
import numpy as np
from ivfpq import kmeans, _assign, PQ
class IVFPQ:
def __init__(self, nlist=64, nprobe=8, m=8, ksub=256, seed=0):
self.nlist, self.nprobe = nlist, nprobe
self.m, self.ksub, self.seed = m, ksub, seed
def train(self, X):
self.centroids, a = kmeans(X, self.nlist, seed=self.seed)
residuals = X - self.centroids[a] # encode residuals, not X
self.pq = PQ(self.m, self.ksub, seed=self.seed + 1).train(residuals)
return self
def add(self, X):
a = _assign(X, self.centroids)
residuals = X - self.centroids[a]
self.codes = self.pq.encode(residuals) # m bytes per vector
self.lists = [np.where(a == c)[0] for c in range(len(self.centroids))]
return self
def search(self, q, k=10, nprobe=None):
nprobe = nprobe or self.nprobe
dc = ((self.centroids - q) ** 2).sum(1)
cells = np.argsort(dc)[:nprobe]
all_d, all_id = [], []
for c in cells:
ids = self.lists[c]
if ids.size == 0:
continue
res_q = q - self.centroids[c] # query residual for THIS cell
d = self.pq.adc_distances(self.codes[ids], res_q)
all_d.append(d); all_id.append(ids)
d = np.concatenate(all_d); ids = np.concatenate(all_id)
order = np.argsort(d)[:k]
return ids[order], d[order]
Reading it
trainlearns the IVF centroids and trains PQ on the residuals.addfiles each vector into its cell and stores only itsm-byte residual code.searchprobes thenprobenearest cells; for each, it forms the query's residual relative to that cell's centroid, builds the PQ distance table, and scores the cell's codes by ADC. Results from all probed cells are merged.
How well it works
From the demo (10,000 vectors, m=16 → 16 bytes/vector):
IVFPQ (IVF + PQ on residuals — partitioned and compressed):
nprobe= 8: recall@10 raw=0.741 +rerank(top100)=1.000
nprobe=16: recall@10 raw=0.741 +rerank(top100)=1.000
So IVFPQ is partitioned (scans ~6% of cells) and compressed (16–32× smaller), and with re-ranking reaches perfect recall on this data. That combination — small memory, fast search, high recall — is why it scales to billions of vectors.
The knobs, all together
| Knob | Controls | Bigger means |
|---|---|---|
nlist | number of IVF cells | finer partition; needs more nprobe for recall |
nprobe | cells probed per query | higher recall, more time (runtime dial) |
m | bytes per PQ code | higher recall, more memory |
ksub | centroids per sub-quantizer | usually 256 (1 byte); rarely changed |
| re-rank size | exact-rescoring shortlist | higher recall, slight extra time |
A typical recipe: pick nlist ≈ √n, train, then tune nprobe and the re-rank
size at query time until recall is high enough — all without rebuilding the codes.
Next, the consolidated implementation. 👉