IVFPQ — partition + compress

IVF saves time (scan few regions); PQ saves memory (bytes per vector). IVFPQ does both at once, and it's the index that made billion-scale vector search practical. This is the FAISS workhorse (IndexIVFPQ).

The one new idea: encode residuals

Combining them naively (PQ-compress everything, then bucket by IVF) wastes accuracy. IVFPQ adds a refinement: instead of PQ-encoding each vector directly, it encodes the vector's residual — what's left after subtracting its region's centroid.

$$ \text{residual} = x - \text{centroid}(x). $$

Why this helps: within a region, all vectors are close to the centroid, so their residuals are small and similar. PQ quantizes small, concentrated residuals far more accurately than the spread-out raw vectors — better recall for the same bytes. The full reconstruction is centroid + decoded_residual.

build:   x ─► nearest centroid c ─► residual (x - c) ─► PQ code (m bytes)
         store (id, code) in cell c's inverted list

search:  q ─► nearest nprobe cells
         for each cell c:  build PQ table for residual (q - c)
                           score that cell's codes by ADC table lookups
         merge cells, keep top-k   (then optionally re-rank exactly)

The code

import numpy as np
from ivfpq import kmeans, _assign, PQ


class IVFPQ:
    def __init__(self, nlist=64, nprobe=8, m=8, ksub=256, seed=0):
        self.nlist, self.nprobe = nlist, nprobe
        self.m, self.ksub, self.seed = m, ksub, seed

    def train(self, X):
        self.centroids, a = kmeans(X, self.nlist, seed=self.seed)
        residuals = X - self.centroids[a]                  # encode residuals, not X
        self.pq = PQ(self.m, self.ksub, seed=self.seed + 1).train(residuals)
        return self

    def add(self, X):
        a = _assign(X, self.centroids)
        residuals = X - self.centroids[a]
        self.codes = self.pq.encode(residuals)             # m bytes per vector
        self.lists = [np.where(a == c)[0] for c in range(len(self.centroids))]
        return self

    def search(self, q, k=10, nprobe=None):
        nprobe = nprobe or self.nprobe
        dc = ((self.centroids - q) ** 2).sum(1)
        cells = np.argsort(dc)[:nprobe]
        all_d, all_id = [], []
        for c in cells:
            ids = self.lists[c]
            if ids.size == 0:
                continue
            res_q = q - self.centroids[c]                  # query residual for THIS cell
            d = self.pq.adc_distances(self.codes[ids], res_q)
            all_d.append(d); all_id.append(ids)
        d = np.concatenate(all_d); ids = np.concatenate(all_id)
        order = np.argsort(d)[:k]
        return ids[order], d[order]

Reading it

train learns the IVF centroids and trains PQ on the residuals.
add files each vector into its cell and stores only its m-byte residual code.
search probes the nprobe nearest cells; for each, it forms the query's residual relative to that cell's centroid, builds the PQ distance table, and scores the cell's codes by ADC. Results from all probed cells are merged.

How well it works

From the demo (10,000 vectors, m=16 → 16 bytes/vector):

IVFPQ  (IVF + PQ on residuals — partitioned and compressed):
   nprobe= 8: recall@10 raw=0.741  +rerank(top100)=1.000
   nprobe=16: recall@10 raw=0.741  +rerank(top100)=1.000

So IVFPQ is partitioned (scans ~6% of cells) and compressed (16–32× smaller), and with re-ranking reaches perfect recall on this data. That combination — small memory, fast search, high recall — is why it scales to billions of vectors.

The knobs, all together

Knob	Controls	Bigger means
`nlist`	number of IVF cells	finer partition; needs more `nprobe` for recall
`nprobe`	cells probed per query	higher recall, more time (runtime dial)
`m`	bytes per PQ code	higher recall, more memory
`ksub`	centroids per sub-quantizer	usually 256 (1 byte); rarely changed
re-rank size	exact-rescoring shortlist	higher recall, slight extra time

A typical recipe: pick nlist ≈ √n, train, then tune nprobe and the re-rank size at query time until recall is high enough — all without rebuilding the codes.

Next, the consolidated implementation. 👉