Vector databases

Modern AI runs on embeddings — vectors where geometric closeness means semantic similarity. Searching millions of them fast, with metadata and persistence, is the job of a vector database, and it's the storage layer under semantic search, recommendations, and RAG (next chapter). We build the core in NumPy so the magic is obvious, then map it to the real tools.

Setup: the in-memory store runs on NumPy only and the output below is real. Production uses Qdrant / Chroma / pgvector / Pinecone.

What a vector database actually does

Three operations, and the third is the whole point:

1. Store vectors, each with an id and metadata (the source text, a URL, a timestamp).
2. Filter by metadata (source = "docs", date > last_week).
3. Search: given a query vector, return the top-k most similar stored vectors, fast, even over millions.

The similarity is cosine (or dot product on normalized vectors) — exactly the recipe from the foundations book. Here's the entire core, from code/rag/vectorstore.py:

import numpy as np

class VectorStore:
    def __init__(self):
        self.ids, self.vecs, self.meta = [], [], []

    def add(self, id, vector, metadata=None):
        self.ids.append(id)
        self.vecs.append(np.asarray(vector, float))
        self.meta.append(metadata or {})

    def search(self, query_vec, k=3):
        if not self.vecs:
            return []                             # empty store: np.vstack would raise
        M = np.vstack(self.vecs)
        sims = M @ np.asarray(query_vec, float)   # dot product per row; cosine only if vectors are unit-length
        order = np.argsort(-sims)[:k]             # top-k, best first
        return [{"id": self.ids[i], "score": round(float(sims[i]), 3),
                 "metadata": self.meta[i]} for i in order]

That M @ query_vec — one matrix-vector product scoring every stored vector at once — is vector search. Everything a real vector DB adds is performance and durability around this idea.
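One detail worth checking by hand: the dot product in search equals cosine similarity only when the stored vectors and the query are all unit-length. The sketch below illustrates that with two vectors pointing the same way; normalize and cosine are hypothetical helpers for this example, not part of the original vectorstore.py.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length (hypothetical helper, not in the original file)."""
    v = np.asarray(v, float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v   # leave an all-zero vector unchanged

def cosine(a, b):
    """Textbook cosine similarity: dot product divided by both lengths."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = [3.0, 4.0, 0.0]   # length 5
b = [6.0, 8.0, 0.0]   # same direction, length 10

raw_dot = float(np.dot(a, b))                   # grows with vector length
unit_dot = float(normalize(a) @ normalize(b))   # depends on direction only

print(raw_dot)                 # 50.0
print(round(unit_dot, 6))      # 1.0
print(round(cosine(a, b), 6))  # 1.0
```

If the embedder does not already return unit vectors, normalizing once inside add (and once per query) is a common way to keep the single matmul while still ranking by cosine.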

Embeddings: where the vectors come from

A vector store is only as good as its embeddings. You don't compute these by hand; you call an embedding model that maps text to a learned vector. For a runnable, deterministic demo we use a tiny interpretable "topic embedder" (3 dimensions: pets / finance / tech); in production you swap in a real model (Voyage AI, OpenAI, or a local sentence-transformer) whose vectors typically have hundreds to a few thousand dimensions of learned meaning.

$ python rag/vectorstore.py

Output:

query 'loyal pets to own' -> [{'id': 'd1', 'score': 1.0, 'metadata': {'src': 'pets.md'}}, {'id': 'd2', 'score': 0.0, 'metadata': {'src': 'news.md'}}]
query 'python programming' -> [{'id': 'd3', 'score': 1.0, 'metadata': {'src': 'tech.md'}}, {'id': 'd1', 'score': 0.0, 'metadata': {'src': 'pets.md'}}]

The pets query retrieved the pets document (score 1.0) and placed the unrelated finance doc (d2, news.md) below it at 0.0; the programming query found the tech doc. Same matmul, real ranking. With learned embeddings instead of our toy ones, "loyal pets" would also match "faithful companion animals" with no shared words. That semantic matching is what vector search offers over plain keyword search.
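The topic embedder itself is not shown in this chapter, so here is one possible sketch of the idea, under loud assumptions: the keyword lists, the document texts, and the embed and search helpers are all hypothetical stand-ins, not the contents of code/rag/vectorstore.py. It counts keyword hits per topic and scales the result to unit length so the dot product behaves as cosine.

```python
import numpy as np

# Hypothetical keyword lists, one per dimension: pets / finance / tech.
TOPICS = [
    {"dog", "dogs", "cat", "cats", "pet", "pets", "loyal"},
    {"stock", "stocks", "market", "markets", "bank", "rates"},
    {"python", "programming", "code", "software"},
]

def embed(text):
    """Count keyword hits per topic, then scale to unit length."""
    words = text.lower().split()
    v = np.array([sum(w in topic for w in words) for topic in TOPICS], float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v   # text with no keywords stays all-zero

# Illustrative documents (assumed, not the originals).
docs = {
    "d1": "dogs are loyal pets",
    "d2": "stocks fell as markets reacted to rates",
    "d3": "python is a programming language",
}
ids = list(docs)
M = np.vstack([embed(t) for t in docs.values()])

def search(query, k=2):
    sims = M @ embed(query)                        # one score per document
    order = np.argsort(-sims, kind="stable")[:k]   # stable sort keeps ties in insertion order
    return [(ids[i], round(float(sims[i]), 3)) for i in order]

print(search("loyal pets to own"))
print(search("python programming"))
```

With these assumed keyword lists, the top hit is d1 for the pets query and d3 for the programming query, each with score 1.0. A query such as "faithful companion animals" would embed to all zeros here, which is exactly the limitation a learned embedding model removes.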

Why you need a real vector database

Our NumPy store works for thousands of vectors. It falls over at scale, and that's exactly the gap the tools fill:

Our store	A vector database adds
`M @ q` scans every vector (O(n))	ANN indexes (HNSW, IVF-PQ) → sub-linear search over millions
Lives in RAM, lost on restart	Persistence to disk
No filtering	Metadata filters combined with vector search
One process	Scaling, sharding, replication
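To make the "no filtering" row concrete, here is a minimal sketch of metadata filtering combined with exact search: build a boolean mask from the metadata, then score only the rows that pass. The function filtered_search and its where argument are assumptions for illustration (equality matching only); production databases typically integrate filtering with their index rather than scanning like this.

```python
import numpy as np

def filtered_search(vecs, meta, query_vec, where, k=3):
    """Exact top-k over only the rows whose metadata matches `where`.

    `where` is a dict of required key/value pairs (equality only).
    Returns (row_index, score) pairs, best first.
    """
    M = np.asarray(vecs, float)
    keep = np.array([all(m.get(key) == val for key, val in where.items())
                     for m in meta], dtype=bool)
    candidates = np.flatnonzero(keep)            # row numbers that pass the filter
    if candidates.size == 0:
        return []
    sims = M[candidates] @ np.asarray(query_vec, float)
    order = np.argsort(-sims, kind="stable")[:k]
    return [(int(candidates[i]), round(float(sims[i]), 3)) for i in order]

vecs = [[1, 0, 0], [0.8, 0.6, 0], [0, 0, 1]]     # unit-length toy vectors
meta = [{"src": "pets.md"}, {"src": "news.md"}, {"src": "tech.md"}]

print(filtered_search(vecs, meta, [1, 0, 0], where={"src": "news.md"}))
# [(1, 0.8)]
```

Filtering before scoring keeps the matmul small when the filter is selective; an empty where dict matches every row and falls back to plain search.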

Don't be confused: exact vs. approximate search. Our argsort does exact nearest-neighbor search: perfect results, but every query scores all n stored vectors (an O(n) scan, plus an O(n log n) sort). Vector databases use approximate nearest-neighbor (ANN) indexes that trade a small amount of recall for a large speedup; graph indexes such as HNSW find the top-k in roughly O(log n). At millions of vectors and interactive latencies, approximate search is usually the practical choice, and the HNSW and IVF-PQ sister books build those exact indexes from scratch. A vector DB is essentially our VectorStore with an ANN index, persistence, and filtering bolted on.
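The exact-versus-approximate trade-off can be sketched in a few lines of NumPy. The block below is a simplified IVF-style index under stated assumptions: random toy data, coarse centroids sampled from the data rather than trained with k-means, and no product quantization, so it is not the HNSW or IVF-PQ construction from the sister books. The names unit, exact_search, ann_search, n_lists, and nprobe are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(x):
    """Scale vectors (rows, or a single 1-D vector) to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy data: 2,000 random unit vectors in 16 dimensions.
X = unit(rng.normal(size=(2000, 16)))

# "Index build": sample stored vectors as coarse centroids (a real IVF index
# would train these with k-means) and bucket every vector under its most
# similar centroid.
n_lists = 20
centroids = X[rng.choice(len(X), size=n_lists, replace=False)]
assignment = np.argmax(X @ centroids.T, axis=1)
buckets = [np.flatnonzero(assignment == c) for c in range(n_lists)]

def exact_search(q, k=5):
    """Score every stored vector; always returns the true top-k."""
    sims = X @ q
    return np.argsort(-sims)[:k]

def ann_search(q, k=5, nprobe=3):
    """Scan only the `nprobe` buckets whose centroids are most similar to q."""
    probe = np.argsort(-(centroids @ q))[:nprobe]
    cand = np.concatenate([buckets[c] for c in probe])
    sims = X[cand] @ q
    return cand[np.argsort(-sims)[:k]], len(cand)

q = unit(rng.normal(size=16))
truth = set(exact_search(q).tolist())
approx, scanned = ann_search(q, nprobe=3)
recall = len(truth & set(approx.tolist())) / len(truth)
print(f"scanned {scanned} of {len(X)} vectors, recall@5 = {recall:.2f}")
```

Raising nprobe scans more buckets and pushes recall toward 1.0; at nprobe = n_lists the search visits every vector and returns the exact result, which is the recall-for-speed dial that real ANN indexes expose in more sophisticated forms.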

The landscape

Tool	What it is
Chroma	dead-simple, embedded; great for prototypes and local RAG
Qdrant	fast Rust engine, rich filtering; popular self-hosted choice
pgvector	a Postgres extension — vectors in your existing database
Pinecone / Weaviate / Milvus	managed/scalable vector DBs for production
FAISS	Meta's library — the index, no server (the HNSW/IVF-PQ algorithms)

The same three calls — add, filter, search — exist in all of them, e.g. Chroma:

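import chromadb

client = chromadb.Client()                # in-memory client; nothing is written to disk
col = client.create_collection("docs")
# add: Chroma embeds `documents` with its default embedding model
col.add(ids=["d1"], documents=["dogs are loyal pets"], metadatas=[{"src": "pets.md"}])
# search: only one document is stored, so ask for one result
hits = col.query(query_texts=["faithful companion animals"], n_results=1)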

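Notice Chroma even calls the embedding model for you (documents= instead of raw vectors): convenience over our explicit embed(), same mechanics underneath. The filter step is an extra argument on the same call, for example where={"src": "pets.md"} passed to col.query.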

The takeaway

A vector database stores embeddings and answers "what's most similar to this?" in one matmul-plus-top-k — which you just built in NumPy. Production tools (Chroma, Qdrant, pgvector, Pinecone) add approximate-nearest-neighbor indexes (the HNSW/IVF-PQ engines), persistence, and metadata filtering so it scales to millions. Embeddings come from a model; the store finds neighbors. This is the retrieval half of RAG — now let's wire it to an LLM and answer questions. 👉