Vector databases
Modern AI runs on embeddings — vectors where geometric closeness means semantic similarity. Searching millions of them fast, with metadata and persistence, is the job of a vector database, and it's the storage layer under semantic search, recommendations, and RAG (next chapter). We build the core in NumPy so the magic is obvious, then map it to the real tools.
Setup: the in-memory store runs on NumPy only and the output below is real. Production uses Qdrant / Chroma / pgvector / Pinecone.
What a vector database actually does
Three operations, and the third is the whole point:
- Store vectors, each with an id and metadata (the source text, a URL, a timestamp).
- Filter by metadata (source = "docs", date > last_week).
- Search: given a query vector, return the top-k most similar stored vectors, and do it fast, even over millions.
The similarity is cosine (or dot product on normalized vectors) — exactly the recipe
from the foundations book. Here's the entire core, from
code/rag/vectorstore.py:
import numpy as np

class VectorStore:
    """Minimal in-memory vector store with exact (brute-force) search."""

    def __init__(self):
        self.ids, self.vecs, self.meta = [], [], []

    def add(self, id, vector, metadata=None):
        # Vectors are assumed to be unit-length already, so dot product == cosine.
        self.ids.append(id)
        self.vecs.append(np.asarray(vector, float))
        self.meta.append(metadata or {})

    def search(self, query_vec, k=3):
        M = np.vstack(self.vecs)                  # (n, d) matrix of stored vectors
        sims = M @ np.asarray(query_vec, float)   # query vs. every stored vector in one matmul
        order = np.argsort(-sims)[:k]             # top-k, best first
        return [{"id": self.ids[i], "score": round(float(sims[i]), 3),
                 "metadata": self.meta[i]} for i in order]
That M @ query_vec — one matrix-vector product scoring every stored vector at once
— is vector search. Everything a real vector DB adds is performance and durability
around this idea.
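One detail worth making concrete: M @ query_vec is a plain dot product, and it equals cosine similarity only when both the stored vectors and the query are unit-length. The sketch below uses made-up vectors and two hypothetical helpers (normalize and cosine, neither taken from vectorstore.py) to show that normalizing first makes the single matmul agree with the textbook cosine formula.

```python
import numpy as np

def normalize(v):
    """Scale a vector (or each row of a matrix) to unit length."""
    v = np.asarray(v, dtype=float)
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.where(norms == 0, 1.0, norms)  # all-zero rows stay all-zero

def cosine(a, b):
    """Textbook cosine similarity between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up example data: three stored vectors and one query.
M = np.array([[3.0, 4.0, 0.0],
              [0.0, 2.0, 0.0],
              [1.0, 0.0, 1.0]])
q = np.array([6.0, 8.0, 0.0])

raw_dot = M @ q                                     # sensitive to vector length
unit_dot = normalize(M) @ normalize(q)              # cosine scores in one matmul
by_hand = np.array([cosine(row, q) for row in M])   # same scores, one pair at a time

print(raw_dot)    # raw dot products
print(unit_dot)   # matches by_hand up to floating-point error
```

If the embedding model does not already return unit vectors, normalizing once at insert time (and once per query) keeps the search itself a single matmul.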
Embeddings: where the vectors come from
A vector store is only as good as its embeddings. You don't compute these by hand; you call an embedding model that maps text to a learned vector. For a runnable, deterministic demo we use a tiny interpretable "topic embedder" (3 dimensions: pets / finance / tech); in production you swap in a real model (Voyage AI, OpenAI, or a local sentence-transformer) that typically produces several hundred to a few thousand dimensions of learned meaning.
$ python rag/vectorstore.py
Output:
query 'loyal pets to own' -> [{'id': 'd1', 'score': 1.0, 'metadata': {'src': 'pets.md'}}, {'id': 'd2', 'score': 0.0, 'metadata': {'src': 'news.md'}}]
query 'python programming' -> [{'id': 'd3', 'score': 1.0, 'metadata': {'src': 'tech.md'}}, {'id': 'd1', 'score': 0.0, 'metadata': {'src': 'pets.md'}}]
The pets query retrieved the pets document (score 1.0) and gave the unrelated finance doc (d2) a score of 0.0, so it lands last; the programming query found the tech doc. Same matmul, real ranking. With learned embeddings instead of our toy ones, "loyal pets" would also match "faithful companion animals" with no shared words. That semantic matching is the main advantage vector search has over keyword search.
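The topic embedder itself is not shown in this section, so here is a hypothetical stand-in to make the idea tangible. The keyword lists, the embed function body, and the texts for d2 and d3 are all assumptions for illustration; the real code in vectorstore.py may work differently.

```python
import numpy as np

# Hypothetical keyword lists, one per dimension (pets, finance, tech).
TOPICS = {
    "pets": {"dog", "dogs", "cat", "cats", "pet", "pets", "loyal"},
    "finance": {"stock", "stocks", "market", "bank", "rates"},
    "tech": {"python", "code", "programming", "software"},
}

def embed(text):
    """Map text to a unit-length 3-d vector of per-topic keyword counts."""
    words = text.lower().split()
    counts = np.array(
        [sum(w in vocab for w in words) for vocab in TOPICS.values()],
        dtype=float,
    )
    norm = np.linalg.norm(counts)
    return counts / norm if norm > 0 else counts  # no keywords -> zero vector

# Made-up documents (only the d1 text appears elsewhere in the chapter).
docs = {
    "d1": "dogs are loyal pets",
    "d2": "stocks fell as the market reacted to rates",
    "d3": "python is a programming language",
}
M = np.vstack([embed(t) for t in docs.values()])

scores = M @ embed("loyal pets to own")
paraphrase = embed("faithful companion animals")

print(dict(zip(docs, scores.tolist())))
print(paraphrase.tolist())
```

Note what happens with the paraphrase: a keyword-count embedder like this one maps "faithful companion animals" to the zero vector, because none of its words are in the lists. That gap is what a learned embedding model closes.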
Why you need a real vector database
Our NumPy store works for thousands of vectors. It falls over at scale, and that's exactly the gap the tools fill:
| Our store | A vector database adds |
|---|---|
| M @ q scans every vector (O(n)) | ANN indexes (HNSW, IVF-PQ) → sub-linear search over millions |
| Lives in RAM, lost on restart | Persistence to disk |
| No filtering | Metadata filters combined with vector search |
| One process | Scaling, sharding, replication |
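To make the "metadata filters combined with vector search" row concrete, here is a minimal sketch of one simple strategy, pre-filtering: drop rows whose metadata does not match, then run the same exact search on what is left. The function name filtered_search and its where argument are hypothetical, and real vector databases integrate filters with their indexes in more sophisticated ways.

```python
import numpy as np

def filtered_search(ids, vecs, meta, query_vec, k=3, where=None):
    """Exact top-k over only the rows whose metadata matches every key in `where`."""
    where = where or {}
    keep = [i for i, m in enumerate(meta)
            if all(m.get(key) == val for key, val in where.items())]
    if not keep:
        return []  # nothing passed the filter
    M = np.vstack([vecs[i] for i in keep])
    sims = M @ np.asarray(query_vec, dtype=float)
    order = np.argsort(-sims)[:k]
    # `order` indexes into the filtered matrix, so map back through `keep`.
    return [{"id": ids[keep[j]], "score": float(sims[j]), "metadata": meta[keep[j]]}
            for j in order]

# Made-up unit vectors and metadata.
ids = ["d1", "d2", "d3"]
vecs = [np.array([1.0, 0.0, 0.0]),
        np.array([0.6, 0.8, 0.0]),
        np.array([0.0, 0.0, 1.0])]
meta = [{"src": "pets.md"}, {"src": "news.md"}, {"src": "tech.md"}]
query = [1.0, 0.0, 0.0]

everything = filtered_search(ids, vecs, meta, query, k=2)
news_only = filtered_search(ids, vecs, meta, query, k=2, where={"src": "news.md"})
print([h["id"] for h in everything])
print([h["id"] for h in news_only])
```

Pre-filtering is easy to reason about with brute-force search; with an ANN index the interaction between filter and index becomes one of the harder engineering problems, which is part of what the tools below differ on.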
Don't be confused: exact vs. approximate search. Our argsort does exact nearest-neighbor search: perfect results, but O(n) per query. Vector databases use approximate nearest-neighbor (ANN) indexes that trade a small amount of recall for a large speedup; graph indexes such as HNSW find the top-k in roughly O(log n). At millions of vectors and interactive latencies, approximate search is usually the practical option, and the HNSW and IVF-PQ sister books build those same indexes from scratch. A vector DB is essentially our VectorStore with an ANN index, persistence, and filtering bolted on.
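The recall-for-speed trade can be seen in a few lines. The sketch below is a simplified IVF-style index (clusters only, no product quantization, and not HNSW): vectors are grouped by a few rounds of k-means, and a query scores only the clusters nearest to it. All names (build_ivf, ivf_search, exact_search) and the random data are assumptions for illustration, not the sister books' implementations.

```python
import numpy as np

def build_ivf(vecs, n_clusters=8, n_iters=10, seed=0):
    """Group unit vectors into clusters with a few rounds of spherical k-means."""
    rng = np.random.default_rng(seed)
    centroids = vecs[rng.choice(len(vecs), n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        assign = np.argmax(vecs @ centroids.T, axis=1)  # most similar centroid per vector
        for c in range(n_clusters):
            members = vecs[assign == c]
            if len(members):
                mean = members.mean(axis=0)
                norm = np.linalg.norm(mean)
                if norm > 0:
                    centroids[c] = mean / norm
    assign = np.argmax(vecs @ centroids.T, axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(n_clusters)]  # row ids per cluster
    return centroids, lists

def ivf_search(vecs, centroids, lists, q, k=5, n_probe=2):
    """Score only the vectors in the n_probe clusters most similar to the query."""
    probe = np.argsort(-(centroids @ q))[:n_probe]
    cand = np.concatenate([lists[c] for c in probe])
    sims = vecs[cand] @ q
    top = np.argsort(-sims)[:k]
    return cand[top], len(cand)  # result ids and how many vectors were scored

def exact_search(vecs, q, k=5):
    """Brute force: score every vector, as the NumPy store does."""
    return np.argsort(-(vecs @ q))[:k]

# Made-up data: 2000 random unit vectors in 16 dimensions.
rng = np.random.default_rng(1)
vecs = rng.normal(size=(2000, 16))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
q = vecs[0]  # a query whose exact best match is known (itself)

centroids, lists = build_ivf(vecs)
approx_ids, scanned = ivf_search(vecs, centroids, lists, q, k=5, n_probe=2)
exact_ids = exact_search(vecs, q, k=5)
recall = len(set(approx_ids.tolist()) & set(exact_ids.tolist())) / len(exact_ids)
print(f"scanned {scanned} of {len(vecs)} vectors, recall@5 = {recall:.2f}")
```

Raising n_probe scans more vectors and pushes recall toward 1.0; setting it to the number of clusters reproduces exact search. Production indexes add compression and smarter structures on top of the same idea.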
The landscape
| Tool | What it is |
|---|---|
| Chroma | dead-simple, embedded; great for prototypes and local RAG |
| Qdrant | fast Rust engine, rich filtering; popular self-hosted choice |
| pgvector | a Postgres extension — vectors in your existing database |
| Pinecone / Weaviate / Milvus | managed/scalable vector DBs for production |
| FAISS | Meta's library — the index, no server (the HNSW/IVF-PQ algorithms) |
The same three calls — add, filter, search — exist in all of them, e.g. Chroma:
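import chromadb
client = chromadb.Client()                      # in-memory client; nothing is persisted
col = client.create_collection("docs")
# add: Chroma embeds the documents with its default embedding model
col.add(ids=["d1"], documents=["dogs are loyal pets"], metadatas=[{"src": "pets.md"}])
# search: the query text is embedded the same way, then matched by vector similarity
hits = col.query(query_texts=["faithful companion animals"], n_results=2)
# filter + search: where= restricts candidates by metadata before ranking
pets_hits = col.query(query_texts=["faithful companion animals"], n_results=1, where={"src": "pets.md"})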
Notice Chroma even calls the embedding model for you (documents= instead of raw
vectors) — convenience over our explicit embed(), same mechanics underneath.
The takeaway
A vector database stores embeddings and answers "what's most similar to this?" with one matmul plus top-k, which you just built in NumPy. Production tools (Chroma, Qdrant, pgvector, Pinecone) add approximate-nearest-neighbor indexes (the HNSW/IVF-PQ engines), persistence, and metadata filtering so it scales to millions. Embeddings come from a model; the store finds neighbors. This is the retrieval half of RAG; next, let's wire it to an LLM and answer questions.