How to read this book

This book assumes you can read Python and have seen an LLM API call before. It does not assume you already know terms such as KV-cache, embedding, attention FLOPs, or knowledge graph; each is built up from scratch the first time it appears. If a term looks unfamiliar, it is explained in the chapter that needs it, or in the glossary at the end of the references.

Run the code

Every from-scratch demo uses only the Python standard library and NumPy. Nothing else is required: no PyTorch, no API key, no vector database, no cloud account. If you have Python 3 and NumPy, you can run all of it.

pip install numpy
python3 books/context-engineering/code/<file>.py
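
Before running a demo, it can help to confirm that both prerequisites are in place. The following is a minimal sketch and not part of the book's code/ folder; the function name check_environment is hypothetical, and the check only tests that Python 3 is running and that a package named numpy can be found.

```python
import sys


def check_environment():
    """Return a list of problems; an empty list means the prerequisites are met."""
    # Check the interpreter first, since the import below only exists in Python 3.
    if sys.version_info < (3,):
        return ["Python 3 is required"]

    import importlib.util

    problems = []
    # find_spec returns None when the package cannot be located,
    # without actually importing it.
    if importlib.util.find_spec("numpy") is None:
        problems.append("NumPy is not installed; run: pip install numpy")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    # Prints "ready" when nothing is missing, otherwise one problem per line.
    print("ready" if not issues else "\n".join(issues))
```

Save it under any name and run it with python3; if it reports a problem, fix that before running the demos.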

Each chapter includes its demo source directly from the code/ folder and then shows a text block with the exact output that source produced when it was run. The numbers in the prose (a compression ratio, a cache hit rate, a FLOP count) come from that output, so you can reproduce them and change them.

Don't be confused. Two kinds of code appear in this book, and they are not the same. The NumPy demos were run on a real machine and their output is pasted in verbatim, so you can trust the numbers. The Anthropic API snippets (prompt caching, token counting, output control) are follow-along examples: correct and copyable, but not executed here, because the build machine has no API key. Wherever you see an API snippet, treat its result as illustrative and run it yourself with your own key.

The house conventions

A few markers recur:

  • A 👉 arrow ends every chapter with a one-line hand-off to the next. It is a deliberate signpost, not filler.
  • A "Don't be confused" box, like the one on this page, pulls apart two ideas that are easy to mix up (prompt vs context engineering, caching the prompt vs caching the answer, deleting context vs summarizing it).
  • A "Takeaways" list near the end of each chapter is the five-second version, for when you come back later and need the gist without rereading.

The order

The Foundations chapters define the field and the token economy that makes every later trade-off concrete; read them first. After that the four families (compression, caching, memory, architecture) are mostly independent, so you can jump to the lever you need. The landscape chapter at the end is a map from each technique to the open-source project that implements it, useful as a reference when you are choosing a tool rather than building your own.

Cross-links like Chapter 6 point you to the chapter that goes deeper on something mentioned in passing. Follow them when you want the detail; skip them when you are skimming.

👉 With the ground rules set, let us define the thing this whole book is about.