Code and structure-aware context

Chapter 3 shrank prose by scoring sentences and keeping the top ones. Code does not work that way. You cannot drop "the least important line" of a function and hope the rest still makes sense, and you cannot rank lines by similarity to a question the way you would rank paragraphs. Code has structure: functions call functions, files import files, classes inherit classes. The right way to compress a codebase for a model is to use that structure to select only the parts the task touches, and leave the rest out entirely.

This chapter is about selecting code by structure instead of by file. The lazy habit, when a model needs to reason about one function, is to paste the whole file (or the whole repo) and let attention sort it out. That spends tokens on functions the task never touches, and on a real repository it does not even fit. The structure-aware habit is to parse the code, figure out what the target function actually depends on, and send only that.

The unit of code is not the line

To talk about structure we need the thing that exposes it. When Python (or any language) reads source text, the first step is to turn that flat string into a tree that mirrors the grammar of the language. That tree is the Abstract Syntax Tree (AST). "Abstract" because it throws away surface details like whitespace and keeps the meaningful shape: this is a function definition, its body is a list of statements, this statement is a call, the thing being called has this name. Python hands you the AST through the standard-library ast module, so we can work with the structure of code using nothing but the standard library.

Two node types carry most of what we need:

A FunctionDef node is one def. It knows the function's name and holds its body as a list of child nodes.
A Call node is one function call, like summarize(rows). Its .func child tells you what is being called; when the call is a plain name, that child is a Name node whose .id is the string "summarize".

So "which functions does this function call" is not a text search for parentheses. It is a walk over the Call nodes inside a FunctionDef, reading the name off each one. The AST makes that exact and immune to the things that fool a regex (a call inside a comment, a string that happens to contain foo(, a name that is a substring of another).

The call graph

Once you can list every function and, for each one, the functions it calls, you have a call graph: a set of nodes (the functions) and directed edges (A calls B means an arrow from A to B). The call graph is the map of how the code's pieces depend on each other, and it is what lets you answer the question that matters for context: if I want the model to understand make_report, what else does it need to see?

The answer is the transitive dependencies of the target. "Transitive" means you follow the arrows all the way, not just one hop. If make_report calls clean, and clean called something else, you would need that too, and so on until nothing new turns up. Formally, the transitive dependencies of a target $t$ are every node reachable from $t$ by following call edges:

$$\text{deps}(t) = {, t ,} \cup {, v : t \rightsquigarrow v ,}$$

where $t \rightsquigarrow v$ means "there is a path of call edges from $t$ to $v$." You compute this with an ordinary graph search. Start with the target, look at everything it calls, add anything you have not seen, and repeat until the frontier is empty. That is a breadth-first search (BFS), and it is the whole selection algorithm: the set it returns is exactly the slice of code the model needs, and everything outside the set is noise you can drop.

The demo

The script below carries a small Python module as a string: eight functions where the target make_report transitively needs only three of them (load_rows, clean, summarize) and the other four (send_email, connect_smtp, resize_image, slugify) are unrelated. It parses the module with ast, records each function's exact source text, builds the call graph by walking Call nodes, runs a BFS from the target to collect its transitive dependencies, and emits only those functions as the context slice. Then it prints the before/after token counts (the same words * 1.3 estimate from Chapter 2) and verifies that the target and all its real dependencies are present while the unrelated functions are gone.

"""Select code by STRUCTURE (a call graph), not by file.

When an LLM needs to reason about one function, the lazy move is to paste the
whole module and let the model sort it out. That spends tokens on functions the
task never touches. The structure-aware move is to parse the code, find which
functions the target actually CALLS (directly or indirectly), and send only
those. This script does exactly that with the standard-library `ast` module.

Pipeline:
  1. parse the module text into an Abstract Syntax Tree (AST)
  2. record each top-level function's name and exact source text
  3. build a CALL GRAPH: for each function, which other functions it calls
  4. from the target, walk the graph to collect its transitive dependencies
  5. emit only those functions as the "context slice"

Standard library only. Run:  python3 code_context.py
"""

import ast


def est_tokens(text: str) -> int:
    """Rough token estimate, same rule as chapter 2: ~1.3 tokens per whitespace
    word. Code fragments more than prose, so this UNDER-counts real code tokens,
    but it is consistent across the before and after, which is what we compare."""
    return round(len(text.split()) * 1.3)


# A small module. `make_report` (the target) transitively needs `load_rows`,
# `clean`, and `summarize`. The other four functions are unrelated noise that a
# whole-file dump would carry along for nothing.
MODULE = '''\
def load_rows(path):
    raw = path.read_text()
    return [line.split(",") for line in raw.splitlines()]


def clean(rows):
    return [[cell.strip() for cell in row] for row in rows]


def summarize(rows):
    total = sum(float(row[1]) for row in rows)
    return {"count": len(rows), "total": total}


def make_report(path):
    rows = clean(load_rows(path))
    stats = summarize(rows)
    return f"{stats['count']} rows, total {stats['total']}"


def send_email(addr, body):
    server = connect_smtp()
    message = build_message(addr, body)
    receipt = server.deliver(addr, message)
    log_delivery(addr, receipt)
    return receipt


def connect_smtp():
    host = read_config("smtp_host")
    port = read_config("smtp_port")
    return open_socket(host, port)


def resize_image(img, width, height):
    ratio = min(width / img.width, height / img.height)
    new_w = int(img.width * ratio)
    new_h = int(img.height * ratio)
    canvas = blank_canvas(width, height)
    scaled = img.scaled(new_w, new_h)
    return paste_centered(canvas, scaled)


def slugify(title):
    lowered = title.lower().strip()
    safe = "".join(ch for ch in lowered if ch.isalnum() or ch == " ")
    collapsed = " ".join(safe.split())
    return collapsed.replace(" ", "-")
'''


def function_table(module_src):
    """Parse the module and return {name: source_text} for every top-level
    function. `ast.parse` turns the text into a tree of nodes; a `FunctionDef`
    node is one `def`. `ast.get_source_segment` hands back the EXACT slice of
    the original text for that node, so we keep formatting and comments."""
    tree = ast.parse(module_src)
    funcs = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            funcs[node.name] = ast.get_source_segment(module_src, node)
    return funcs


def build_call_graph(module_src, names):
    """Return {name: set_of_names_it_calls}. We walk each function's subtree
    looking for `Call` nodes. A call like `summarize(rows)` parses as a `Call`
    whose `.func` is a `Name` node with `id == "summarize"`. We only keep callees
    that are themselves functions defined in this module (ignore builtins like
    `sum`, `len`, `float`, and methods like `.strip()`)."""
    tree = ast.parse(module_src)
    graph = {name: set() for name in names}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callee = sub.func.id
                    if callee in names:
                        graph[node.name].add(callee)
    return graph


def transitive_deps(graph, target):
    """Breadth-first walk of the call graph from `target`. Start with the target,
    then repeatedly add everything its current members call, until nothing new
    appears. The result is the target plus every function reachable from it: its
    transitive dependencies. The order is the discovery order (target first)."""
    seen = []
    queue = [target]
    in_set = {target}
    while queue:
        name = queue.pop(0)
        seen.append(name)
        for callee in sorted(graph[name]):
            if callee not in in_set:
                in_set.add(callee)
                queue.append(callee)
    return seen


def main():
    funcs = function_table(MODULE)
    names = set(funcs)
    graph = build_call_graph(MODULE, names)
    target = "make_report"

    needed = transitive_deps(graph, target)
    dropped = [n for n in funcs if n not in needed]

    slice_src = "\n\n\n".join(funcs[n] for n in funcs if n in needed)

    full_tok = est_tokens(MODULE)
    slice_tok = est_tokens(slice_src)

    print("=== The call graph (who calls whom) ===")
    for name in funcs:
        calls = sorted(graph[name])
        print(f"  {name:14s} -> {', '.join(calls) if calls else '(no in-module calls)'}")
    print()

    print(f"=== Slice for target: {target}() ===")
    print(f"  included ({len(needed)}): {', '.join(needed)}")
    print(f"  dropped  ({len(dropped)}): {', '.join(dropped)}")
    print()

    print("=== Tokens: whole module vs structure-aware slice ===")
    print(f"  whole module : {full_tok:4d} tokens ({len(funcs)} functions)")
    print(f"  slice        : {slice_tok:4d} tokens ({len(needed)} functions)")
    reduction = 100 * (full_tok - slice_tok) / full_tok
    print(f"  reduction    : {reduction:.0f}% fewer tokens sent")
    print()

    print("=== Verify the slice is correct ===")
    expected_deps = {"load_rows", "clean", "summarize"}
    target_present = target in needed
    deps_present = expected_deps.issubset(set(needed))
    unrelated = {"send_email", "connect_smtp", "resize_image", "slugify"}
    unrelated_absent = unrelated.isdisjoint(set(needed))
    print(f"  target present              : {target_present}")
    print(f"  all real dependencies present: {deps_present}  {sorted(expected_deps)}")
    print(f"  unrelated functions absent   : {unrelated_absent}  {sorted(unrelated)}")
    print(f"  ALL CHECKS PASS              : {target_present and deps_present and unrelated_absent}")


if __name__ == "__main__":
    main()

Running it:

=== The call graph (who calls whom) ===
  load_rows      -> (no in-module calls)
  clean          -> (no in-module calls)
  summarize      -> (no in-module calls)
  make_report    -> clean, load_rows, summarize
  send_email     -> connect_smtp
  connect_smtp   -> (no in-module calls)
  resize_image   -> (no in-module calls)
  slugify        -> (no in-module calls)

=== Slice for target: make_report() ===
  included (4): make_report, clean, load_rows, summarize
  dropped  (4): send_email, connect_smtp, resize_image, slugify

=== Tokens: whole module vs structure-aware slice ===
  whole module :  181 tokens (8 functions)
  slice        :   65 tokens (4 functions)
  reduction    : 64% fewer tokens sent

=== Verify the slice is correct ===
  target present              : True
  all real dependencies present: True  ['clean', 'load_rows', 'summarize']
  unrelated functions absent   : True  ['connect_smtp', 'resize_image', 'send_email', 'slugify']
  ALL CHECKS PASS              : True

Read the call graph first. The only function with in-module callees is make_report, which points at clean, load_rows, and summarize. There is a second little cluster, send_email calling connect_smtp, but no arrow connects it to the target, so the search from make_report never reaches it. That is why the slice contains four functions and drops the other four, and why the token count falls from 181 to 65, a 64% cut, while the verification block confirms nothing the target needs went missing and nothing it does not need came along.

Notice what the selection did not rely on. It did not look at what the functions are named, what they appear to be about, or whether their text resembles some query. slugify and resize_image were dropped not because they seemed irrelevant but because no path of call edges reaches them from the target. The structure decided, and the structure cannot be fooled by a suggestive name.

Don't be confused. Structural relevance and semantic relevance are different filters, and they answer different questions. The retrieval in Chapter 3 and the memory in Chapter 9 use semantic relevance: embed the query and the candidates, and keep the ones whose meanings are close. That finds code that is about the same topic. The call graph here uses structural relevance: keep the code the target actually executes. Those can disagree. A helper named _x7 that make_report calls is structurally essential and semantically invisible; a beautifully named generate_report_pdf that nothing calls is semantically tempting and structurally useless. For "give the model what it needs to understand this function," structure is the correct filter, because correctness depends on what the code calls, not on what it sounds like.

Where this is used in the wild

The toy here is a single-file call graph, but the idea scales to real tools that select code by structure instead of dumping it.

CodeCompressor is code-aware prompt compression: instead of treating a file as a bag of lines, it respects code structure when deciding what to cut, so the compressed prompt is still parseable code rather than a shredded fragment.
lean-ctx builds context from a code graph, assembling the relevant slice for a task from the dependency structure rather than pasting whole files.
tree-sitter based selectors use the tree-sitter parser, which produces an AST for dozens of languages, to pull out specific definitions (this function, this class) instead of whole files. It is the same FunctionDef-and-Call idea generalized past Python.
Aider's "repo map" gives a coding agent a compact, structure-derived index of a repository: the symbols (functions, classes) and how they relate, ranked by relevance to the current task, so the model gets a map of the codebase rather than its full text.

The common thread is that every one of these sends the model a selected, structured slice rather than raw bulk. The selection is what keeps the context lean (the second property from Chapter 1) on inputs far too large to send whole.

'with Claude' note (illustrative). A coding agent like Claude Code does not paste the repository into the prompt. It navigates: it greps for a symbol, reads the specific files a task names, follows a call or import to the next file, and stops when it has what it needs. The effect is the same selection this chapter computes, done incrementally as the task reveals which symbols matter, which keeps the window lean even on a large codebase.

The use cases line up with how people actually deploy this. A coding agent editing one function needs that function and its dependencies, not the repo. Repository question-answering ("where do we validate the JWT?") wants the few functions on the path to the answer. PR review wants the changed functions plus what they call and what calls them, which is the call graph extended one hop in the other direction, so the reviewer model sees the blast radius of the change and nothing else.

Using the real tool: commands and before/after proof

The demo above computed the slice by hand. The two tools below do the same selection on a real repository, with commands you can run. Both replace the lazy "paste the whole file" habit with "send only the parts the task touches."

Start with Claude Code, because it is the clearest example of the principle. When you run it against a repository in non-interactive mode, you give it one instruction and let it find the code:

# In the root of a repo. -p runs it once and prints the result (no chat session).
claude -p "fix the bug in the deploy function"

What happens next is the whole point. Claude Code does not read the repository into the prompt and ask the model to find deploy in the wall of text. It uses its own search and read tools: it greps for the symbol deploy to locate the file and line, reads just that function, follows a call or an import to the next function it needs, and stops when it has enough. The model only ever sees the handful of functions on the path to the fix, which is the same slice the from-scratch demo selected with a BFS over the call graph, except here it is gathered incrementally as the task reveals which symbols matter.

Contrast that with the naive approach: open deploy.py (or worse, cat the whole src/ directory) and paste all of it into the prompt. On a small file that wastes tokens on functions the bug never touches; on a real repository it does not fit in the context window at all. The search-and-read habit is what keeps the window lean on inputs too large to send whole.

The second tool is Aider, an open-source command-line coding assistant. It builds a repo map: a compact index of the repository's definitions (functions, classes, and their call signatures) rather than their full bodies, ranked by relevance to the current task with a graph algorithm over the file dependency structure. Under the hood the map is built with tree-sitter, the same parser the "wild" section mentioned, which gives Aider an AST for dozens of languages instead of the single-file Python ast of our demo.

pip install aider-chat
aider                       # builds a repo map of the relevant definitions, not whole files
aider --map-tokens 1024     # bound the map to roughly 1024 tokens

The --map-tokens option is the lever that matters here. It caps how many tokens the repo map may occupy, so Aider keeps the highest-ranked definitions and drops the rest to fit the budget. That is the same trade the call-graph BFS makes (keep what the task needs, drop the rest), spent as an explicit token budget instead of a reachability set.

Now prove the win. The metric is input tokens, and the exact count comes from client.messages.count_tokens, not a word-count estimate (the Anthropic SDK; the model id this book uses is claude-opus-4-8). Count the prompt twice, once with the whole file or repo dumped in and once with just the targeted slice (the function plus its dependencies, exactly what the from-scratch demo selected), and read .input_tokens off each result:

from anthropic import Anthropic

client = Anthropic()

def input_tokens(code: str) -> int:
    resp = client.messages.count_tokens(
        model="claude-opus-4-8",
        messages=[{"role": "user", "content": code}],
    )
    return resp.input_tokens

whole = input_tokens(open("module.py").read())   # everything
slice_ = input_tokens(selected_slice_text)        # target + its deps only
print(whole, slice_, f"{1 - slice_/whole:.0%} fewer")

whole module ~500 tokens -> structure-aware slice ~150 tokens (70% fewer)

Those two numbers are illustrative: the shapes to expect, not a measurement from this box (count_tokens needs a network call and an API key, which the verification setup does not have). The measurement we did run is the AST demo above, which cut its module from 181 tokens to 65 (a 64% reduction) and verified that nothing the target calls went missing. That on-box result is the proof; count_tokens is how you reproduce the same before/after on a real file or repo, with exact counts instead of an estimate.

Takeaways

Code has structure (calls, imports, inheritance) that prose does not, so you compress it by selecting whole units along that structure, not by ranking or trimming lines.
The AST turns source text into a tree you can query precisely. A FunctionDef is a def; a Call node's name tells you what it invokes. The standard-library ast module is all you need for Python.
A call graph plus a graph search from the target yields its transitive dependencies: the exact slice the model needs. The demo cut a module from 181 to 65 tokens (64%) while keeping every function the target actually calls.
Structural relevance (what the code calls) and semantic relevance (what the code is about) are different filters. For "what does this function need to run," structure is the correct one; a suggestive name is not evidence and an ugly name is not disqualifying.
Real tools do this at scale: CodeCompressor, lean-ctx, tree-sitter selectors, and Aider's repo map all send a structured slice, and coding agents navigate to targeted files instead of pasting the repo.

👉 Selecting a lean context is half the battle. The other half is not paying for the same tokens twice when you send a similar context call after call. The next chapter is the KV-cache and prefix caching: how the model's internal work on a stable prefix can be reused across calls, and how to order your context so it is.

Context Engineering in Depth