Temporal knowledge graphs

Chapter 9 gave the agent a memory: a store of facts outside the window, from which it retrieves the relevant slice and re-injects it each turn. That store answers "what do we know about Acme?" by returning the most similar fact. It cannot answer "who was Acme's CTO in Q1?", because a plain fact has no notion of when it was true. A vector memory returns the closest match; it has no clock.

This chapter adds the clock. We attach a validity interval to every fact, so the store records not just what is true but for which span of time it was true. That one addition turns "the CTO is Mei" (which silently goes stale the moment Mei leaves) into "Mei was CTO from Q4 onward, Ravi before that, Dana before that," a record that stays correct as the world changes.

A fact is a triple, and now it carries time

A triple is the smallest unit of a knowledge graph: three parts, written (subject, predicate, object).

  • The subject is the thing the fact is about: Acme.
  • The predicate is the relation or attribute: CTO.
  • The object is the value: Dana.

Read it left to right: "Acme's CTO is Dana." Triples are the edges of a graph (subject and object are nodes, the predicate is the labeled arrow between them), which is why a store of them is a knowledge graph. So far this is just a fact, the same thing Chapter 9 stored as text.
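To make the graph reading concrete, here is a minimal sketch that folds a list of triples into an adjacency map, one entry per subject node. The helper name build_graph is an illustrative assumption, not something the chapter's demo defines.

```python
from collections import defaultdict


def build_graph(triples):
    """Map each subject node to its outgoing (predicate, object) arrows."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


triples = [
    ("Acme", "CTO", "Dana"),    # Acme --CTO--> Dana
    ("Acme", "plan", "Free"),   # Acme --plan--> Free
]
graph = build_graph(triples)
print(graph["Acme"])  # [('CTO', 'Dana'), ('plan', 'Free')]
```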

The temporal part is one more piece: a validity interval, the span of time over which the fact held. We write it as a half-open interval

$$[t_0, t_1)$$

meaning the fact is true from $t_0$ inclusive up to $t_1$ exclusive. The square bracket includes the endpoint; the round bracket excludes it. Half-open is the right choice because it makes successive facts tile time with no gap and no overlap: the instant one fact ends is exactly the instant the next begins. If Dana is CTO over $[Q1, Q2)$ and Ravi over $[Q2, Q4)$, then at $Q2$ there is exactly one answer (Ravi), not zero and not two. When $t_1$ is unknown because the fact is still true, we leave it open ($t_1 = \text{None}$), which means "true from $t_0$ onward, no known end."
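The tiling claim can be checked mechanically. The sketch below is one way to do it, under the assumption that an interval is a (start, end) pair of dates with end = None for an open interval; covers and tiles are hypothetical helper names. It also shows what goes wrong with closed intervals: at Q2 both Dana and Ravi would match.

```python
from datetime import date

Q1, Q2, Q4 = date(2024, 1, 1), date(2024, 4, 1), date(2024, 10, 1)


def covers(start, end, t):
    """Half-open membership: start <= t < end. end=None means no known end."""
    return start <= t and (end is None or t < end)


def tiles(intervals):
    """True if the intervals, sorted by start, meet end-to-start exactly."""
    ordered = sorted(intervals, key=lambda iv: iv[0])
    return all(prev[1] == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


timeline = [(Q1, Q2), (Q2, Q4), (Q4, None)]   # Dana, Ravi, then an open edge

# Half-open: exactly one interval claims the boundary instant Q2.
holders_at_q2 = [iv for iv in timeline if covers(iv[0], iv[1], Q2)]
print(len(holders_at_q2))    # 1

# Closed on both ends: the first two intervals both claim Q2.
closed_matches = [iv for iv in timeline[:2] if iv[0] <= Q2 <= iv[1]]
print(len(closed_matches))   # 2

print(tiles(timeline))       # True
```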

Don't be confused. A fact's value and a fact's validity are two different things, and the whole chapter rests on keeping them apart. The value is the object: Dana, Ravi, Mei. The validity is the interval the value held: [Q1, Q2). A plain key-value store keeps only the current value and throws the interval away, which is exactly why it cannot answer questions about the past. The temporal store keeps both, so the value Dana does not vanish when it stops being current; it just gets an end stamped on its interval.

Two clocks: valid time and ingestion time

There are two different times you might mean when you say "when." Keeping both is what makes a store bi-temporal (two clocks).

  • Valid time is when the fact was true in the world. Mei became CTO at the start of Q4; her valid time starts at Q4.
  • Ingestion time (also called transaction time) is when your system learned the fact. You might not record Mei's appointment until a week later, or discover a back-dated change during an audit months on.

These two clocks come apart constantly. A price changed on Monday but your pipeline ingested it on Wednesday. A role changed in March but you only heard about it in May. With one clock you can answer "what was true at time $T$." With both clocks you can also answer "what did we believe at time $T$," which is the question every audit and every "why did the agent do that?" investigation actually asks. The valid-time interval lives on the edge as [valid_from, valid_to); the ingestion time is a separate stamp, ingested_on.
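As a small illustration of the two stamps sitting side by side, the sketch below stores both on one record and measures the gap between them. StampedFact, its lag method, the price value, and the specific dates are all hypothetical, chosen to match the Monday/Wednesday example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StampedFact:
    value: str
    valid_from: date    # valid time: when it became true in the world
    ingested_on: date   # ingestion time: when our system recorded it

    def lag(self) -> timedelta:
        """How long the world was ahead of the store for this fact."""
        return self.ingested_on - self.valid_from


# Hypothetical dates: effective Monday 2024-03-04, ingested Wednesday 2024-03-06.
price = StampedFact("$12", valid_from=date(2024, 3, 4), ingested_on=date(2024, 3, 6))
print(price.lag().days)  # 2
```

During that two-day lag the store's answer to "what is the price?" was out of date, and only the second stamp lets you reconstruct that after the fact.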

Supersession: close, do not delete

Here is the move that makes the whole thing work. When a new fact replaces an old one about the same (subject, predicate), for example the CTO changes from Dana to Ravi, the naive instinct is to overwrite: point (Acme, CTO) at Ravi and move on. That destroys history.

Instead we supersede. We close the old edge by setting its valid_to to the change time, and we open a new edge starting at that same time. Nothing is deleted. Dana's edge becomes [Q1, Q2) (closed), Ravi's edge becomes [Q2, None) (open, current). Because the new fact's start equals the old fact's end and the intervals are half-open, the timeline stays gapless and overlap-free. Run this forward through several changes and you get a complete succession, every past value still queryable.

The demo: a bi-temporal triple store from scratch

The code below builds two stores from the same stream of events so the contrast is fair: a naive KeyValueStore (a dict from key to current value, overwrite on every change) and a TemporalKG (an append-only list of edges, close-and-open on every change). It feeds Acme's year through both, prints the timeline the temporal store preserved, and then asks the question that separates them: "who was the CTO in Q1?" Everything is standard library, no imports beyond dataclasses, datetime, and typing.

"""A bi-temporal triple store from scratch: track WHEN a fact was true, not
just what it currently is.

Chapter 9 stored facts as text and retrieved the most SIMILAR one. That answers
"what do we know about X" but has no notion of WHEN a fact held. This chapter
adds time. A fact here is a TRIPLE plus a validity window:

    (subject, predicate, object)  valid over the half-open interval [t0, t1)

  - subject:   the thing the fact is about        ("Acme")
  - predicate: the relation / attribute            ("CTO")
  - object:    the value                           ("Dana")
  - [t0, t1):  the fact is true from t0 up to but NOT including t1.
               t1 = None means "still true, no known end" (an open interval).

Two clocks, hence "bi-temporal":
  - valid time:     when the fact was true IN THE WORLD (the CTO started in Q1).
  - ingestion time: when OUR SYSTEM learned it (we recorded it on some date).
These differ all the time: a role may have changed in March but we only hear
about it in May. Tracking both lets us answer "what was true at T" AND "what did
we BELIEVE at T", which is what audits need.

The key move: when a new fact SUPERSEDES an old one about the same
(subject, predicate), we do NOT overwrite. We CLOSE the old edge (set its valid_to
to the change time) and OPEN a new edge. The old value is still there, just no
longer current. History is preserved, so point-in-time queries stay correct.

We prove it by contrast with a naive key-value store that keeps only the latest
value: it answers "who is the CTO NOW" fine but gives the WRONG answer to
"who was the CTO in Q1", because it overwrote and forgot.

Standard library only. Run:  python3 temporal_kg.py
"""

from dataclasses import dataclass
from datetime import date
from typing import Optional


# --- A point in time -----------------------------------------------------------
# We use plain dates so the demo reads like a calendar. Quarters of 2024:
Q1 = date(2024, 1, 1)
Q2 = date(2024, 4, 1)
Q3 = date(2024, 7, 1)
Q4 = date(2024, 10, 1)

# Label printed in place of valid_to when an interval is still open.
OPEN = "now"


# --- 1. The naive baseline: a plain key-value store ---------------------------
class KeyValueStore:
    """The thing most systems actually do: a dict from key to CURRENT value.

    Writing a new value OVERWRITES the old one. There is exactly one slot per
    key, so the moment a fact changes, the previous value is gone. This is fine
    for "what is true now" and silently wrong for any question about the past.
    """

    def __init__(self):
        self.data = {}  # (subject, predicate) -> object

    def set(self, subject, predicate, obj):
        self.data[(subject, predicate)] = obj  # clobbers whatever was there

    def get(self, subject, predicate):
        return self.data.get((subject, predicate))


# --- 2. The temporal edge ------------------------------------------------------
@dataclass
class Edge:
    """One fact with its validity window and the time we learned it.

    valid_from / valid_to are the VALID-TIME interval [valid_from, valid_to):
    when the fact held in the world. valid_to = None means open (still true).
    ingested_on is the INGESTION-TIME stamp: when this row entered our store.
    """
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date]   # None = open interval, still valid
    ingested_on: date

    def is_open(self):
        return self.valid_to is None

    def valid_at(self, t):
        """Is this edge true at instant t? Half-open: t in [valid_from, valid_to).

        Half-open means valid_from is included and valid_to is excluded, so the
        instant a fact ends is exactly the instant its successor begins, with no
        overlap and no gap. An open edge (valid_to is None) is true for every t
        at or after valid_from.
        """
        if t < self.valid_from:
            return False
        if self.valid_to is None:
            return True
        return t < self.valid_to


# --- 3. The bi-temporal triple store ------------------------------------------
class TemporalKG:
    """An append-only list of edges. Supersession closes the old edge instead of
    deleting it, so every past value survives for point-in-time queries."""

    def __init__(self):
        self.edges = []  # all edges ever, in insertion order

    def add(self, subject, predicate, obj, valid_from, ingested_on=None):
        """Record that (subject, predicate) became `obj` at valid_from.

        If an OPEN edge already exists for this (subject, predicate), it is
        superseded: we CLOSE it by setting its valid_to to valid_from (the new
        fact's start = the old fact's end, no overlap), then append the new open
        edge. We never mutate the object of an existing edge and never delete.
        """
        if ingested_on is None:
            ingested_on = valid_from  # default: learned it when it happened
        # Close any currently-open edge for this same key.
        for e in self.edges:
            if (e.subject == subject and e.predicate == predicate
                    and e.is_open()):
                e.valid_to = valid_from  # CLOSE, do not delete
        self.edges.append(Edge(subject, predicate, obj,
                               valid_from, None, ingested_on))

    def current(self, subject, predicate):
        """What is true NOW: the one open edge for this key (valid_to is None)."""
        for e in self.edges:
            if (e.subject == subject and e.predicate == predicate
                    and e.is_open()):
                return e.obj
        return None

    def as_of(self, subject, predicate, t):
        """Point-in-time query: what was true at instant t.

        A "point-in-time query" asks the store to rewind: given a past instant t,
        return the value whose validity window contained t. We scan for the edge
        whose [valid_from, valid_to) interval covers t.
        """
        for e in self.edges:
            if (e.subject == subject and e.predicate == predicate
                    and e.valid_at(t)):
                return e.obj
        return None

    def history(self, subject, predicate):
        """All edges for one key, oldest first, for printing the timeline."""
        rows = [e for e in self.edges
                if e.subject == subject and e.predicate == predicate]
        return sorted(rows, key=lambda e: e.valid_from)


# --- Helpers for pretty timelines ---------------------------------------------
QNAME = {Q1: "Q1", Q2: "Q2", Q3: "Q3", Q4: "Q4"}

def qname(d):
    return QNAME.get(d, str(d))

def interval_str(e):
    end = qname(e.valid_to) if e.valid_to is not None else OPEN
    return f"[{qname(e.valid_from)}, {end})"


# --- The scenario --------------------------------------------------------------
# A small company "Acme" whose facts change over the year. We feed events in the
# order they happened, each one superseding the previous value for that key.
print("=== Scenario: Acme's facts change over 2024 ===")
print("Events, in the order they occurred:\n")

events = [
    # (subject, predicate, object, when it became true)
    ("Acme", "CTO",   "Dana",  Q1),   # Dana is CTO from Q1
    ("Acme", "plan",  "Free",  Q1),   # on the Free plan from Q1
    ("Acme", "CTO",   "Ravi",  Q2),   # Ravi replaces Dana in Q2  (supersession 1)
    ("Acme", "plan",  "Pro",   Q3),   # upgrades to Pro in Q3     (supersession 2)
    ("Acme", "CTO",   "Mei",   Q4),   # Mei replaces Ravi in Q4   (supersession 3)
]

# Build BOTH stores from the same events so the comparison is apples to apples.
kv = KeyValueStore()
kg = TemporalKG()
for subject, predicate, obj, when in events:
    print(f"  {qname(when)}: {subject}.{predicate} := {obj}")
    kv.set(subject, predicate, obj)        # naive: overwrite
    kg.add(subject, predicate, obj, when)  # temporal: close + open

print()
print("The key-value store kept only the LAST write for each key.")
print("The temporal KG kept every edge, closing each as the next one opened.\n")


# --- The timeline the temporal KG preserved -----------------------------------
print("=== Timeline preserved by the temporal KG ===")
for key in [("Acme", "CTO"), ("Acme", "plan")]:
    subject, predicate = key
    print(f"  {subject}.{predicate}:")
    for e in kg.history(subject, predicate):
        status = "current" if e.is_open() else "closed"
        print(f"    {interval_str(e):>14}  = {e.obj:<5} ({status})")
print()


# --- BEFORE vs AFTER: who was the CTO in Q1? ----------------------------------
print("=== Query A: who is the CTO NOW? (both stores should agree) ===")
print(f"  key-value store : {kv.get('Acme', 'CTO')}")
print(f"  temporal KG     : {kg.current('Acme', 'CTO')}")
print("  -> agree: the current value is the easy case.\n")

print("=== Query B: who was the CTO in Q1? (point-in-time) ===")
print(f"  key-value store : {kv.get('Acme', 'CTO')}   <- WRONG")
print("    it only stores the latest value; it overwrote Dana and Ravi and")
print("    cannot answer about the past. It returns today's CTO for every date.")
print(f"  temporal KG     : {kg.as_of('Acme', 'CTO', Q1)}   <- correct")
print("    it scans for the edge whose [valid_from, valid_to) interval covers Q1.\n")


# --- A full point-in-time sweep -----------------------------------------------
print("=== Query C: walk the CTO forward, quarter by quarter ===")
for t in [Q1, Q2, Q3, Q4]:
    kv_ans = kv.get("Acme", "CTO")            # same wrong answer every time
    kg_ans = kg.as_of("Acme", "CTO", t)       # the right answer for each date
    print(f"  as of {qname(t)}:  key-value={kv_ans:<5}  temporal-KG={kg_ans}")
print("  the key-value column is frozen at the latest value; the temporal-KG")
print("  column tracks the real succession Dana -> Ravi -> Ravi -> Mei.\n")


# --- The bi-temporal twist: knew-late ------------------------------------------
# Now the second clock earns its keep. Suppose Acme's plan actually changed back
# to Free at the start of Q4, but we did not LEARN this until a later audit on
# 2024-12-15. Valid time and ingestion time diverge.
print("=== Query D: bi-temporal, valid time vs ingestion time ===")
kg.add("Acme", "plan", "Free", Q4, ingested_on=date(2024, 12, 15))
plan_edges = kg.history("Acme", "plan")
print("  plan timeline (with ingestion stamps):")
for e in plan_edges:
    print(f"    {interval_str(e):>14}  = {e.obj:<5}  learned_on {e.ingested_on}")
print()
print(f"  valid-time answer 'what WAS the plan at Q4?':      "
      f"{kg.as_of('Acme', 'plan', Q4)}")
print("    (Free: the change took effect at Q4 in the world.)")
print("  ingestion-time fact 'when did we LEARN the Q4 plan?': "
      f"{plan_edges[-1].ingested_on}")  # read the stamp off the Q4 edge
print("    (the edge was valid from Q4 but only entered the store in December.)")
print("  An audit needs both: what was true, and when we knew it.")

Running it:

=== Scenario: Acme's facts change over 2024 ===
Events, in the order they occurred:

  Q1: Acme.CTO := Dana
  Q1: Acme.plan := Free
  Q2: Acme.CTO := Ravi
  Q3: Acme.plan := Pro
  Q4: Acme.CTO := Mei

The key-value store kept only the LAST write for each key.
The temporal KG kept every edge, closing each as the next one opened.

=== Timeline preserved by the temporal KG ===
  Acme.CTO:
          [Q1, Q2)  = Dana  (closed)
          [Q2, Q4)  = Ravi  (closed)
         [Q4, now)  = Mei   (current)
  Acme.plan:
          [Q1, Q3)  = Free  (closed)
         [Q3, now)  = Pro   (current)

=== Query A: who is the CTO NOW? (both stores should agree) ===
  key-value store : Mei
  temporal KG     : Mei
  -> agree: the current value is the easy case.

=== Query B: who was the CTO in Q1? (point-in-time) ===
  key-value store : Mei   <- WRONG
    it only stores the latest value; it overwrote Dana and Ravi and
    cannot answer about the past. It returns today's CTO for every date.
  temporal KG     : Dana   <- correct
    it scans for the edge whose [valid_from, valid_to) interval covers Q1.

=== Query C: walk the CTO forward, quarter by quarter ===
  as of Q1:  key-value=Mei    temporal-KG=Dana
  as of Q2:  key-value=Mei    temporal-KG=Ravi
  as of Q3:  key-value=Mei    temporal-KG=Ravi
  as of Q4:  key-value=Mei    temporal-KG=Mei
  the key-value column is frozen at the latest value; the temporal-KG
  column tracks the real succession Dana -> Ravi -> Ravi -> Mei.

=== Query D: bi-temporal, valid time vs ingestion time ===
  plan timeline (with ingestion stamps):
          [Q1, Q3)  = Free   learned_on 2024-01-01
          [Q3, Q4)  = Pro    learned_on 2024-07-01
         [Q4, now)  = Free   learned_on 2024-12-15

  valid-time answer 'what WAS the plan at Q4?':      Free
    (Free: the change took effect at Q4 in the world.)
  ingestion-time fact 'when did we LEARN the Q4 plan?': 2024-12-15
    (the edge was valid from Q4 but only entered the store in December.)
  An audit needs both: what was true, and when we knew it.

Read the contrast in Query C, because it is the entire point. The key-value column prints Mei for every quarter, including Q1 and Q2 when Mei had nothing to do with Acme. It is not lying on purpose; it simply has one slot per key and Mei was the last write, so that is all it can ever return. The temporal-KG column tracks the real succession, because each value kept its interval and the point-in-time query scans for the edge whose [valid_from, valid_to) window contains the date you asked about.

A point-in-time query is just that rewind: you hand the store a past instant and it returns the value that was current then, not now. Query A shows that both stores agree in the "now" case, and that agreement is the trap. The naive store looks correct as long as you only ever ask about the present; the moment anyone asks about the past it is wrong, and it gives no sign that it is.

Query D is the bi-temporal payoff. We record a Q4 plan change that the system did not learn about until December 15. The valid-time interval starts at Q4 (when the change took effect) while learned_on is December 15 (when we ingested it). Asking "what was the plan at Q4?" returns Free by valid time; asking "when did we know?" returns December 15 by ingestion time. One clock could not have told both stories.
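The demo stamps ingested_on on every edge but never queries by it. One way to ask "what did we believe at time T" is sketched below, assuming each key holds a single value at a time: ignore every edge ingested after T, then take the latest-starting edge among the rest. The stored valid_to is deliberately not consulted, because a later supersession may have closed an edge that we still believed to be open at T. believed_at is a hypothetical helper, not part of the demo's TemporalKG, and the Edge here is a trimmed copy of the demo's.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Edge:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date]
    ingested_on: date


def believed_at(edges, subject, predicate, valid_t, known_t):
    """Value we believed held at valid_t, using only what was ingested by known_t."""
    known = [e for e in edges
             if e.subject == subject and e.predicate == predicate
             and e.ingested_on <= known_t      # second clock: had we learned it yet?
             and e.valid_from <= valid_t]      # first clock: had it started yet?
    if not known:
        return None
    # Among edges we knew about, the latest-starting one was the believed value.
    return max(known, key=lambda e: e.valid_from).obj


# The plan timeline exactly as the demo leaves it after Query D.
plan = [
    Edge("Acme", "plan", "Free", date(2024, 1, 1), date(2024, 7, 1), date(2024, 1, 1)),
    Edge("Acme", "plan", "Pro", date(2024, 7, 1), date(2024, 10, 1), date(2024, 7, 1)),
    Edge("Acme", "plan", "Free", date(2024, 10, 1), None, date(2024, 12, 15)),
]

Q4 = date(2024, 10, 1)
nov = believed_at(plan, "Acme", "plan", Q4, known_t=date(2024, 11, 1))
dec = believed_at(plan, "Acme", "plan", Q4, known_t=date(2024, 12, 31))
print(nov)  # Pro
print(dec)  # Free
```

In November the store still believed the Q4 plan was Pro; after the December audit it believes Free. Both answers are about the same valid-time instant, and only the ingestion stamp tells them apart.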

How this connects to context and to the real tools

In context-engineering terms, a temporal KG is a kind of memory (Chapter 9) that you query for a time-scoped slice before assembling the turn. Instead of injecting "Acme's CTO is Mei" (true now, wrong for any historical question), you inject the value valid at the time the task is about, or the whole short timeline if the task reasons across the change. The win is point-in-time correctness: the agent stops confidently answering past questions with present facts.
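A rough sketch of that injection step follows, assuming the store hands back rows of (value, valid_from, valid_to). The function names, the row shape, and the wording of the rendered lines are all hypothetical; the point is only that the text placed in the window carries its dates with it.

```python
from datetime import date

# Hypothetical rows as (value, valid_from, valid_to); valid_to=None means open.
CTO_ROWS = [
    ("Dana", date(2024, 1, 1), date(2024, 4, 1)),
    ("Ravi", date(2024, 4, 1), date(2024, 10, 1)),
    ("Mei", date(2024, 10, 1), None),
]


def fact_line(label, rows, task_date):
    """Render the single value valid at task_date as one line for the prompt."""
    for value, start, end in rows:
        if start <= task_date and (end is None or task_date < end):
            return f"{label} on {task_date.isoformat()}: {value}"
    return f"{label} on {task_date.isoformat()}: unknown"


def timeline_lines(label, rows):
    """Render the whole succession, for tasks that reason across a change."""
    lines = []
    for value, start, end in sorted(rows, key=lambda r: r[1]):
        until = end.isoformat() if end else "present"
        lines.append(f"{label} from {start.isoformat()} until {until}: {value}")
    return lines


print(fact_line("Acme CTO", CTO_ROWS, date(2024, 2, 15)))
# Acme CTO on 2024-02-15: Dana
print("\n".join(timeline_lines("Acme CTO", CTO_ROWS)))
```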

This is a real and active design, not a toy. Graphiti is an open-source temporal knowledge-graph engine that ingests episodes (chunks of conversation or documents), extracts triples, and maintains bi-temporal validity on the edges, closing old edges and opening new ones exactly as the demo does. Zep builds agent memory on top of Graphiti and tracks fact-validity windows so an agent can reason about facts that change over a long-running relationship. On temporal-reasoning benchmarks (the kind that ask "what was true at time $T$" rather than "what is true"), this approach is reported to outperform vector-only memory, which is the expected result: a vector store retrieves the most similar fact and has no representation of when, so it answers "who is the CTO" and "who was the CTO in Q1" with the same nearest neighbor. The use cases are wherever facts have a lifetime: prices, plan tiers, roles, account statuses, feature flags, and any "what did we know, and when did we know it" audit.

Using the real tool: commands and before/after proof

The from-scratch store above is the whole idea in about 250 lines, docstrings and demo included. The production tool that does the same thing, plus the entity extraction we hand-waved past, is Graphiti. You give it plain text (an "episode") and it pulls out the triples for you, stamps each edge with a validity interval, and closes the old edge when a new fact supersedes it, which is exactly the close-and-open move from the demo. Below are the real commands.

Graphiti is not self-contained the way our demo is. It needs two things you have to provide: a graph database to store the nodes and edges (Neo4j 5.26 or newer is the default; it also supports FalkorDB and Amazon Neptune), and an LLM to do the extraction, because turning a sentence into triples is itself a model call (it defaults to OpenAI and reads OPENAI_API_KEY; Anthropic, Gemini, and Groq are also supported). To be plain about the cost: this is heavier than the dict-of-edges we built. The payoff is that it extracts facts from raw prose instead of making you hand-write every triple.

# The library (Python 3.10+).
pip install graphiti-core

# Needs a graph DB backend. Easiest is Neo4j in Docker:
docker run -d --name neo4j -p 7687:7687 -p 7474:7474 \
  -e NEO4J_AUTH=neo4j/password neo4j:5.26

# Needs an LLM key for extraction (OpenAI is the default backend):
export OPENAI_API_KEY=sk-...

A minimal real session ingests two facts that disagree across time, then asks a point-in-time question. The methods are async, so they run inside asyncio. The names below are the real ones: add_episode to ingest a chunk of text, search to query.

import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType


async def main():
    # 1. Connect to the graph DB. Three args: URI, user, password.
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    await graphiti.build_indices_and_constraints()  # one-time setup

    # 2. Ingest two episodes whose CTO fact changes between Q1 and Q4.
    #    reference_time is the VALID time: when the fact was true in the world.
    await graphiti.add_episode(
        name="q1_update",
        episode_body="In Q1, Acme's CTO is Dana.",
        source=EpisodeType.text,
        source_description="status update",
        reference_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
    )
    await graphiti.add_episode(
        name="q4_update",
        episode_body="As of Q4, Acme's CTO is Mei.",
        source=EpisodeType.text,
        source_description="status update",
        reference_time=datetime(2024, 10, 1, tzinfo=timezone.utc),
    )

    # 3. Query. Each result edge carries .fact plus two timestamps, .valid_at
    #    and .invalid_at (either may be None), so you can keep the edge whose
    #    validity window covers the date you asked about.
    results = await graphiti.search("Who was Acme's CTO in Q1?")
    for edge in results:
        print(edge.fact, "| valid_at:", edge.valid_at,
              "| invalid_at:", edge.invalid_at)

    await graphiti.close()


asyncio.run(main())

This snippet is follow-along: the graphiti-core library, the Neo4j backend, and the LLM key are not installed on this box, so we are not pasting measured output for it. The pieces are the real ones. add_episode is how text goes in, reference_time is the valid-time stamp, and search returns edges that carry their own validity windows.
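Keeping "the edge whose validity covers the date you asked about" is a plain filter over the results. The sketch below runs offline against a stand-in class; FactEdge and valid_on are hypothetical names, and the assumption that a result exposes a start stamp (valid_at) and an end stamp (invalid_at), either of which may be None, should be checked against the library's own documentation before relying on it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FactEdge:
    """Hypothetical stand-in for one search result, not the library's class."""
    fact: str
    valid_at: Optional[datetime]     # assumed: start of the validity window
    invalid_at: Optional[datetime]   # assumed: end of the window, None if open


def valid_on(edges, when):
    """Keep edges whose half-open window [valid_at, invalid_at) covers `when`."""
    kept = []
    for e in edges:
        if e.valid_at is None or e.valid_at > when:
            continue   # unknown or later start: cannot confirm it held then
        if e.invalid_at is None or when < e.invalid_at:
            kept.append(e)
    return kept


utc = timezone.utc
results = [
    FactEdge("Dana is the CTO of Acme",
             datetime(2024, 1, 1, tzinfo=utc), datetime(2024, 10, 1, tzinfo=utc)),
    FactEdge("Mei is the CTO of Acme",
             datetime(2024, 10, 1, tzinfo=utc), None),
]

q1_facts = [e.fact for e in valid_on(results, datetime(2024, 2, 15, tzinfo=utc))]
print(q1_facts)  # ['Dana is the CTO of Acme']
```

Dropping edges with no known start is a conservative choice for point-in-time questions; a more permissive filter could keep them and flag them as undated.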

Before and after: proving point-in-time correctness

The metric to prove is point-in-time correctness: does the store return the value that was true at the date you asked about, or does it return whatever it saw last? Here is the recipe, and it is the same scenario the demo already ran on this box.

  1. Ingest a fact that changes. "Acme's CTO is Dana (Q1)," then later "Acme's CTO is Mei (Q4)." Two episodes, one valid in Q1 and one valid from Q4.
  2. Ask a past question. "Who was the CTO in Q1?"
  3. Compare two memories.

A flat vector memory (Chapter 9) stores both sentences as embeddings and returns the most similar one. "Who was the CTO in Q1?" is close to both, and because the store has no clock it cannot check which fact was in force in Q1; it surfaces whichever sentence scores highest, and when that is Mei the answer is wrong for Q1. The temporal graph instead keeps Dana's edge with its [Q1, Q4) validity and Mei's edge with its [Q4, now) validity, and returns the one whose interval covers Q1.

Illustrative, expected (not measured here, since the library is not on this box):

question: "Who was Acme's CTO in Q1?"

flat vector memory  -> "Mei"    (WRONG: returns the most similar/recent fact, no clock)
temporal graph      -> "Dana"   (correct: returns the value whose validity covers Q1)

That before/after is exactly what the on-box demo measured for real. Look back at Query B and Query C in the verified output above: the key-value store (a stand-in for any memory that keeps one current value per fact) printed Mei for every quarter, while the temporal KG walked the real succession Dana -> Ravi -> Ravi -> Mei. The illustrative Graphiti result here and the measured demo result there are the same finding: keep the interval, get the past right; drop it, and every historical question collapses onto the latest value. (Zep, the managed product built on Graphiti, is pip install zep-cloud; it wraps the same fact-validity windows behind a hosted memory API.)

In an agent

Temporal memory earns its keep the moment an assistant has to reason about facts that change: a product's price, a person's role, an account's status, a feature flag, an order's state. For those, the useful question is rarely "what is true now"; it is "what was true when this happened," and a similarity-only memory cannot answer it, because it stores what without when. Before assembling the turn, query the temporal store for the value valid at the relevant time (or the short timeline if the task spans the change) and inject that, so a question about Q1 gets Q1's answer instead of today's.

Takeaways

  • A temporal fact is a triple (subject, predicate, object) plus a validity interval $[t_0, t_1)$. The interval, not the value, is what lets you ask about the past.
  • A plain key-value store keeps only the current value, so it answers "now" correctly and any point-in-time question with the same wrong latest value, silently.
  • Supersession means close, not delete: stamp the old edge's valid_to and open a new edge. History is preserved and the timeline tiles without gaps or overlaps.
  • Bi-temporal means two clocks. Valid time is when a fact was true in the world; ingestion time is when your system learned it. Audits and "why did the agent do that" need both.
  • For an agent, query the KG for the value valid at the relevant time before assembling the context, so it stops answering past questions with present facts. Graphiti and Zep do this in production.

👉 We now have memory that knows what was true and when. But the live conversation still grows without bound inside the window. The next chapter is about that growth directly: context-window management and compaction, where we summarize and prune the running history so a long session keeps fitting.