Sequence-aware recommendation

Why this chapter matters: everything so far treated a user's history as an unordered bag of items ("you like soccer"). But order carries information: what you read just now often predicts what you'll read next better than your average taste does. Sequence-aware models exploit that, and they're behind "Up next" autoplay and session-based feeds. This chapter builds the simplest one from scratch and shows that it beats the static content profile at next-article prediction, especially once it is blended with content similarity.

The shift: from "taste" to "what's next"

  • Static (earlier chapters): summarize all of a user's history into one taste vector, recommend similar items. Great for "more like what you generally like."
  • Sequential (here): model transitions — given the last item (or the recent sequence), predict the next. Great for "you just read X, so here's Y."

A news reader who just opened a Champions League final article is, right now, most likely to open another match report — not a random article from their all-time-favorite category. Order captures that intent.

The simplest sequence model: a first-order Markov chain

Count how often item B is read right after item A across all users. That gives a transition table; to recommend, look at the user's last item and return the items most likely to follow it.

build:   for each user's chronological history a -> b -> c ...
         T[a, b] += 1 ;  T[b, c] += 1 ; ...
recommend(user): take their last item L, rank items by T[L, *]

This is a first-order Markov model (only the last item matters). It's the transparent ancestor of neural session models like GRU4Rec (a recurrent net over the sequence) and SASRec (self-attention over the sequence), which learn richer, longer-range patterns — but the core idea is this table.
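To make the counting concrete, here is a minimal sketch of the transition table on toy data. The histories and item names are hypothetical, and plain dictionaries stand in for the NumPy matrix used in the chapter's class.

```python
from collections import Counter, defaultdict

# Hypothetical toy histories: each list is one user's reads in time order.
histories = {
    "u1": ["final", "match_report", "transfer_news"],
    "u2": ["final", "match_report"],
    "u3": ["final", "transfer_news"],
}

# T[a][b] = number of times b was read immediately after a.
T = defaultdict(Counter)
for seq in histories.values():
    for a, b in zip(seq, seq[1:]):
        T[a][b] += 1


def next_items(last_item, k=3):
    """Rank candidates by how often they followed last_item."""
    return [item for item, _ in T[last_item].most_common(k)]


print(next_items("final"))  # ['match_report', 'transfer_news']
```

"match_report" followed "final" twice and "transfer_news" once, so it ranks first. An item that was never followed by anything simply yields an empty list.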

The code

"""Sequence-aware recommendation: predict the NEXT article from the ORDER of
what a user read, not just an unordered taste profile.

This is a from-scratch first-order Markov model over item transitions ("people
who read A next read B"), optionally blended with content similarity. It's the
transparent ancestor of session models like GRU4Rec / SASRec.
"""

from __future__ import annotations

import numpy as np


class SequenceRecommender:
    def __init__(self, blend=0.0):
        # blend in [0,1]: 0 = pure transitions, 1 = pure content similarity
        self.blend = blend

    def fit(self, data, content_rec=None):
        self.data = data
        self.ids = data.article_ids
        self.row = {nid: r for r, nid in enumerate(self.ids)}
        n = len(self.ids)
        self.T = np.zeros((n, n))                     # T[a, b] = count(a -> b)
        self.last = {}                                # user -> last clicked item

        for u, hist in data.user_history.items():
            seq = [nid for nid, _ in sorted(hist, key=lambda x: x[1])]
            for a, b in zip(seq, seq[1:]):
                ra, rb = self.row.get(a), self.row.get(b)
                if ra is not None and rb is not None:
                    self.T[ra, rb] += 1.0
            if seq:
                self.last[u] = seq[-1]

        # row-normalize transitions into probabilities
        rowsum = self.T.sum(1, keepdims=True)
        self.P = self.T / np.maximum(rowsum, 1e-9)
        self.content = content_rec                    # optional content blend
        return self

    def _scores_from_item(self, item_id):
        r = self.row.get(item_id)
        if r is None:
            return np.zeros(len(self.ids))
        scores = self.P[r].copy()
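        if self.blend > 0 and self.content is not None:
            # assumes content.emb has one row per article, in the same order
            # as data.article_ids, so row r is the embedding of item_id
            csim = self.content.emb @ self.content.emb[r]      # content similarity
            scores = (1 - self.blend) * scores + self.blend * csim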
        return scores

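    def predict_next(self, last_item, k=10, exclude=()):
        scores = self._scores_from_item(last_item)
        order = np.argsort(-scores)
        # never return the item the user just read: with blend > 0 its
        # similarity to itself would otherwise put it at the top
        skip = set(exclude) | {last_item}
        out = [self.ids[i] for i in order if scores[i] > 0 and self.ids[i] not in skip]
        return out[:k]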

    def recommend(self, user, k=10):
        last = self.last.get(user)
        if last is None:
            return []
        seen = {nid for nid, _ in self.data.user_history.get(user, [])}
        return self.predict_next(last, k, exclude=seen)

Notes:

  • T[a, b] counts transitions; row-normalizing gives P[a, b] = probability of reading b next after a.
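  • blend mixes the transition score with content similarity. Pure transitions are sparse (many item pairs are never observed), so leaning on content fills the gaps. The two scores are on different scales (a row of P sums to at most 1, while csim is a raw embedding dot product), so treat blend as a tuning knob rather than a literal mixing proportion. The hybrid is the variant that performs best in the evaluation below, which uses 0.3.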
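The two notes above can be checked on a tiny example. This is only a sketch: the 3-item count matrix and the unit-length embeddings are made up for illustration, and blended_scores is a hypothetical helper that mirrors the arithmetic in _scores_from_item.

```python
import numpy as np

# Hypothetical 3-item catalogue; T[a, b] = count of b read right after a.
T = np.array([
    [0.0, 3.0, 1.0],
    [0.0, 0.0, 0.0],   # item 1 was never followed by anything
    [2.0, 0.0, 0.0],
])

# Row-normalize; the maximum() guard keeps all-zero rows at zero instead of NaN.
P = T / np.maximum(T.sum(axis=1, keepdims=True), 1e-9)

# Hypothetical unit-length content embeddings, one row per item.
emb = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])


def blended_scores(last_row, blend):
    """Mix transition probabilities with content similarity to the last item."""
    csim = emb @ emb[last_row]   # cosine similarity, since rows are unit length
    return (1 - blend) * P[last_row] + blend * csim


print(P[0])                     # transition probabilities after item 0
print(blended_scores(1, 0.3))   # item 1 has no transitions; content still scores
```

Item 1 has an empty transition row, so with blend=0 it would produce no recommendations at all; with blend=0.3 the content term ranks the other items. The last item is always most similar to itself, which is why it has to be excluded before ranking.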

Does order actually help? (verified)

We evaluate next-article prediction: train on everything except each user's last click, then predict that held-out next click (the fair leave-last-out split from Chapter 3).

next-article prediction over 399 users:
  ContentBased (profile)   recall@10=0.083  ndcg@10=0.044
  Sequence (Markov)        recall@10=0.105  ndcg@10=0.054
  Sequence + content 0.3   recall@10=0.281  ndcg@10=0.116

The story is clear:

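  • The pure Markov model already edges out the static content profile (recall@10 0.105 vs 0.083, roughly 42 vs 33 hits out of 399 users), which suggests that order carries signal even in a bare count table.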
  • Blending order with content is dramatically better (0.281, more than 3× the content-only baseline). Transitions say "what tends to come next"; content fills in unseen pairs with "what's similar." Together they're far stronger than either alone.
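For reference, the evaluation protocol can be sketched as follows. This is not the Chapter 3 code, which is not reproduced here; the function names are hypothetical, and skipping users with fewer than two clicks is an assumption. With a single held-out item per user, recall@k reduces to a hit indicator and NDCG@k to 1 / log2(rank + 1).

```python
import math


def leave_last_out(user_history):
    """Split {user: [(item, timestamp), ...]} into training histories
    and one held-out last click per user."""
    train, held_out = {}, {}
    for user, hist in user_history.items():
        ordered = sorted(hist, key=lambda x: x[1])
        if len(ordered) < 2:      # assumption: need at least one training click
            continue
        train[user] = ordered[:-1]
        held_out[user] = ordered[-1][0]
    return train, held_out


def recall_and_ndcg_at_k(recommended, target, k=10):
    """Per-user metrics when exactly one item is held out."""
    top = list(recommended)[:k]
    if target not in top:
        return 0.0, 0.0
    rank = top.index(target) + 1              # 1-indexed position of the hit
    return 1.0, 1.0 / math.log2(rank + 1)     # ideal DCG is 1 for a single target


# Hypothetical toy data: u2 has a single click and is skipped.
history = {"u1": [("a", 1), ("b", 2), ("c", 3)], "u2": [("x", 5)]}
train, held_out = leave_last_out(history)
print(train, held_out)
print(recall_and_ndcg_at_k(["b", "c", "d"], held_out["u1"]))
```

Averaging the two per-user values over all evaluated users gives figures comparable to the recall@10 and ndcg@10 columns above.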

A concrete prediction — the user just read a Ballon d'Or piece, and the model suggests more soccer:

user U106: last read -> "Bukayo Saka wins Ballon d'Or after stellar football season"
   next -> soccer | Erling Haaland completes 60 million transfer to Bayern Munich
   next -> soccer | Juventus beat Arsenal 2-2 in La Liga
   next -> soccer | Manchester City and Paris Saint-Germain play out thrilling 4-4 draw

How to use it in the capstone

SequenceRecommender follows the same .fit(...) / .recommend(user, k) interface as the other models, so it slots into the API like any other — or, better, becomes another candidate generator in the two-stage architecture (Chapter 9): retrieve some candidates by taste (content/CF) and some by what's next (sequence), then let the ranker combine them. Mixing complementary retrievers is standard in production.
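One simple way to combine retrievers before the ranker is a round-robin merge. The sketch below is an assumption about how that could look, not the Chapter 9 implementation; merge_candidates is a hypothetical helper, and the two input lists stand in for the output of a taste-based model and of SequenceRecommender.recommend.

```python
from itertools import zip_longest


def merge_candidates(*candidate_lists, limit=50):
    """Round-robin interleave several ranked lists, dropping duplicates."""
    merged, seen = [], set()
    for tier in zip_longest(*candidate_lists):   # pads shorter lists with None
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:limit]


taste = ["a", "b", "c"]     # stand-in for a content/CF recommender's output
up_next = ["b", "d"]        # stand-in for the sequence recommender's output
print(merge_candidates(taste, up_next))  # ['a', 'b', 'd', 'c']
```

Interleaving keeps the top picks of every retriever in the candidate set regardless of how their scores are scaled; the ranker then decides the final order.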

Going deeper

  • Higher-order / session models — condition on the last few items, not just one (GRU4Rec, SASRec, BERT4Rec). They capture longer patterns at the cost of a trained neural net.
  • Time-aware sequences — weight recent transitions more (the same time decay from Chapter 5).
  • Session boundaries — reset the sequence when a user starts a new visit; intent within a session is the strongest signal of all.
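The last two ideas can be sketched together. Everything here is an assumption made for illustration: timestamps are Unix seconds, a 30-minute gap starts a new session, and the decay is an exponential half-life (the Chapter 5 decay may use a different form).

```python
from collections import Counter, defaultdict

SESSION_GAP = 30 * 60        # assumed: 30 minutes of inactivity ends a session
HALF_LIFE = 7 * 24 * 3600    # assumed: a transition's weight halves every 7 days


def split_sessions(hist, gap=SESSION_GAP):
    """Split [(item, unix_ts), ...] into sessions wherever the gap is exceeded."""
    sessions, current, prev_ts = [], [], None
    for item, ts in sorted(hist, key=lambda x: x[1]):
        if prev_ts is not None and ts - prev_ts > gap:
            sessions.append(current)
            current = []
        current.append((item, ts))
        prev_ts = ts
    if current:
        sessions.append(current)
    return sessions


def weighted_transitions(user_history, now):
    """Count within-session transitions, down-weighting old ones exponentially."""
    T = defaultdict(Counter)
    for hist in user_history.values():
        for session in split_sessions(hist):
            for (a, _), (b, ts) in zip(session, session[1:]):
                T[a][b] += 0.5 ** ((now - ts) / HALF_LIFE)
    return T


hist = [("a", 0), ("b", 60), ("c", 10_000)]   # hypothetical: c starts a new session
print(split_sessions(hist))
print(dict(weighted_transitions({"u": hist}, now=60)))
```

Because "c" arrives after a gap longer than 30 minutes, no b -> c transition is counted; only a -> b contributes, with weight 1.0 when now equals its timestamp.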

That rounds out the capstone: from baselines and classic CF, through matrix factorization and learning-to-rank, to a production stack with MLflow, an API, a UI, RAG, and now sequence models — the real toolkit of a modern recommender engineer. 🎓