Baselines: popularity & trending

Always build the dumb version first. Popularity and trending are non-personalized recommenders — they show everyone (almost) the same thing — yet they're surprisingly strong, they're the standard cold-start fallback, and they're the bar every fancy model must clear to justify its complexity.

Count how many times each item was interacted with; recommend the most-counted items the user hasn't already seen.

import numpy as np

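class Popularity:
    """Non-personalized recommender: rank items by raw interaction count."""

    def fit(self, train, n_users, n_items):
        # train is a sequence of (user, item, timestamp) triples.
        self.counts = np.zeros(n_items)
        for _u, i, _t in train:
            self.counts[i] += 1               # how many interactions per item
        # user_seen and _topk_excluding are helpers defined outside this snippet.
        self.seen = user_seen(train, n_users) # to skip items the user already has
        return self

    def recommend(self, u, k=10):
        # Top-k items by count, skipping anything user u has already interacted with.
        return _topk_excluding(self.counts, self.seen.get(u, set()), k)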
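
The Popularity class above calls two helpers, user_seen and _topk_excluding, that are not defined in this section. Here is a minimal sketch of what they might look like. It assumes train is a list of (user, item, timestamp) triples and that ties are broken in favor of the lower item index; both are assumptions made for illustration, not the original implementation.

```python
from collections import defaultdict

import numpy as np


def user_seen(train, n_users):
    """Map each user id to the set of items they interacted with in train.

    n_users is accepted only to match the call in fit(); this sketch does not need it.
    """
    seen = defaultdict(set)
    for u, i, _t in train:
        seen[u].add(i)
    return dict(seen)


def _topk_excluding(scores, exclude, k):
    """Return up to k item indices with the highest scores, skipping `exclude`."""
    masked = np.asarray(scores, dtype=float).copy()   # copy so the caller's scores stay intact
    if exclude:
        masked[list(exclude)] = -np.inf               # excluded items can never win
    order = np.argsort(-masked, kind="stable")[:k]    # descending; ties go to the lower index
    return [int(i) for i in order if np.isfinite(masked[i])]


# Toy data, invented for illustration: item 1 has three interactions, items 0 and 2 one each.
train = [(0, 0, 1), (0, 1, 2), (1, 1, 3), (1, 2, 4), (2, 1, 5)]
counts = np.bincount([i for _u, i, _t in train], minlength=3)
seen = user_seen(train, n_users=3)
recs_user0 = _topk_excluding(counts, seen.get(0, set()), k=2)   # user 0 has seen items 0 and 1
recs_new = _topk_excluding(counts, seen.get(9, set()), k=2)     # unknown user: nothing excluded
```

User 0 has already seen the two most popular items, so only item 2 is left to recommend; an unknown user gets the global top list unchanged.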

It's trivial, but it encodes real signal: an item is popular because many people chose to interact with it, so a randomly picked user is more likely than chance to want it too. It's also the thing to show a brand-new user (more in Cold start).

Plain popularity treats a click from a year ago the same as one from this morning. Trending fixes that by weighting recent interactions more, using exponential time decay.

Exponential decay, explained

We want each interaction's weight to shrink as it ages. The exponential decay weight for an interaction that happened at time $t$, evaluated at "now", is

$$ w = e^{-\lambda (\text{now} - t)}, \qquad \lambda = \frac{\ln 2}{H}. $$

  • $(\text{now} - t)$ is the interaction's age.
  • $\lambda$ (lambda) is the decay rate.
  • $H$ is the half-life: the age at which the weight drops to exactly 0.5. Setting $\lambda = \ln 2 / H$ guarantees that, since $e^{-(\ln 2 / H) \cdot H} = e^{-\ln 2} = 1/2$. Two half-lives → 0.25, three → 0.125, and so on.

Half-life is the intuitive knob: "an interaction from one half-life ago counts half as much." A short half-life = very recency-biased (fast-moving news); a long half-life ≈ plain popularity.
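
A quick numeric check of the formula above, as a minimal sketch. The function name decay_weight and the half-life value are chosen here for illustration only.

```python
import math


def decay_weight(age, half_life):
    """Exponential decay weight for an interaction of the given age."""
    lam = math.log(2) / half_life        # decay rate derived from the half-life
    return math.exp(-lam * age)


half_life = 2000.0                       # illustrative value, in the same units as age
weights = [decay_weight(n * half_life, half_life) for n in range(4)]
# weights is approximately [1.0, 0.5, 0.25, 0.125]: each extra half-life halves the weight.
```

Only the ratio of age to half-life matters, so the half-life must be expressed in the same units as the timestamps (seconds, days, or whatever the data uses).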

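class Trending:
    """Non-personalized recommender: rank items by time-decayed interaction count."""

    def __init__(self, half_life=2000.0):
        self.half_life = half_life           # in the same units as the timestamps in train

    def fit(self, train, n_users, n_items):
        lam = np.log(2) / self.half_life     # decay rate: lambda = ln 2 / H
        now = max(t for _u, _i, t in train)  # "now" = latest timestamp in the training data
        self.scores = np.zeros(n_items)
        for _u, i, t in train:
            self.scores[i] += np.exp(-lam * (now - t))   # recent counts more
        # user_seen and _topk_excluding are helpers defined outside this snippet.
        self.seen = user_seen(train, n_users)
        return self

    def recommend(self, u, k=10):
        return _topk_excluding(self.scores, self.seen.get(u, set()), k)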
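
To see the two baselines disagree, here is a small self-contained sketch on invented toy data. It uses np.bincount as a vectorized stand-in for the per-interaction loops (an implementation choice made for this example, not something the classes above do); it should produce the same per-item totals.

```python
import numpy as np

# (user, item, timestamp) triples; toy data invented for illustration.
train = [(0, 0, 0), (1, 0, 10), (2, 0, 20),     # item 0: three old interactions
         (3, 1, 9_000), (4, 1, 10_000)]         # item 1: two recent interactions

items = np.array([i for _u, i, _t in train])
times = np.array([t for _u, _i, t in train], dtype=float)
n_items = 2
half_life = 2000.0

counts = np.bincount(items, minlength=n_items)           # plain popularity: one per interaction
lam = np.log(2) / half_life
weights = np.exp(-lam * (times.max() - times))           # decay weight for each interaction
scores = np.bincount(items, weights=weights, minlength=n_items)   # trending: sum of weights

pop_top = int(np.argmax(counts))     # item with the most interactions overall
trend_top = int(np.argmax(scores))   # item with the most recent activity
```

Popularity picks item 0 because it has more interactions in total. Trending picks item 1, because item 0's interactions are about five half-lives old and each contributes only around 3% of a fresh one.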

The same idea runs the internet

The "hot" rankings on Hacker News and Reddit follow the same idea: a score that combines votes with an age penalty so fresh items bubble up and old ones sink. Hacker News uses roughly

$$ \text{score} = \frac{(\text{votes} - 1)^{0.8}}{(\text{age}_{\text{hours}} + 2)^{1.8}}, $$

a power-law decay rather than exponential, but the principle is identical: popularity discounted by age. Time decay is one of the most reused tricks in all of recommendations.
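
The formula above translates almost directly into code. This is a sketch of the approximate formula as written here, not the site's actual ranking code; the function name, the gravity parameter name, and the clamp to zero for items with fewer than one vote are assumptions added for illustration.

```python
def hn_score(votes, age_hours, gravity=1.8):
    """Approximate Hacker News style score: votes discounted by a power of age."""
    points = max(votes - 1, 0)              # clamp is an assumption; avoids a negative base
    return points ** 0.8 / (age_hours + 2) ** gravity


fresh = hn_score(votes=20, age_hours=1)     # modest vote count, very recent
stale = hn_score(votes=200, age_hours=48)   # ten times the votes, two days old
```

Despite having a tenth of the votes, the fresh item scores higher, because the age term in the denominator grows faster than the vote term in the numerator.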

How good are baselines?

From the leaderboard (recall@10, higher is better):

  Random             0.033
  Popularity         0.073
  Trending           0.073

Popularity more than doubles random (0.073 vs. 0.033, about 2.2×), a real signal from a one-line idea. (Trending matches it here because our synthetic data has no strong time-of-day trend; on genuinely fast-moving catalogs trending tends to pull ahead.) Personalized models will beat these, but not by as much as you'd expect, which is exactly why popularity is the baseline you must always measure against.
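
For reference, here is a sketch of one common way to compute recall@k: the fraction of a user's held-out items that show up in their top-k list, averaged over users. The leaderboard's exact definition is not shown in this section, so treat the function names and the handling of users without held-out items as assumptions.

```python
def recall_at_k(recommended, relevant, k=10):
    """Fraction of a user's held-out items that appear in the top-k list."""
    if not relevant:
        return 0.0                          # convention chosen here; some setups skip such users
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)


def mean_recall_at_k(recs_by_user, held_out_by_user, k=10):
    """Average recall@k over users that have at least one held-out item."""
    scores = [recall_at_k(recs_by_user.get(u, []), items, k)
              for u, items in held_out_by_user.items() if items]
    return sum(scores) / len(scores) if scores else 0.0


recs = {0: [5, 3, 9], 1: [5, 3, 9]}         # identical lists, as a non-personalized model gives
held_out = {0: {3, 7}, 1: {1}}              # toy held-out items, invented for illustration
score = mean_recall_at_k(recs, held_out, k=3)
```

In this toy case user 0 gets one of two held-out items (0.5) and user 1 gets none (0.0), so the mean is 0.25.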

When baselines are the right answer

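  • Cold start: a new user with zero history gives you nothing personal to go on, so trending is your best guess.
  • Cold catalogs: most items are new and interaction data is too sparse to personalize. (An item with zero interactions still scores zero under both baselines, so it only surfaces once it collects a few.)
  • A sanity floor: if your deep model can't beat popularity, something is wrong.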
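
The cold-start case above can be sketched as a simple fallback rule. Everything here is hypothetical: the function name, the min_history threshold, and the stub recommender exist only to make the example runnable, and any object with a recommend(u, k) method could stand in.

```python
def recommend_with_fallback(u, history, personalized, baseline, k=10, min_history=5):
    """Use the personalized model only when the user has enough history."""
    if len(history.get(u, ())) >= min_history:
        return personalized.recommend(u, k)
    return baseline.recommend(u, k)          # cold start: fall back to trending or popularity


class _Stub:
    """Stand-in recommender that always returns a fixed list (illustration only)."""

    def __init__(self, items):
        self.items = items

    def recommend(self, u, k=10):
        return self.items[:k]


history = {0: {1, 2, 3, 4, 5, 6}}            # user 0 is warm; user 1 has no history
personal, trending = _Stub([42, 43]), _Stub([7, 8])
warm = recommend_with_fallback(0, history, personal, trending, k=2)   # personalized list
cold = recommend_with_fallback(1, history, personal, trending, k=2)   # baseline list
```

The threshold is a tuning knob: set it too low and the personalized model guesses from almost nothing, too high and returning users keep seeing the generic list.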

Next we make recommendations personal using item content. 👉