Game Recommender — Built From Scratch

What it is

Recommendations from real review data

The data is the real Amazon Video Games 5-core review corpus (~231k reviews). It's sparse and skewed — a handful of blockbusters drown out a long tail of niche games. A prep step streams the raw corpus, de-duplicates, caps the catalogue to the most-reviewed titles and applies a 5/10 k-core, leaving a clean committed slice: 71,746 ratings · 8,314 users · 1,198 games at 0.72% density.
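
The k-core step and the quoted density can be sketched as follows. This is a minimal illustration, not the project's prep code: the function names, the (user, game, rating) tuple layout and the separate per-user and per-game thresholds are assumptions.

```python
from collections import Counter


def k_core(ratings, min_per_user, min_per_item):
    """Repeatedly drop users and games below their thresholds until nothing changes.

    `ratings` is a list of (user, game, rating) tuples (an assumed layout).
    """
    current = list(ratings)
    while True:
        user_counts = Counter(u for u, _, _ in current)
        item_counts = Counter(i for _, i, _ in current)
        kept = [
            (u, i, r)
            for u, i, r in current
            if user_counts[u] >= min_per_user and item_counts[i] >= min_per_item
        ]
        if len(kept) == len(current):
            return kept
        current = kept  # removals can push other users or games below threshold


def density(n_ratings, n_users, n_items):
    """Fraction of the user x game grid that actually holds a rating."""
    return n_ratings / (n_users * n_items)


# Figures quoted in the text: 71,746 ratings, 8,314 users, 1,198 games.
slice_density_pct = round(density(71_746, 8_314, 1_198) * 100, 2)
```

The loop matters because a single pass is not enough: removing a thinly-reviewed game can leave one of its reviewers below the user threshold, which in turn can thin out another game.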

The build learns item-to-item relationships from this slice: games frequently loved by the same people are linked, so a title you rated highly pulls up its closest companions. Item-based collaborative filtering suits data this sparse because relationships between games are far more stable than the fickle, thin profiles of individual users.

+92.8%

Recall@10 lift of item-based CF over a popularity baseline on 8,215 held-out users (0.1042 vs 0.0540) — real numbers from the eval below.

The stack

From raw reviews to "play this next"

Real data, real evaluation — scipy sparse matrices, no toy shortcuts.

data

Review corpus

Amazon Video Games 5-core reviews, k-cored to a clean 71,746-rating slice.

prep

Sparse matrix

A CSR user × game matrix built with scipy so the maths runs at scale.
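
A minimal sketch of that construction, assuming the prepared slice is a list of (user, game, rating) tuples; the function name and the sorted index order are illustrative choices, not taken from the project.

```python
import numpy as np
from scipy.sparse import csr_matrix


def build_user_item_matrix(ratings):
    """Map raw ids to row/column indices and build a CSR user x game matrix."""
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    user_index = {u: k for k, u in enumerate(users)}
    item_index = {i: k for k, i in enumerate(items)}

    rows = np.array([user_index[u] for u, _, _ in ratings])
    cols = np.array([item_index[i] for _, i, _ in ratings])
    vals = np.array([r for _, _, r in ratings], dtype=np.float64)

    # Duplicate (row, col) pairs would be summed here, so the input is
    # assumed to be de-duplicated already.
    matrix = csr_matrix((vals, (rows, cols)), shape=(len(users), len(items)))
    return matrix, users, items


example = [("u1", "g1", 5), ("u1", "g2", 4), ("u2", "g1", 3)]
matrix, users, items = build_user_item_matrix(example)
```

Only the stored ratings take memory, which is what makes a grid of 8,314 users by 1,198 games at 0.72% density cheap to hold and multiply.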

core

Adjusted cosine

Mean-centre each user's ratings, then cosine over co-raters — cancels the "rates everything 5★" bias.
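
One way to write this down, shown on a small dense array for readability. It is a sketch under assumptions: 0 is taken to mean "not rated", and the function name is hypothetical. The same matrix products carry over to scipy sparse matrices.

```python
import numpy as np


def adjusted_cosine(ratings):
    """Item-item adjusted cosine on a dense users x items array (0 = not rated)."""
    ratings = np.asarray(ratings, dtype=np.float64)
    rated = ratings > 0
    counts = rated.sum(axis=1, keepdims=True)
    means = ratings.sum(axis=1, keepdims=True) / np.maximum(counts, 1)
    # Centre only the observed ratings; unrated cells stay at zero.
    centred = np.where(rated, ratings - means, 0.0)

    mask = rated.astype(np.float64)
    numerator = centred.T @ centred
    # sq[i, j] = sum of squared centred ratings of item i over users who rated both i and j.
    sq = (centred ** 2).T @ mask
    denominator = np.sqrt(sq * sq.T)

    sim = np.divide(numerator, denominator,
                    out=np.zeros_like(numerator), where=denominator > 0)
    np.fill_diagonal(sim, 0.0)
    return sim


# Two users who both prefer game 0 to game 1, one generous and one strict,
# plus a third user who links game 1 and game 2.
toy = np.array([[5, 3, 0],
                [4, 2, 0],
                [0, 1, 3]])
toy_sim = adjusted_cosine(toy)
```

In the toy array, games 0 and 1 come out at -1.0 because every co-rater places one above their own mean and the other below it, whereas plain cosine over the raw stars would score the pair close to +1. Games 0 and 2 share no raters, so their similarity stays at 0.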

robust

Overlap + shrinkage

Drop pairs with too few co-raters; shrink thinly-supported similarities toward zero.
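
A common form of this correction multiplies each similarity by n / (n + lambda), where n is the number of co-raters. The sketch below assumes that form; the thresholds min_overlap=3 and shrinkage=10.0 are placeholder values, not the project's settings.

```python
import numpy as np


def shrink_similarities(sim, co_counts, min_overlap=3, shrinkage=10.0):
    """Zero out thinly-supported pairs and damp the rest toward zero.

    `co_counts[i, j]` is the number of users who rated both item i and item j.
    """
    sim = np.asarray(sim, dtype=np.float64)
    co_counts = np.asarray(co_counts, dtype=np.float64)
    weights = co_counts / (co_counts + shrinkage)  # approaches 1 as support grows
    damped = sim * weights
    damped[co_counts < min_overlap] = 0.0
    return damped


raw = np.array([[0.0, 0.9],
                [0.9, 0.0]])
thin = shrink_similarities(raw, [[0, 2], [2, 0]])       # only 2 co-raters
solid = shrink_similarities(raw, [[50, 10], [10, 50]])  # 10 co-raters
```

With two co-raters the 0.9 similarity is discarded outright; with ten it is halved to 0.45, and it would only approach 0.9 again as the number of shared raters grows well past the shrinkage constant.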

rank

Top-N retrieval

Aggregate neighbour similarities over a user's games, excluding what they've already played.
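
A minimal sketch of that retrieval step, assuming a dense item-item similarity array and an unweighted sum over the user's games; the function name and the choice not to weight by rating are assumptions.

```python
import numpy as np


def recommend(sim, played, top_n=10):
    """Rank unseen items by summed similarity to the items in `played`."""
    played = np.asarray(sorted(played), dtype=int)
    scores = sim[played].sum(axis=0)  # fancy indexing copies, so `sim` is untouched
    scores[played] = -np.inf          # never recommend what the user already has
    ranked = np.argsort(-scores, kind="stable")
    return [int(i) for i in ranked[:top_n] if np.isfinite(scores[i])]


toy_sim = np.array([
    [0.0, 0.8, 0.1, 0.0],
    [0.8, 0.0, 0.5, 0.2],
    [0.1, 0.5, 0.0, 0.9],
    [0.0, 0.2, 0.9, 0.0],
])
after_one_game = recommend(toy_sim, {0}, top_n=2)
after_two_games = recommend(toy_sim, {0, 1}, top_n=2)
```

A user who has played only game 0 gets games 1 and 2; once game 1 is also in their history it drops out of the candidates and the list becomes games 2 and 3.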

evaluate

Recall@K vs baseline

Leave-one-out hold-out, scored against a popularity baseline — +92.8% at K=10.

Architecture

How a recommendation is built

From raw reviews to a related-games list, the pipeline runs in five stages:

Ingest
Stream the Amazon corpus, de-duplicate, cap the catalogue and k-core into a clean ratings slice.
Sparsify
Build the CSR user × game matrix and mean-centre each user's observed ratings.
Relate
Adjusted-cosine similarity over co-raters, with overlap filtering, shrinkage and top-k neighbours.
Recommend
Aggregate neighbour similarities over a user's games; rank unseen titles top-N.
Validate
Leave-one-out hold-out; report Recall@K / HitRate@K (the two coincide when each user has exactly one hidden game) against a popularity baseline.

Results

Measured against a baseline

Leave-one-out evaluation on 8,215 held-out users: for each, one 4★+ game is hidden and the model is scored on whether that game surfaces in the top K. Item-based CF clears the popularity baseline at every K. The numbers below come from running python demo.py.
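
The protocol can be sketched in a few lines. This is an illustration rather than the code in demo.py: the held_out layout, the function names and the tie-breaking rule in the popularity baseline are all assumptions.

```python
def recall_at_k(held_out, recommend_fn, k):
    """Share of users whose hidden game appears in their top-k list.

    `held_out` maps user -> (remaining_games, hidden_game).
    `recommend_fn(remaining_games, k)` returns a ranked list of games.
    """
    hits = sum(
        1
        for remaining, hidden in held_out.values()
        if hidden in recommend_fn(remaining, k)
    )
    return hits / len(held_out)


def popularity_recommender(item_counts):
    """Baseline: always suggest the most-rated games the user has not seen."""
    ranked = sorted(item_counts, key=lambda item: (-item_counts[item], item))

    def recommend(remaining, k):
        seen = set(remaining)
        return [item for item in ranked if item not in seen][:k]

    return recommend


baseline = popularity_recommender({"a": 10, "b": 5, "c": 1})
toy_held_out = {"u1": (["a"], "b"), "u2": (["a"], "c")}
recall_top1 = recall_at_k(toy_held_out, baseline, k=1)
recall_top2 = recall_at_k(toy_held_out, baseline, k=2)
```

Scoring the model and the baseline through the same function, on the same hidden games, is what makes the comparison at each K like for like.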

Recall@5

0.0668

vs 0.0348 popularity · +92.0%

Recall@10

0.1042

vs 0.0540 popularity · +92.8%

Recall@20

0.1529

vs 0.0857 popularity · +78.4%

At K=10 the recommender finds the held-out game for 856 of 8,215 users, against 444 for the baseline. The full package, tests and eval are on GitHub.
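
The headline lift follows directly from those hit counts, as this short check shows. The helper name is illustrative.

```python
def relative_lift_pct(model, baseline):
    """Relative improvement of `model` over `baseline`, in percent."""
    return (model / baseline - 1.0) * 100.0


users = 8_215
model_recall = 856 / users      # hits for item-based CF at K=10
baseline_recall = 444 / users   # hits for the popularity baseline at K=10

lift_from_counts = round(relative_lift_pct(model_recall, baseline_recall), 1)
lift_from_rounded = round(relative_lift_pct(0.1042, 0.0540), 1)
```

Computed from the raw counts the lift is 92.8%, matching the figure reported above. Recomputing it from the four-decimal recalls gives 93.0%, so the small gap is a rounding effect rather than an inconsistency.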

Reflection

What rebuilding it taught me

The data picks the method. Sparse, skewed reviews make item-based CF the natural fit — stable item relationships beat fragile user profiles.
Mean-centring matters. Adjusted cosine cancels the "rates everything 5★" bias; plain cosine over raw stars is fooled by generous raters.
Sparsity is noise. A similarity from two shared raters is meaningless — overlap filtering and shrinkage were the difference between signal and junk.
Popularity is a real baseline. Just recommending blockbusters already lands ~5% Recall@10; a model has to clearly beat that to earn its complexity. This one roughly doubles it.
Offline metrics are a proxy. Recall@K is honest about hold-out hits, but only hints at whether a human would actually click play.