From-Scratch Build · Recommendation Systems

Game Recommender

Amazon's video-game reviews hold a quiet map of what-goes-with-what — if you can read it. This build turns that noisy, sparse pile of ratings into clean "if you liked this, play that" recommendations with item-based collaborative filtering, and measures itself against a popularity baseline on held-out data.

Python · Adjusted cosine · Item-based CF · Sparse matrices · Amazon reviews · Recall@K

What it is

Recommendations from real review data

The data is the real Amazon Video Games 5-core review corpus (~231k reviews). It's sparse and skewed — a handful of blockbusters drown out a long tail of niche games. A prep step streams the raw corpus, de-duplicates, caps the catalogue to the most-reviewed titles and applies a 5/10 k-core, leaving a clean committed slice: 71,746 ratings · 8,314 users · 1,198 games at 0.72% density.
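A k-core filter is easiest to see in code. The sketch below is a minimal pure-Python version, not the project's actual prep script: the function names are hypothetical, and reading "5/10" as at least 5 ratings per user and at least 10 per game is an assumption. Dropping a sparse game can push a user under their threshold (and the reverse), so the filter repeats until nothing changes. The last lines re-derive the quoted 0.72% density from the slice counts.

```python
from collections import Counter


def k_core(ratings, min_user=5, min_item=10):
    """Iteratively drop users and games that fall below their rating counts.

    ratings: iterable of (user_id, game_id, rating) tuples, already de-duplicated.
    min_user / min_item are assumed thresholds, not confirmed project settings.
    """
    ratings = list(ratings)
    while True:
        user_counts = Counter(u for u, _, _ in ratings)
        item_counts = Counter(g for _, g, _ in ratings)
        kept = [
            (u, g, r)
            for u, g, r in ratings
            if user_counts[u] >= min_user and item_counts[g] >= min_item
        ]
        if len(kept) == len(ratings):  # fixed point: nothing left to remove
            return kept
        ratings = kept


def density(n_ratings, n_users, n_games):
    """Fraction of the user x game grid that actually holds a rating."""
    return n_ratings / (n_users * n_games)


# Density of the committed slice, using the counts quoted in the text.
slice_density = density(71_746, 8_314, 1_198)
print(f"{slice_density:.2%}")  # prints 0.72%
```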

The build learns item-to-item relationships from this: games frequently loved by the same people are linked, so a title you rated highly pulls up its closest companions. Item-based collaborative filtering fits the mess precisely because relationships between games are far more stable than the fickle, sparse profiles of individual users.

+92.8%
Recall@10 lift of item-based CF over a popularity baseline on 8,215 held-out users (0.1042 vs 0.0540) — real numbers from the eval below.

The stack

From raw reviews to "play this next"

Real data, real evaluation — scipy sparse matrices, no toy shortcuts.

data

Review corpus

Amazon Video Games 5-core reviews, k-cored to a clean 71,746-rating slice.

prep

Sparse matrix

A CSR user × game matrix built with scipy so the maths runs at scale.

core

Adjusted cosine

Mean-centre each user's ratings, then cosine over co-raters — cancels the "rates everything 5★" bias.

robust

Overlap + shrinkage

Drop pairs with too few co-raters; shrink thinly-supported similarities toward zero.

rank

Top-N retrieval

Aggregate neighbour similarities over a user's games, excluding what they've already played.

evaluate

Recall@K vs baseline

Leave-one-out hold-out, scored against a popularity baseline — +92.8% at K=10.
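The core and robust stages can be sketched together. This is a minimal illustration under stated assumptions, not the project's code: the function name, the default min_overlap and shrink values, and the shrinkage form n / (n + shrink) are all mine. It mean-centres each user's observed ratings, computes cosine with norms restricted to co-raters, then applies the overlap filter and shrinkage. The item × item result is dense, which is affordable at roughly 1,200 games but would need a sparse top-k structure for a larger catalogue.

```python
import numpy as np
from scipy import sparse


def adjusted_cosine(R, min_overlap=2, shrink=10.0):
    """Item-item adjusted cosine from a CSR user x game matrix.

    Stored entries of R are observed ratings; everything else is unrated.
    min_overlap and shrink are illustrative defaults (assumptions).
    """
    R = sparse.csr_matrix(R, dtype=np.float64)
    counts = np.diff(R.indptr)                    # ratings per user
    sums = np.asarray(R.sum(axis=1)).ravel()
    means = sums / np.maximum(counts, 1)

    C = R.copy()
    C.data -= np.repeat(means, counts)            # centre observed entries only
    B = R.copy()
    B.data[:] = 1.0                               # 1 where a user rated a game

    num = (C.T @ C).toarray()                     # sums over co-raters only
    sq = C.multiply(C).tocsr()
    N = (sq.T @ B).toarray()                      # N[i, j]: norm^2 of i over raters of j
    denom = np.sqrt(N * N.T)
    overlap = (B.T @ B).toarray()                 # co-rater counts per pair

    sim = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    sim *= overlap / np.maximum(overlap + shrink, 1e-12)  # shrink thin support
    sim[overlap < min_overlap] = 0.0              # drop weakly supported pairs
    np.fill_diagonal(sim, 0.0)                    # a game is not its own neighbour
    return sim


# Toy matrix: 4 users x 3 games, 0 means "not rated".
toy = sparse.csr_matrix(np.array([[5, 5, 2],
                                  [4, 4, 1],
                                  [1, 2, 3],
                                  [5, 3, 0]], dtype=float))
print(np.round(adjusted_cosine(toy, min_overlap=2, shrink=1.0), 3))
```

Centring before the dot product is what removes the generous-rater effect: a user who gives everything 5★ contributes zeros after centring and so adds nothing to any pair.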

Architecture

How a recommendation is built

From raw reviews to a related-games list, the pipeline runs in five stages:

  1. Ingest

    Stream the Amazon corpus, de-duplicate, cap the catalogue and k-core into a clean ratings slice.

  2. Sparsify

    Build the CSR user × game matrix and mean-centre each user's observed ratings.

  3. Relate

    Adjusted-cosine similarity over co-raters, with overlap filtering, shrinkage and top-k neighbours.

  4. Recommend

    Aggregate neighbour similarities over a user's games; rank unseen titles top-N.

  5. Validate

    Leave-one-out hold-out; report Recall@K / HitRate@K against a popularity baseline.
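Step 4 is the part users actually see, so here is a minimal sketch of it. The function name, the k_neighbours default and the plain unweighted sum are assumptions for illustration; the real build may weight by rating or normalise scores. It takes a precomputed item × item similarity array, keeps each played game's top-k positive neighbours, sums their similarities, and ranks titles the user has not played.

```python
import numpy as np


def recommend(played, sim, n=10, k_neighbours=50):
    """Rank unseen games by summed similarity to the games a user has played.

    played: iterable of game indices the user has already rated.
    sim:    dense item x item similarity array with a zero diagonal.
    """
    played = list(played)
    scores = np.zeros(sim.shape[0])
    for g in played:
        row = sim[g]
        top = np.argsort(row)[::-1][:k_neighbours]  # closest neighbours of g
        top = top[row[top] > 0]                     # ignore non-positive links
        scores[top] += row[top]
    scores[played] = -np.inf                        # never re-recommend played games
    ranked = np.argsort(-scores)
    return [int(g) for g in ranked[:n] if scores[g] > 0]


# Toy similarity matrix over 4 games (hypothetical values).
sim = np.array([[0.0, 0.9, 0.1, 0.0],
                [0.9, 0.0, 0.5, 0.0],
                [0.1, 0.5, 0.0, 0.2],
                [0.0, 0.0, 0.2, 0.0]])
print(recommend([0], sim, n=2, k_neighbours=2))  # prints [1, 2]
```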

Results

Measured against a baseline

Leave-one-out evaluation on 8,215 held-out users: for each, one 4★+ game is hidden and the model is scored on whether that game surfaces in the top-K. Item-based CF clears the popularity baseline at every K. All numbers are the output of python demo.py.
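The protocol itself is small enough to sketch in pure Python. What follows is an illustration under assumptions, not the contents of demo.py: the function names, the random choice of which liked game to hide, and the rule for skipping users with no 4★+ rating are mine. With exactly one hidden game per user, Recall@K and HitRate@K are the same number, which is why a single hit counter is enough.

```python
import random
from collections import Counter


def leave_one_out(ratings_by_user, min_rating=4.0, seed=0):
    """Hide one liked game per user; users with no liked game are not scored."""
    rng = random.Random(seed)
    train, held_out = {}, {}
    for user, ratings in ratings_by_user.items():
        liked = sorted(g for g, r in ratings.items() if r >= min_rating)
        if len(ratings) < 2 or not liked:
            train[user] = dict(ratings)
            continue
        hidden = rng.choice(liked)
        held_out[user] = hidden
        train[user] = {g: r for g, r in ratings.items() if g != hidden}
    return train, held_out


def popularity_recommender(train):
    """Baseline: most-rated games first, minus what the user already has."""
    counts = Counter(g for ratings in train.values() for g in ratings)
    ranked = [g for g, _ in counts.most_common()]

    def recommend(user, k):
        seen = train.get(user, {})
        return [g for g in ranked if g not in seen][:k]

    return recommend


def recall_at_k(recommend, held_out, k):
    """Share of held-out users whose hidden game appears in their top-k."""
    if not held_out:
        return 0.0
    hits = sum(1 for user, hidden in held_out.items() if hidden in recommend(user, k))
    return hits / len(held_out)


# Toy data: user -> {game: rating}.
toy = {
    "a": {"x": 5, "y": 2},
    "b": {"x": 5, "z": 1},
    "c": {"y": 4, "z": 2},
    "d": {"x": 3, "y": 2},  # no 4-star rating, so never held out
}
train, held_out = leave_one_out(toy)
baseline = popularity_recommender(train)
print(recall_at_k(baseline, held_out, k=2))
```

Any recommender with the same (user, k) signature can be passed to recall_at_k, so the item-based model and the baseline are scored by identical code.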

Recall@5

0.0668

vs 0.0348 popularity · +92.0%

Recall@10

0.1042

vs 0.0540 popularity · +92.8%

Recall@20

0.1529

vs 0.0857 popularity · +78.4%

At K=10 the recommender finds the held-out game for 856 of 8,215 users, against 444 for the baseline. The full package, tests and eval are on GitHub.
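The headline figures can be re-derived from those hit counts in a few lines. This is a quick arithmetic check, not project code, and the variable names are illustrative.

```python
users = 8_215
cf_hits, pop_hits = 856, 444

cf_recall = cf_hits / users
pop_recall = pop_hits / users
lift = cf_hits / pop_hits - 1  # relative improvement over the baseline

print(f"{cf_recall:.4f} {pop_recall:.4f} {lift:+.1%}")  # prints 0.1042 0.0540 +92.8%
```

Dividing the rounded recalls instead (0.1042 / 0.0540) gives about +93.0%, so the reported +92.8% appears to be computed from the unrounded values.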

Reflection

What rebuilding it taught me