From-Scratch Build · Recommendation Systems
Amazon's video-game reviews hold a quiet map of what-goes-with-what — if you can read it. This build turns that noisy, sparse pile of ratings into clean "if you liked this, play that" recommendations with item-based collaborative filtering, and measures itself against a popularity baseline on held-out data.
What it is
The data is the real Amazon Video Games 5-core review corpus (~231k reviews). It's sparse and skewed — a handful of blockbusters drown out a long tail of niche games. A prep step streams the raw corpus, de-duplicates, caps the catalogue to the most-reviewed titles and applies a 5/10 k-core, leaving a clean committed slice: 71,746 ratings · 8,314 users · 1,198 games at 0.72% density.
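As a rough illustration of the k-core step, here is a minimal sketch. It assumes "5/10" means at least 5 ratings per user and 10 per game, which the write-up does not spell out; the function names and thresholds are hypothetical, not taken from the repository. Filtering has to repeat, because dropping a thin game can push a user back under their own threshold.

```python
from collections import Counter


def k_core(ratings, min_user=5, min_item=10):
    """Iteratively drop users and games that fall below their thresholds.

    ratings: iterable of (user, game, rating) triples, already de-duplicated.
    min_user / min_item: assumed reading of "5/10", not confirmed by the source.
    """
    rows = list(ratings)
    while True:
        user_counts = Counter(u for u, _, _ in rows)
        item_counts = Counter(i for _, i, _ in rows)
        kept = [
            (u, i, r)
            for u, i, r in rows
            if user_counts[u] >= min_user and item_counts[i] >= min_item
        ]
        if len(kept) == len(rows):  # nothing removed: the core is stable
            return kept
        rows = kept


def density(n_ratings, n_users, n_items):
    """Share of the user x game grid that actually holds a rating."""
    return n_ratings / (n_users * n_items)


# Toy run with low thresholds: removing game "z" leaves user "c" with a single
# rating, so "c" is dropped on the next pass.
toy = [("a", "x", 5), ("a", "y", 4), ("b", "x", 4),
       ("b", "y", 5), ("c", "x", 3), ("c", "z", 2)]
core = k_core(toy, min_user=2, min_item=2)

# Density of the committed slice quoted above.
print(f"{density(71_746, 8_314, 1_198):.2%}")  # prints 0.72%
```

The density check reproduces the 0.72% figure from the slice's own counts.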
The build learns item-to-item relationships from this: games frequently loved by the same people are linked, so a title you rated highly pulls up its closest companions. Item-based collaborative filtering fits the mess precisely because relationships between games are far more stable than the fickle, sparse profiles of individual users.
The stack
Real data, real evaluation — scipy sparse matrices, no toy shortcuts.
Amazon Video Games 5-core reviews, k-cored to a clean 71,746-rating slice.
A CSR user × game matrix built with scipy so the maths runs at scale.
Mean-centre each user's ratings, then cosine over co-raters — cancels the "rates everything 5★" bias.
Drop pairs with too few co-raters; shrink thinly-supported similarities toward zero.
Aggregate neighbour similarities over a user's games, excluding what they've already played.
Leave-one-out hold-out, scored against a popularity baseline — +92.8% at K=10.
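The matrix, centring and similarity steps in the stack can be sketched in a few lines of scipy. This is an illustrative version under stated assumptions, not the repository's code: the function name adjusted_cosine, the min_overlap and shrinkage defaults, and the n / (n + shrinkage) damping form are all choices made for the example. The item × item result is held dense, which is only reasonable because the catalogue is about 1,200 games.

```python
import numpy as np
from scipy import sparse


def adjusted_cosine(users, items, ratings, n_users, n_items,
                    min_overlap=2, shrinkage=10.0):
    """Item-item adjusted cosine over co-raters (shrinkage must be > 0)."""
    R = sparse.csr_matrix(
        (np.asarray(ratings, dtype=np.float64), (users, items)),
        shape=(n_users, n_items),
    )

    # Mean-centre each user's observed ratings only (unrated cells stay empty).
    counts = np.diff(R.indptr)                   # ratings stored per user
    sums = np.asarray(R.sum(axis=1)).ravel()
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    C = R.copy()
    C.data -= np.repeat(means, counts)

    # Binary mask of who rated what, taken from R so zero deviations still count.
    B = R.copy()
    B.data[:] = 1.0

    num = (C.T @ C).toarray()                    # sum of products over co-raters
    overlap = (B.T @ B).toarray()                # number of co-raters per pair
    sq = (C.multiply(C).T @ B).toarray()         # sq[i, j]: sum of c_ui^2 over raters of j
    denom = np.sqrt(sq * sq.T)

    sim = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    sim *= overlap / (overlap + shrinkage)       # shrink thin support toward zero
    sim[overlap < min_overlap] = 0.0             # drop pairs with too few co-raters
    np.fill_diagonal(sim, 0.0)
    return sim


# Toy data: users 0 and 1 like games 0 and 1; user 2 prefers game 2.
users = [0, 0, 0, 1, 1, 1, 2, 2, 2]
items = [0, 1, 2, 0, 1, 2, 0, 1, 2]
stars = [5, 4, 1, 4, 5, 2, 1, 2, 5]
sim = adjusted_cosine(users, items, stars, n_users=3, n_items=3)
```

In the toy run games 0 and 1 come out positively linked and both are negatively linked to game 2, which is the behaviour the centring is meant to produce.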
Architecture
From raw reviews to a related-games list, the pipeline runs in five steps:
Stream the Amazon corpus, de-duplicate, cap the catalogue and k-core into a clean ratings slice.
Build the CSR user × game matrix and mean-centre each user's observed ratings.
Adjusted-cosine similarity over co-raters, with overlap filtering, shrinkage and top-k neighbours.
Aggregate neighbour similarities over a user's games; rank unseen titles top-N.
Leave-one-out hold-out; report Recall@K / HitRate@K against a popularity baseline.
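The neighbour-pruning and ranking steps above might look like the sketch below. It assumes a dense item × item similarity array is already available; keep_top_k and recommend are hypothetical names, and summing raw similarities is one simple aggregation among several possible ones.

```python
import numpy as np


def keep_top_k(sim, k):
    """Keep each game's k strongest neighbours and zero out the rest."""
    pruned = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]        # column indices of the k largest
    rows = np.arange(sim.shape[0])[:, None]
    pruned[rows, idx] = sim[rows, idx]
    return pruned


def recommend(sim, played, n=10):
    """Rank unseen games by summed similarity to the games already played."""
    scores = sim[played].sum(axis=0)             # aggregate over the user's games
    scores[played] = -np.inf                     # never recommend what was played
    order = np.argsort(-scores)[:n]
    return [int(i) for i in order if np.isfinite(scores[i])]


# Toy similarity matrix for four games (made-up values, illustration only).
toy_sim = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.0],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.0, 0.8, 0.0],
])
pruned = keep_top_k(toy_sim, k=2)
print(recommend(pruned, played=[0], n=2))        # prints [1, 2]
```

Pruning to top-k neighbours keeps the scoring pass cheap and stops a long tail of weak links from adding up to a spurious recommendation.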
Results
Leave-one-out evaluation on 8,215 held-out users: for each, one 4★+ game is hidden and the model is scored on whether that game surfaces in the top-K. Item-based CF clears the popularity baseline at every K. The numbers below are real output from python demo.py.
vs 0.0348 popularity · +92.0%
vs 0.0540 popularity · +92.8%
vs 0.0857 popularity · +78.4%
At K=10 the recommender finds the held-out game for 856 of 8,215 users (10.4%), against 444 (5.4%) for the baseline. The full package, tests and eval are on GitHub.
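The K=10 figures can be checked by hand. The sketch below uses hypothetical helper names rather than the ones in demo.py, and rebuilds the hit rate and relative lift from the user counts quoted above. With exactly one hidden game per user, Recall@K and HitRate@K are the same number.

```python
def hit_rate_at_k(ranked_lists, held_out, k):
    """Fraction of users whose hidden game appears in their top-k list."""
    hits = sum(
        1 for recs, target in zip(ranked_lists, held_out) if target in recs[:k]
    )
    return hits / len(held_out)


def lift(model, baseline):
    """Relative improvement of the model over the baseline."""
    return (model - baseline) / baseline


# Counts quoted in the results: 856 and 444 hits out of 8,215 held-out users.
model = 856 / 8215
baseline = 444 / 8215
print(f"{model:.4f} vs {baseline:.4f} -> {lift(model, baseline):+.1%}")
# prints: 0.1042 vs 0.0540 -> +92.8%
```

The baseline rate of 0.0540 and the +92.8% lift both match the reported K=10 row.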
Reflection