From-Scratch Build · Recommendation Systems
Genre is a blunt instrument — two thrillers can feel nothing alike. Reel is a film discovery engine built on learned embeddings: every film becomes a point in a space of taste, and your next watch is the one that sits closest to what you already love. I built it from scratch to learn how meaning, not metadata, drives modern discovery.
What it is
Reel learns a vector for every film by factorising the user-by-film ratings matrix with stochastic gradient descent — written from scratch in NumPy, no ML library doing the fitting. Films that people rate the same way land near each other, so discovery becomes geometry: name a film you love and Reel returns its nearest neighbours in taste-space. The matches cut across genres, because closeness captures the feel a label can't.
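To make "discovery becomes geometry" concrete, here is a minimal sketch of a cosine nearest-neighbour lookup over item embeddings. It is an illustration rather than Reel's actual code: the function name `nearest_films`, the array name `Q`, and the toy vectors are all assumptions.

```python
import numpy as np

def nearest_films(Q, film_idx, k=5):
    """Return indices of the k films closest to film_idx by cosine similarity.

    Q is an (n_films, n_factors) array of item embeddings (hypothetical name).
    """
    # Normalise rows to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(Q, axis=1, keepdims=True)
    unit = Q / np.maximum(norms, 1e-12)
    sims = unit @ unit[film_idx]
    sims[film_idx] = -np.inf  # never return the query film itself
    # Sort by descending similarity and keep the top k.
    return np.argsort(-sims)[:k]

# Toy embeddings: film 1 points almost the same way as film 0,
# film 2 is orthogonal to it, film 3 points the opposite way.
Q = np.array([[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [-1.0, 0.0]])
neighbours = nearest_films(Q, film_idx=0, k=2)
```

Cosine ignores vector length, so a film with a large-norm embedding does not crowd out closer matches that merely have smaller norms.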
Trained on MovieLens latest-small — 610 users, 9,724 films, 100,836 ratings. The honest catch I hit: a rating-accurate model is a poor ranker, so top-N recommendations use a second set of embeddings trained on a ranking objective (BPR). Every number on this page is from a real run.
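For readers unfamiliar with BPR, the sketch below shows one SGD step of the standard Bayesian Personalised Ranking update on a (user, liked film, unliked film) triple. It is a generic textbook form, not necessarily how Reel implements it; `bpr_step`, the learning rate, the regularisation strength, and the toy sizes are assumptions.

```python
import numpy as np

def bpr_step(P, Q, u, i, j, lr=0.05, reg=0.01):
    """One in-place SGD step pushing film i above film j for user u."""
    # Copy the current vectors so every update uses pre-step values.
    p_u, q_i, q_j = P[u].copy(), Q[i].copy(), Q[j].copy()
    x = p_u @ (q_i - q_j)            # score margin: liked minus unliked
    g = 1.0 / (1.0 + np.exp(x))      # sigmoid(-x), gradient of log sigmoid(x)
    P[u] += lr * (g * (q_i - q_j) - reg * p_u)
    Q[i] += lr * (g * p_u - reg * q_i)
    Q[j] += lr * (-g * p_u - reg * q_j)

# Toy run (hypothetical sizes): 3 users, 5 films, 4 latent factors.
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(3, 4))
Q = rng.normal(scale=0.1, size=(5, 4))
before = P[0] @ (Q[1] - Q[2])
for _ in range(200):
    bpr_step(P, Q, u=0, i=1, j=2)
after = P[0] @ (Q[1] - Q[2])
```

The objective only cares about the order of the two films, never about predicting a rating value, which is why it suits top-N better than squared error does.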
The stack
Two from-scratch SGD models, each doing the job it's good at.
MovieLens latest-small: 610 users, 9,724 films, 100,836 ratings, fetched on demand. A labelled synthetic fallback backs the tests.
r̂ = μ + b_u + b_i + p_u·q_i, fit by L2-regularised squared error. Hand-written SGD updates over NumPy.
"Films like X" = nearest item embeddings q_i by cosine. The matches cross genres.
A second embedding set trained to put liked films above unliked ones — what powers top-N.
Score every film the user has not seen with the ranking model, then return the N highest-scoring.
Held-out RMSE for the MF; Recall@10 for top-N, measured against a popularity baseline.
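The rating model in the stack above, r̂ = μ + b_u + b_i + p_u·q_i fit by L2-regularised squared error, can be sketched as a plain SGD loop. This is a minimal illustration under assumed hyperparameters (`k`, `lr`, `reg`, `epochs`) and hypothetical names (`fit_mf`, `rmse`), run on made-up toy ratings rather than MovieLens.

```python
import numpy as np

def fit_mf(users, items, ratings, n_users, n_items,
           k=8, lr=0.02, reg=0.05, epochs=200, seed=0):
    """Fit biases and latent vectors by SGD over observed ratings only."""
    rng = np.random.default_rng(seed)
    mu = ratings.mean()                      # global mean rating
    b_u, b_i = np.zeros(n_users), np.zeros(n_items)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for n in rng.permutation(len(ratings)):   # visit ratings in random order
            u, i = users[n], items[n]
            err = ratings[n] - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            p_old = P[u].copy()              # keep pre-update p_u for the q_i step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return mu, b_u, b_i, P, Q

def rmse(model, users, items, ratings):
    mu, b_u, b_i, P, Q = model
    pred = mu + b_u[users] + b_i[items] + np.sum(P[users] * Q[items], axis=1)
    return float(np.sqrt(np.mean((ratings - pred) ** 2)))

# Made-up toy data: 3 users, 3 films, 6 observed ratings.
users = np.array([0, 0, 1, 1, 2, 2])
items = np.array([0, 1, 0, 2, 1, 2])
ratings = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 1.0])

model = fit_mf(users, items, ratings, n_users=3, n_items=3)
train_rmse = rmse(model, users, items, ratings)
# Reference point: always predicting the global mean.
baseline_rmse = float(np.sqrt(np.mean((ratings - ratings.mean()) ** 2)))
```

The toy comparison is on training data only, so it shows that the updates reduce error, not how well the model generalises; the held-out RMSE reported on this page is the number that matters.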
Architecture
The pipeline, end to end:
Pull MovieLens ratings; remap sparse user/film ids to dense indices.
SGD over observed ratings learns biases + a latent vector for every user and film.
"Films like X" = cosine nearest neighbours among the learned film vectors.
A second BPR-trained embedding set scores unseen films per user for top-N.
Held-out RMSE for the MF; Recall@10 for top-N vs a popularity baseline.
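The first pipeline step, remapping sparse ids to dense indices, can be done in one NumPy call. A minimal sketch, assuming the raw ids arrive as an integer array; the id values below are made up for illustration.

```python
import numpy as np

# Made-up raw film ids with gaps and repeats, as they might appear in a ratings file.
raw_film_ids = np.array([10, 500, 10, 7, 193609, 500])

# film_ids holds the sorted unique raw ids; film_idx maps each rating row
# to a dense index in [0, n_films), suitable for indexing an embedding matrix.
film_ids, film_idx = np.unique(raw_film_ids, return_inverse=True)
n_films = len(film_ids)

# film_idx is [1, 2, 1, 0, 3, 2]; film_ids[film_idx] recovers the raw ids.
```

Keeping `film_ids` around gives the reverse mapping for free, so a recommended index can be turned back into the original id for display.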
Results
All figures come from running python demo.py on MovieLens latest-small, with an 80/20 split and fixed seeds so that they reproduce.
| Metric | Value |
|---|---|
| Rating model — held-out RMSE | 0.8559 |
| BPR ranking model — Recall@10 | 0.1464 |
| Popularity baseline — Recall@10 | 0.0988 |
| BPR lift over popularity | +48.2% |
| Rating model (RMSE-tuned) — Recall@10 | 0.0184 |
The last row is the honest catch: the RMSE-tuned rating model ranks far worse than the popularity baseline (Recall@10 of 0.0184 against 0.0988), which is exactly why top-N uses BPR instead.
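For reference, here is a minimal sketch of how a per-user Recall@10 and the lift figure in the table can be computed. The definition used (hits in the top k divided by the number of held-out films) is one common convention and an assumption about this page's metric; `recall_at_k` and the toy lists are hypothetical.

```python
def recall_at_k(recommended, held_out, k=10):
    """Fraction of a user's held-out films that appear in their top-k list."""
    if not held_out:
        return 0.0
    hits = len(set(recommended[:k]) & set(held_out))
    return hits / len(held_out)

# Toy example: two of the four held-out films fall inside the top 10.
recommended = list(range(20))        # film indices, best first
held_out = [3, 9, 15, 42]
user_recall = recall_at_k(recommended, held_out, k=10)

# Lift over popularity, using the two Recall@10 values from the table above.
bpr_recall, pop_recall = 0.1464, 0.0988
lift_pct = round(100 * (bpr_recall / pop_recall - 1), 1)
```

Averaging `recall_at_k` over all users with at least one held-out film would give a single dataset-level number comparable to those in the table.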
Cross-genre
"Films like X" from the learned embeddings, picking matches that wear a different genre label — the kind a genre filter could never surface. Straight from the run:
Reflection