From-Scratch Build · Recommendation Systems

Reel

Genre is a blunt instrument — two thrillers can feel nothing alike. Reel is a film discovery engine built on learned embeddings: every film becomes a point in a space of taste, and your next watch is the one that sits closest to what you already love. I built it from scratch to learn how meaning, not metadata, drives modern discovery.

Python · NumPy · Matrix factorisation · SGD from scratch · BPR ranking · Cosine NN · MovieLens

What it is

Discovery by closeness, not category

Reel learns a vector for every film by factorising the user-by-film ratings matrix with stochastic gradient descent — written from scratch in NumPy, no ML library doing the fitting. Films that people rate the same way land near each other, so discovery becomes geometry: name a film you love and Reel returns its nearest neighbours in taste-space. The matches cut across genres, because closeness captures the feel a label can't.
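As a minimal sketch of what "factorising with SGD" means, the loop below fits r̂ = μ + b_u + b_i + p_u·q_i to observed ratings by minimising L2-regularised squared error. It assumes ratings arrive as (user, film, rating) triples with dense integer ids. The names `fit_mf` and `predict` and the hyperparameter defaults (k, lr, reg, epochs) are hypothetical and illustrative; this is not Reel's actual code.

```python
import numpy as np

def fit_mf(triples, n_users, n_items, k=32, lr=0.01, reg=0.05, epochs=20, seed=0):
    """Fit r̂ = mu + b_u + b_i + p_u·q_i by SGD on L2-regularised squared error."""
    rng = np.random.default_rng(seed)
    triples = np.asarray(triples, dtype=float)
    users = triples[:, 0].astype(int)
    items = triples[:, 1].astype(int)
    ratings = triples[:, 2]
    mu = ratings.mean()                      # global mean rating
    bu = np.zeros(n_users)                   # user biases b_u
    bi = np.zeros(n_items)                   # film biases b_i
    P = rng.normal(0, 0.1, (n_users, k))     # user vectors p_u
    Q = rng.normal(0, 0.1, (n_items, k))     # film vectors q_i
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):   # visit ratings in a fresh order each epoch
            u, i, r = users[idx], items[idx], ratings[idx]
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu = P[u].copy()                 # keep the old p_u for q_i's update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return mu, bu, bi, P, Q

def predict(model, u, i):
    """Predicted rating for user u and film i."""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + P[u] @ Q[i]
```

The rows of `Q` are the film embeddings that the "films like X" lookup works on.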

Trained on MovieLens latest-small — 610 users, 9,724 films, 100,836 ratings. The honest catch I hit: a rating-accurate model is a poor ranker, so top-N recommendations use a second set of embeddings trained on a ranking objective (BPR). Every number on this page is from a real run.
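For contrast with the rating model, here is a sketch of one way to train BPR (Bayesian Personalised Ranking). Each step samples a user u, a film i they liked and a film j they did not, then takes a gradient-ascent step on ln σ(x_uij), where x_uij = p_u·(q_i − q_j). The input format (a dict from user to the set of liked film ids), uniform negative sampling, the absence of item biases and all parameter values are assumptions for illustration. They are not taken from Reel.

```python
import numpy as np

def fit_bpr(liked, n_users, n_items, k=32, lr=0.05, reg=0.01, steps=100_000, seed=0):
    """liked: dict user -> set of liked film ids (hypothetical input format)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0, 0.1, (n_users, k))
    Q = rng.normal(0, 0.1, (n_items, k))
    # Only users with at least one liked and one unliked film can form a triple.
    users = [u for u in liked if 0 < len(liked[u]) < n_items]
    pos_lists = {u: list(liked[u]) for u in users}
    for _ in range(steps):
        u = users[rng.integers(len(users))]
        i = pos_lists[u][rng.integers(len(pos_lists[u]))]   # a liked film
        j = int(rng.integers(n_items))
        while j in liked[u]:                                 # resample until j is unliked
            j = int(rng.integers(n_items))
        x_uij = P[u] @ (Q[i] - Q[j])
        g = 1.0 / (1.0 + np.exp(x_uij))   # d/dx ln sigmoid(x) = sigmoid(-x)
        pu = P[u].copy()
        P[u] += lr * (g * (Q[i] - Q[j]) - reg * P[u])
        Q[i] += lr * (g * pu - reg * Q[i])
        Q[j] += lr * (-g * pu - reg * Q[j])
    return P, Q
```

The objective only asks that liked films outscore unliked ones for each user. It never tries to reproduce the rating values. That is why this embedding set suits top-N ranking better than the RMSE-fitted one.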

cos θ
cosine similarity between learned film embeddings — the single measure of "these feel alike" that powers every "films like X" lookup.
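Concretely, cos θ = q_i·q_j / (‖q_i‖ ‖q_j‖). Once every film vector is normalised to unit length, a single matrix-vector product scores one film against all the others. The sketch below assumes the film embeddings are the rows of a NumPy array `Q`. The function name `similar_films` is hypothetical.

```python
import numpy as np

def similar_films(Q, film, n=5):
    """Indices of the n films whose vectors have the highest cosine similarity to Q[film]."""
    norms = np.linalg.norm(Q, axis=1)
    norms[norms == 0] = 1.0              # guard against all-zero vectors
    unit = Q / norms[:, None]            # L2-normalise each film vector
    sims = unit @ unit[film]             # cos θ between the query and every film
    sims[film] = -np.inf                 # never return the query film itself
    return np.argsort(-sims)[:n]         # most similar first
```

For example, with `Q = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])`, `similar_films(Q, 0, 2)` returns films 1 and 2 in that order. Film 3 points the opposite way and ranks last.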

The stack

From a film you love to ones you'll love

Two from-scratch SGD models, each doing the job it's good at.

data

MovieLens ratings

A 610 × 9,724 user-by-film matrix holding 100,836 observed ratings, fetched on demand. A labelled synthetic fallback backs the tests.

factorise

Rating MF (SGD)

r̂ = μ + b_u + b_i + p_u·q_i, fit by L2-regularised squared error. Hand-written SGD updates over NumPy.

similar

Cosine neighbours

"Films like X" = nearest item embeddings q_i by cosine. The matches cross genres.

rank

BPR ranking model

A second embedding set trained to put liked films above unliked ones — what powers top-N.

recommend

Top-N for a user

Score every unseen film with the ranking model, return the highest.

evaluate

RMSE + Recall@K

Held-out RMSE for the MF; Recall@10 for top-N, measured against a popularity baseline.
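A sketch of the top-N evaluation follows. It assumes Recall@K is the per-user share of held-out films that appear in the top K unseen recommendations, averaged over users, and that the popularity baseline ranks films by their number of training interactions. Other Recall@K variants exist (some divide by min(K, |held-out|)), so treat this definition as an assumption. The function names and the dict-of-sets input format are hypothetical.

```python
import numpy as np

def recall_at_k(scores, train, test, k=10):
    """Mean over users of |top-k ∩ held-out| / |held-out|.

    scores: (n_users, n_items) array, higher is better.
    train, test: dict user -> set of film ids (assumed format).
    """
    recalls = []
    for u, held_out in test.items():
        if not held_out:
            continue
        s = scores[u].astype(float)          # astype copies, so masking is safe
        seen = list(train.get(u, ()))
        if seen:
            s[seen] = -np.inf                # never recommend films already seen in training
        top_k = np.argsort(-s)[:k]
        hits = len(set(top_k.tolist()) & held_out)
        recalls.append(hits / len(held_out))
    return float(np.mean(recalls))

def popularity_scores(train, n_users, n_items):
    """Every user gets the same score per film: its count of training interactions."""
    counts = np.zeros(n_items)
    for items in train.values():
        for i in items:
            counts[i] += 1
    return np.tile(counts, (n_users, 1))
```

The same `recall_at_k` call can score any model's matrix, whether BPR, the RMSE-tuned MF or popularity, which keeps the comparison like-for-like.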

Architecture

How a film is discovered

The pipeline, end to end:

  1. Load

    Pull MovieLens ratings; remap sparse user/film ids to dense indices.

  2. Factorise

    SGD over observed ratings learns biases + a latent vector for every user and film.

  3. Similar

    "Films like X" = cosine nearest neighbours among the learned film vectors.

  4. Rank

    A second BPR-trained embedding set scores unseen films per user for top-N.

  5. Evaluate

    Held-out RMSE for the MF; Recall@10 for top-N vs a popularity baseline.
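The first and last user-facing steps, Load and Recommend, are small enough to sketch directly. The helpers below remap sparse raw ids to dense 0..n−1 indices, then score every unseen film with a user's vector and return the best n. They assume the ranking model exposes user and film matrices `P` and `Q` whose dot product is the score. `remap_ids` and `recommend` are hypothetical names.

```python
import numpy as np

def remap_ids(raw_user_ids, raw_film_ids):
    """Map sparse MovieLens ids to dense 0..n-1 indices; returns index arrays and lookups."""
    user_index = {uid: n for n, uid in enumerate(sorted(set(raw_user_ids)))}
    film_index = {fid: n for n, fid in enumerate(sorted(set(raw_film_ids)))}
    users = np.array([user_index[u] for u in raw_user_ids])
    films = np.array([film_index[f] for f in raw_film_ids])
    return users, films, user_index, film_index

def recommend(P, Q, user, seen, n=10):
    """Top-n unseen films for one user, scored by the dot product p_u·q_i."""
    scores = (Q @ P[user]).astype(float)
    if seen:
        scores[list(seen)] = -np.inf     # exclude films the user has already rated
    return np.argsort(-scores)[:n]
```

Dense indices let user and film ids index straight into the embedding matrices, so no dictionary lookup is needed inside the training loops.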

Results

Real numbers from a real run

From a run of python demo.py on MovieLens latest-small, with an 80/20 split and fixed seeds so the numbers reproduce.

| Metric | Value |
| --- | --- |
| Rating model: held-out RMSE | 0.8559 |
| BPR ranking model: Recall@10 | 0.1464 |
| Popularity baseline: Recall@10 | 0.0988 |
| BPR lift over popularity (Recall@10) | +48.2% |
| Rating model (RMSE-tuned): Recall@10 | 0.0184 |

The last row is the honest catch: the RMSE-accurate model reaches only 0.0184 Recall@10, well below popularity's 0.0988, which is exactly why top-N uses BPR instead.
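The lift figure is the relative improvement in Recall@10 over the popularity baseline. A quick check of that arithmetic from the reported values:

```python
bpr_recall = 0.1464   # BPR ranking model, Recall@10
pop_recall = 0.0988   # popularity baseline, Recall@10

# Relative lift: how much larger BPR's recall is, as a fraction of the baseline.
lift = (bpr_recall - pop_recall) / pop_recall
print(f"{lift:+.1%}")  # → +48.2%
```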

Cross-genre

Neighbours that share no genre

"Films like X" from the learned embeddings, picking matches that wear a different genre label — the kind a genre filter could never surface. Straight from the run:

Reflection

What building it taught me