From-Scratch Build · Recommendation Systems
The endless feed is judged by one thing: how long you lingered before the swipe. This build turns those split-second watch signals into the next pick — an implicit-feedback recommender with a confidence-weighted matrix factorization I wrote from scratch in NumPy/SciPy. No stars, no reviews: behaviour is the only label.
What it is
Unlike a store of products you browse, a short-video feed shows you exactly one thing and watches what you do. There are no stars and no reviews — just behaviour: did you watch to the end, or flick it away in half a second? That single number, watch_ratio = watch_time / duration, is the entire training signal.
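As a minimal sketch of that signal (not the build's actual code), the ratio can be computed per interaction as below. The function name, the clipping to [0, 1], and the sample values are my assumptions for illustration.

```python
import numpy as np

def watch_ratio(watch_time, duration):
    """Fraction of each video that was watched, as watch_time / duration."""
    watch_time = np.asarray(watch_time, dtype=float)
    duration = np.asarray(duration, dtype=float)
    if np.any(duration <= 0):
        raise ValueError("duration must be positive")
    # Assumption: replays or logging jitter could push the raw ratio above 1,
    # so it is clipped to keep the label inside [0, 1].
    return np.clip(watch_time / duration, 0.0, 1.0)

# Hypothetical log: a half-second flick, a full watch, and a looped replay.
ratios = watch_ratio([0.5, 12.0, 30.0], [20.0, 12.0, 15.0])
print(ratios)
```

The half-second flick on a 20-second video lands near zero, while the completed watch and the replay both saturate at 1.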
The model turns it into implicit feedback in the style of Hu, Koren & Volinsky: a video you watched at all is a positive with preference 1, but how long you watched sets a confidence 1 + α·watch_ratio — a completed watch counts far more than a half-second glance. A confidence-weighted matrix factorization, trained by alternating least squares, learns user and video factors from that, and next_videos(user, k) ranks unseen videos by predicted preference.
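The preference and confidence construction described above might look like the following sketch. It uses dense arrays for brevity, and `build_feedback` and `alpha=40.0` are illustrative placeholders rather than the names or settings used in the build.

```python
import numpy as np

def build_feedback(users, videos, ratios, n_users, n_videos, alpha=40.0):
    """Turn a watch log into (preference, confidence) matrices.

    preference[u, i] = 1 for any logged watch, else 0.
    confidence[u, i] = 1 + alpha * watch_ratio for a watch, else 1.
    """
    P = np.zeros((n_users, n_videos))
    C = np.ones((n_users, n_videos))  # unobserved pairs keep baseline confidence 1
    ratios = np.asarray(ratios, dtype=float)
    P[users, videos] = 1.0
    # If a (user, video) pair appears twice, the last logged ratio wins here.
    C[users, videos] = 1.0 + alpha * ratios
    return P, C

# Hypothetical log: user 0 flicks video 0 away and finishes video 2,
# user 1 watches half of video 1.
P, C = build_feedback(users=[0, 0, 1], videos=[0, 2, 1],
                      ratios=[0.025, 1.0, 0.5], n_users=2, n_videos=3)
print(P)
print(C)
```

With this placeholder alpha, the flicked video gets confidence 2 and the completed one gets 41, so both are positives but the completed watch dominates the fit.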
The pipeline
Behaviour in, an ordered list of unseen videos out.
watch_ratio = watch_time ÷ duration, in [0, 1]. The only label a short-video feed gives you.
preference = 1 on any watch; confidence = 1 + α·watch_ratio weights it by how long.
Hand-rolled alternating least squares learns user + video factors from the weighted signal.
Score every unseen video by x·y, return the top-k. The actual recommendation.
A fresh completed watch re-solves that user's factor row and nudges the next pick.
Held-out ranking quality against a popularity baseline — real numbers, not claims.
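The factorization stage of the pipeline can be sketched as a dense confidence-weighted ALS. This is a simplified illustration under my own assumptions, not the build's implementation: the function names, `n_factors`, `reg`, `n_iters`, and the random toy data are all placeholders, and a real version would exploit sparsity instead of looping over dense rows.

```python
import numpy as np

def solve_rows(Y, P, C, reg):
    """For every row u, solve (Y^T C_u Y + reg*I) x_u = Y^T C_u p_u."""
    k = Y.shape[1]
    X = np.empty((P.shape[0], k))
    for u in range(P.shape[0]):
        Yc = Y * C[u][:, None]          # scale each factor row by its confidence
        A = Yc.T @ Y + reg * np.eye(k)  # positive definite because reg > 0
        b = Yc.T @ P[u]
        X[u] = np.linalg.solve(A, b)
    return X

def weighted_als(P, C, n_factors=8, reg=0.1, n_iters=10, seed=0):
    """Alternate exact least-squares solves for user and video factors."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(P.shape[0], n_factors))
    Y = rng.normal(scale=0.1, size=(P.shape[1], n_factors))
    for _ in range(n_iters):
        X = solve_rows(Y, P, C, reg)      # videos fixed, solve users
        Y = solve_rows(X, P.T, C.T, reg)  # users fixed, solve videos
    return X, Y

def weighted_loss(X, Y, P, C, reg):
    """Confidence-weighted squared error plus L2 penalty on both factor sets."""
    err = P - X @ Y.T
    return float((C * err ** 2).sum() + reg * ((X ** 2).sum() + (Y ** 2).sum()))

# Toy data (hypothetical): 6 users, 5 videos, random watches and ratios.
rng = np.random.default_rng(1)
P = (rng.random((6, 5)) < 0.4).astype(float)
C = 1.0 + 40.0 * P * rng.random((6, 5))
X, Y = weighted_als(P, C, n_factors=3)
print(weighted_loss(X, Y, P, C, reg=0.1))
```

Because each half-step is an exact minimiser with the other side held fixed, the weighted loss cannot increase from one sweep to the next, which is a handy sanity check when hand-rolling the solver.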
How it works
Each watch feeds the same loop:
Record the watch_ratio for the video just shown.
Turn it into preference 1 with confidence 1 + α·watch_ratio — long watches count more.
Confidence-weighted ALS learns a latent vector for the user and for every video.
Score all unseen videos by the dot product of those vectors; take the top-k.
A fresh completed watch re-solves the user's factor row, so the next pick reflects it.
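The last two steps of the loop, scoring unseen videos and re-solving one user's row after a new watch, could be sketched like this. The signature of `next_videos` here takes the user's factor vector directly, which differs from the `next_videos(user, k)` form mentioned earlier, and `resolve_user`, `alpha`, `reg`, and the hand-made video factors are assumptions for illustration.

```python
import numpy as np

def next_videos(x_user, Y, seen, k=10):
    """Rank unseen videos by the dot product x·y and return the top-k ids."""
    scores = Y @ x_user
    order = np.argsort(-scores, kind="stable")  # highest score first
    seen = set(seen)
    return [int(v) for v in order if int(v) not in seen][:k]

def resolve_user(Y, watched, ratios, alpha=40.0, reg=0.1):
    """Re-solve a single user's factor row with the video factors held fixed."""
    c = np.ones(Y.shape[0])   # baseline confidence for unwatched videos
    p = np.zeros(Y.shape[0])  # preference 0 unless watched
    c[watched] = 1.0 + alpha * np.asarray(ratios, dtype=float)
    p[watched] = 1.0
    Yc = Y * c[:, None]
    A = Yc.T @ Y + reg * np.eye(Y.shape[1])
    return np.linalg.solve(A, Yc.T @ p)

# Hypothetical video factors: videos 0 and 1 share one topic, 2 and 3 another.
Y = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])

# The user completes video 0; their row is re-solved and the feed re-ranked.
x = resolve_user(Y, watched=[0], ratios=[1.0])
picks = next_videos(x, Y, seen={0}, k=2)
print(picks)
```

Only one small linear system is solved per new watch, so the update stays cheap compared with retraining all factors.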
Real results
On synthetic data with planted interests (300 users, 400 videos, 6 topics, 18,000 interactions; a held-out video counts as relevant if it was watched ≥ 60%), the learned model is compared against a non-personalised popularity baseline. These are the numbers demo.py prints:
Learned model: NDCG@10 = 0.093 (the confidence-weighted matrix factorization).
Popularity baseline: NDCG@10 = 0.026 (same top-k for everyone, no personalisation).
Evaluated over 230 held-out users. The watch signal carries real, recoverable preference.
The data is synthetic and I say so plainly — but it is generated so watch behaviour actually reflects interest, which is what makes the structure recoverable in the first place.
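For reference, here is one way the metric and the baseline could be computed. It is a sketch under my assumptions: binary relevance (matching the ≥ 60% rule above), the standard log2 discount, and hypothetical helper names and toy inputs. It does not reproduce the numbers reported above.

```python
import numpy as np

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG@k for one user's ranked list."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("user has no relevant held-out videos; skip them")
    gains = np.array([1.0 if v in relevant else 0.0 for v in ranked[:k]])
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))
    dcg = float(gains @ discounts)
    # Ideal ranking puts every relevant video (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = float((1.0 / np.log2(np.arange(2, ideal_hits + 2))).sum())
    return dcg / idcg

def popularity_ranking(train_counts, seen):
    """Non-personalised baseline: most-watched videos first, minus seen ones."""
    order = np.argsort(-np.asarray(train_counts), kind="stable")
    seen = set(seen)
    return [int(v) for v in order if int(v) not in seen]

# Toy check (hypothetical ids): relevant videos 1 and 0 sit at ranks 2 and 4.
score = ndcg_at_k(ranked=[3, 1, 4, 0], relevant={0, 1}, k=10)
baseline = popularity_ranking(train_counts=[5, 9, 2, 7], seen={1})
print(round(score, 4), baseline)
```

Averaging `ndcg_at_k` over the users that have at least one relevant held-out video gives a single figure per model, which is the shape of comparison reported above.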
Reflection