Short-Video Recommender — Built From Scratch

What it is

One feed, judged in seconds

Unlike a store of products you browse, a short-video feed shows you exactly one thing and watches what you do. There are no stars and no reviews — just behaviour: did you watch to the end, or flick it away in half a second? That single number, watch_ratio = watch_time / duration, is the entire training signal.

The model turns it into implicit feedback in the style of Hu, Koren & Volinsky: a video you watched at all is a positive with preference 1, but how long you watched sets a confidence 1 + α·watch_ratio — a completed watch counts far more than a half-second glance. A confidence-weighted matrix factorization, trained by alternating least squares, learns user and video factors from that, and next_videos(user, k) ranks unseen videos by predicted preference.
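As a minimal sketch of that weighting, assuming dense NumPy arrays and a hypothetical α of 10 (the value demo.py actually uses is not stated here), the preference and confidence matrices could be built like this. The function name and argument layout are illustrative, not taken from the project.

```python
import numpy as np

def build_implicit_matrices(watch_time, duration, alpha=10.0):
    """Turn raw watch logs into preference and confidence matrices.

    watch_time: (n_users, n_videos) seconds watched, 0 where never watched.
    duration:   (n_videos,) video lengths in seconds.
    alpha:      confidence scale (hypothetical default, an assumption).
    """
    watch_ratio = np.clip(watch_time / duration, 0.0, 1.0)
    preference = (watch_ratio > 0).astype(float)  # 1 if watched at all
    confidence = 1.0 + alpha * watch_ratio        # stays 1 for unwatched pairs
    return preference, confidence

# Two users, three videos: a full watch, a half-second glance, a partial watch.
watch_time = np.array([[30.0, 0.5, 0.0],
                       [0.0, 20.0, 15.0]])
duration = np.array([30.0, 20.0, 60.0])
P, C = build_implicit_matrices(watch_time, duration, alpha=10.0)
```

With these inputs the completed watch gets confidence 11 while the half-second glance gets 1.25, even though both count as preference 1.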

2.72×

lift in held-out Recall@10 of the learned model over a popularity baseline, on planted-structure synthetic data; a real number from demo.py.

The pipeline

From a watch_ratio to the next video

Behaviour in, an ordered list of unseen videos out.

signal

watch_ratio

watch_time ÷ duration, in [0, 1]. The only label a short-video feed gives you.

model

Implicit feedback

preference = 1 on any watch; confidence = 1 + α·watch_ratio weights it by how long.

train

Confidence-weighted ALS

Hand-rolled alternating least squares learns user + video factors from the weighted signal.

rank

next_videos(user, k)

Score every unseen video by x·y, the dot product of the user's factor vector x and the video's factor vector y, and return the top-k. The actual recommendation.

online

Single-watch update

A fresh completed watch re-solves that user's factor row and nudges the next pick.

evaluate

Recall@K / NDCG@K

Held-out ranking quality against a popularity baseline — real numbers, not claims.
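The rank stage of the pipeline above can be sketched as follows. The text only gives the call shape next_videos(user, k); passing the factor matrices and a seen mask as extra arguments is an assumption made here to keep the example self-contained.

```python
import numpy as np

def next_videos(user, k, user_factors, video_factors, seen):
    """Return the indices of the top-k unseen videos for one user.

    user_factors:  (n_users, f) array, one latent row per user.
    video_factors: (n_videos, f) array, one latent row per video.
    seen:          (n_users, n_videos) boolean mask of videos already shown.
    """
    scores = video_factors @ user_factors[user]   # x·y for every video
    order = np.argsort(-scores, kind="stable")    # best score first
    unseen = order[~seen[user, order]]            # drop videos already shown
    return unseen[:k].tolist()

# Toy factors: user 0 leans on dimension 0, user 1 on dimension 1.
user_factors = np.array([[1.0, 0.0], [0.0, 1.0]])
video_factors = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.5, 0.5]])
seen = np.zeros((2, 4), dtype=bool)
seen[0, 0] = True  # user 0 has already watched video 0

print(next_videos(0, 2, user_factors, video_factors, seen))  # [1, 3]
print(next_videos(1, 2, user_factors, video_factors, seen))  # [2, 3]
```

Video 0 scores highest for user 0 but is filtered out because it has already been shown, which is the "unseen" part of the ranking.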

How it works

How the next video is chosen

Each watch feeds the same loop:

Observe
Record the watch_ratio for the video just shown.
Weight
Turn it into preference 1 with confidence 1 + α·watch_ratio — long watches count more.
Factor
Confidence-weighted ALS learns a latent vector for the user and for every video.
Rank
Score all unseen videos by the dot product of those vectors; take the top-k.
Update
A fresh completed watch re-solves the user's factor row, so the next pick reflects it.
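The Factor and Update steps above can share one closed-form solve. The sketch below is one plausible way to hand-roll it, not the project's actual code: the function names, the regularisation weight reg, the factor count, and the dense-array layout are all assumptions.

```python
import numpy as np

def solve_row(F, FtF, conf, pref, reg=0.1):
    """Solve (F^T C F + reg*I) x = F^T C p for one user (or one video).

    F:    (n, f) fixed factors of the other side.
    FtF:  precomputed F.T @ F, shared by every row in a sweep.
    conf: (n,) confidences, equal to 1 where nothing was watched.
    pref: (n,) binary preferences.
    """
    f = F.shape[1]
    touched = np.flatnonzero(conf != 1.0)  # only these rows differ from FtF
    Ft = F[touched]
    A = FtF + (Ft.T * (conf[touched] - 1.0)) @ Ft + reg * np.eye(f)
    b = F.T @ (conf * pref)
    return np.linalg.solve(A, b)

def train_als(P, C, n_factors=8, reg=0.1, n_iters=10, seed=0):
    """Alternate between solving all user rows and all video rows."""
    rng = np.random.default_rng(seed)
    n_users, n_videos = P.shape
    X = rng.normal(scale=0.1, size=(n_users, n_factors))
    Y = rng.normal(scale=0.1, size=(n_videos, n_factors))
    for _ in range(n_iters):
        YtY = Y.T @ Y
        for u in range(n_users):
            X[u] = solve_row(Y, YtY, C[u], P[u], reg)
        XtX = X.T @ X
        for i in range(n_videos):
            Y[i] = solve_row(X, XtX, C[:, i], P[:, i], reg)
    return X, Y

def update_user(user, video, watch_ratio, X, Y, P, C, alpha=10.0, reg=0.1):
    """Online step: record one new watch, then re-solve only that user's row."""
    P[user, video] = float(watch_ratio > 0)
    C[user, video] = 1.0 + alpha * watch_ratio
    X[user] = solve_row(Y, Y.T @ Y, C[user], P[user], reg)
    return X[user]
```

Because the video factors stay fixed during an online update, a single small linear solve is enough to fold a new watch into that user's vector without retraining the whole model.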

Real results

It beats popularity — measured

On synthetic data with planted interests (300 users, 400 videos, 6 topics, 18,000 interactions), the learned model is compared against a non-personalised popularity baseline. A held-out video counts as relevant if its watch_ratio is at least 0.6. These are the numbers demo.py prints:

implicit-MF

Recall@10 · 0.141

NDCG@10 · 0.093 — the confidence-weighted matrix factorization.

popularity baseline

Recall@10 · 0.052

NDCG@10 · 0.026 — same top-k for everyone, no personalisation.

lift

2.72× Recall@10

Over 230 held-out users. The watch signal carries real, recoverable preference.

The data is synthetic and I say so plainly — but it is generated so watch behaviour actually reflects interest, which is what makes the structure recoverable in the first place.
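For reference, the two metrics reported above can be computed per user roughly as follows. This is a sketch under stated assumptions: binary relevance at the 0.6 threshold described above, Recall@K divided by the number of relevant held-out videos, and users with no relevant held-out video skipped. demo.py may define the details differently, and the helper names are hypothetical.

```python
import numpy as np

def relevant_set(held_out, threshold=0.6):
    """Held-out videos whose watch_ratio meets the relevance threshold.

    held_out: dict mapping video id -> held-out watch_ratio for one user.
    """
    return {v for v, r in held_out.items() if r >= threshold}

def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant held-out videos that appear in the top-k."""
    if not relevant:
        return None  # user contributes nothing to the average
    hits = sum(1 for v in ranked[:k] if v in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: discounted hits over the ideal ordering."""
    if not relevant:
        return None
    dcg = sum(1.0 / np.log2(pos + 2)
              for pos, v in enumerate(ranked[:k]) if v in relevant)
    ideal = sum(1.0 / np.log2(pos + 2)
                for pos in range(min(k, len(relevant))))
    return dcg / ideal

# One user: three held-out watches, two of them relevant.
relevant = relevant_set({2: 0.9, 7: 0.6, 4: 0.59})
ranked = [5, 2, 9, 1]
recall = recall_at_k(ranked, relevant, k=3)  # 1 hit out of 2 relevant
ndcg = ndcg_at_k(ranked, relevant, k=3)
```

Running the same two functions on a popularity ranking, identical for every user, gives the baseline side of the comparison.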

Reflection

What building it taught me

Behaviour is the only label. With no ratings, watch_ratio is the ground truth — reading it right is the whole job.
Preference vs confidence. Splitting "did you watch?" from "how much?" is the key idea that makes implicit feedback work; it's what beats popularity here.
Hand-rolling ALS clarifies it. Writing the alternating least-squares update myself — precomputing YᵀY, solving per user — made the maths concrete instead of a library call.
Evaluation has to be honest. Recall@K against a popularity baseline is the test that tells you the model learned something personal, not just "show everyone the hits".
Optimising watch-time has ethics. The metric that works is also the one that can trap attention — worth building with eyes open.
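
For the ALS point above, the maths behind "precomputing YᵀY, solving per user" can be written out in the standard Hu, Koren & Volinsky form. The symbols are assumptions for illustration: r_ui is the watch_ratio, p_ui the binary preference, C^u the diagonal matrix of user u's confidences, and λ a regularisation weight that this page does not specify.

```latex
\min_{X,\,Y}\ \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda\Bigl(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\Bigr),
\qquad c_{ui} = 1 + \alpha\, r_{ui}

x_u = \bigl(Y^{\top} C^u Y + \lambda I\bigr)^{-1} Y^{\top} C^u p_u,
\qquad
Y^{\top} C^u Y = Y^{\top} Y + Y^{\top}\bigl(C^u - I\bigr)\,Y
```

Since C^u − I is zero for every video the user never watched, the second term only touches that user's watched videos, so YᵀY can be computed once per sweep and reused for every user.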