From-Scratch Build · Recommendation Systems
A book recommendation engine wrapped in a small social reading platform: shelve what you've read, rate it, find readers with taste like yours, and get pointed at your next book from what they loved. I built both halves from scratch — neighborhood collaborative filtering, and the SQLite store that feeds it.
What it is
The core idea is collaborative filtering: people who agreed with you in the past are a good guide to what you'll like next. BookDB measures taste similarity between readers with cosine similarity on their mean-centered rating vectors, picks your nearest neighbours — your kindred readers — and recommends the books they loved that you haven't read. No genre tags required; the patterns emerge from ratings alone.
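Below is a minimal sketch of the user-based step. It assumes ratings are held as a sparse dict of dicts (reader → book → rating). The names mean_centered, centered_cosine and nearest_readers, the toy data, and the default k are illustrative assumptions, not BookDB's own code. Each reader's mean is subtracted first, so a harsh rater and a generous rater with the same preferences still point the same way. Unrated books count as zero after centering, so only co-rated books contribute to the dot product.

```python
from math import sqrt

# Hypothetical sparse layout: reader -> {book: rating}; unrated books are simply absent.
Ratings = dict[str, dict[str, float]]


def mean_centered(row: dict[str, float]) -> dict[str, float]:
    """Subtract the reader's own mean so harsh and generous raters compare fairly."""
    mu = sum(row.values()) / len(row)
    return {book: r - mu for book, r in row.items()}


def centered_cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two readers' mean-centered rating vectors."""
    ca, cb = mean_centered(a), mean_centered(b)
    # Missing ratings are zero after centering, so only co-rated books add to the dot product.
    dot = sum(ca[book] * cb[book] for book in ca.keys() & cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a reader who gives every book the same rating has no taste direction
    return dot / (norm_a * norm_b)


def nearest_readers(ratings: Ratings, user: str, k: int = 10) -> list[tuple[str, float]]:
    """The k readers whose taste is most similar to `user`, most similar first."""
    sims = [
        (other, centered_cosine(ratings[user], row))
        for other, row in ratings.items()
        if other != user and row
    ]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


# Toy data (hypothetical): ben shares ana's taste, cal is roughly her opposite.
demo: Ratings = {
    "ana": {"dune": 5, "emma": 2, "ubik": 4},
    "ben": {"dune": 4, "emma": 1, "ubik": 5, "ulysses": 4},
    "cal": {"dune": 1, "emma": 5, "ulysses": 2},
}
print(nearest_readers(demo, "ana", k=2))
```

Negative similarities are kept in the ranking here so that "opposite taste" stays visible. A recommender would normally let only positively similar readers vote.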
There's an item-based view too: adjusted-cosine similarity between books, so titles that get loved together (the "co-loved" blocks) surface together. Around the engine sits the platform — a SQLite store of users, books, ratings and shelves — because a recommender is only as good as the behaviour it learns from.
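The item-based view can be sketched the same way. Adjusted cosine centers each rating on the rater's mean, not the book's mean, before comparing book columns. This cancels out individual rating scales. The version below is one common formulation, with norms taken over each book's full column. The names book_columns, adjusted_cosine and co_loved and the toy data are hypothetical, for illustration only.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sparse layout: reader -> {book: rating}.
Ratings = dict[str, dict[str, float]]


def book_columns(ratings: Ratings) -> dict[str, dict[str, float]]:
    """Pivot to book -> {reader: rating minus that reader's mean}."""
    columns: dict[str, dict[str, float]] = defaultdict(dict)
    for user, row in ratings.items():
        if not row:
            continue
        mu = sum(row.values()) / len(row)
        for book, r in row.items():
            columns[book][user] = r - mu  # adjusted cosine centers on the reader, not the book
    return dict(columns)


def adjusted_cosine(col_a: dict[str, float], col_b: dict[str, float]) -> float:
    """Cosine between two reader-centered book columns; absent ratings count as zero."""
    dot = sum(col_a[u] * col_b[u] for u in col_a.keys() & col_b.keys())
    norm_a = sqrt(sum(v * v for v in col_a.values()))
    norm_b = sqrt(sum(v * v for v in col_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def co_loved(ratings: Ratings, book: str, k: int = 5) -> list[tuple[str, float]]:
    """Books most similar to `book` under adjusted cosine, most similar first."""
    columns = book_columns(ratings)
    target = columns[book]
    sims = [(other, adjusted_cosine(target, col)) for other, col in columns.items() if other != book]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


# Toy data (hypothetical).
demo: Ratings = {
    "ana": {"dune": 5, "emma": 2, "ubik": 4},
    "ben": {"dune": 4, "emma": 1, "ubik": 5, "ulysses": 4},
    "cal": {"dune": 1, "emma": 5, "ulysses": 2},
}
print(co_loved(demo, "dune", k=2))
```

In practice the book-by-book similarities change slowly, so they can be precomputed once per batch of new ratings rather than on every request.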
The stack
A recommender engine and the platform that feeds it.
A sparse user×book matrix (200×120, 14.6% dense) — who read and rated what, the raw fuel of every recommendation.
Cosine similarity on mean-centered rows finds your kindred readers; their ratings, weighted by similarity, score your unread books.
Adjusted-cosine similarity between book columns ranks co-loved titles together: books the same readers rate highly.
A count-damped mean rating, non-personalized — the bar collaborative filtering has to clear.
Users, books, ratings and shelves behind a clean Python API: recommend_for, kindred_readers, add_rating.
Leave-one-out HitRate@K and Recall@K on held-out reads — does the top of the list actually get picked up?
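The baseline and the evaluation loop can be sketched together, under two stated assumptions. First, the count-damped mean is implemented as shrinkage toward the global mean, (sum of ratings + m · global mean) / (count + m), with m = 5 chosen arbitrarily. Second, each reader's single highest-rated book serves as the held-out read. The write-up does not confirm either choice, and the function names and toy data are hypothetical.

```python
# Hypothetical sparse layout: reader -> {book: rating}.
Ratings = dict[str, dict[str, float]]


def damped_popularity(ratings: Ratings, damping: float = 5.0) -> list[str]:
    """All books ranked by count-damped mean: thinly rated books shrink toward the global mean."""
    sums: dict[str, float] = {}
    counts: dict[str, int] = {}
    for row in ratings.values():
        for book, r in row.items():
            sums[book] = sums.get(book, 0.0) + r
            counts[book] = counts.get(book, 0) + 1
    if not counts:
        return []
    global_mean = sum(sums.values()) / sum(counts.values())
    score = {b: (sums[b] + damping * global_mean) / (counts[b] + damping) for b in sums}
    return sorted(score, key=lambda b: score[b], reverse=True)


def leave_one_out_hit_rate(ratings: Ratings, k: int = 10, damping: float = 5.0) -> float:
    """Hide each reader's top-rated book, rank their unread books, and check the top k."""
    hits = trials = 0
    for user, row in ratings.items():
        if len(row) < 2:
            continue  # need at least one rating left after holding one out
        held_out = max(row, key=lambda b: row[b])
        train = {u: dict(r) for u, r in ratings.items()}  # copy so the hold-out doesn't leak
        del train[user][held_out]
        ranked = [b for b in damped_popularity(train, damping) if b not in train[user]]
        hits += int(held_out in ranked[:k])
        trials += 1
    return hits / trials if trials else 0.0


# Toy data (hypothetical): ten solid ratings versus one glowing one.
demo: Ratings = {f"u{i}": {"steady": 4.5, "meh": 2.0} for i in range(10)}
demo["u0"]["hype"] = 5.0
print(damped_popularity(demo))
print(leave_one_out_hit_rate(demo, k=1))
```

With exactly one held-out book per reader, each reader's Recall@K is either 0 or 1, so averaged over readers it equals HitRate@K. The two metrics only diverge when several reads per reader are held out. Swapping damped_popularity for a collaborative-filtering ranker inside the same loop gives the head-to-head comparison against the baseline.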
Architecture
From a reader's history to a ranked shelf, the path is consistent:
Pull the reader's ratings and shelves from the SQLite platform into a sparse user×book matrix.
Mean-centered cosine similarity ranks other readers to find the reader's kindred readers; the item-based view ranks each book's adjusted-cosine neighbours instead.
A similarity-weighted average of how the neighbours rated each book gives a predicted score for every unread title.
Order by score, drop already-read books, drop thinly-supported ones; cold-start readers fall back to popularity.
Leave-one-out Recall@K confirms the ranking recovers held-out reads far more often than the baseline.
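The scoring and filtering steps above might look like the following sketch. It makes several assumptions that the write-up does not state:

- k = 20 neighbours.
- Only positively similar readers vote.
- A book counts as thinly supported when fewer than min_support = 2 neighbours rated it.
- A reader with fewer than min_history = 3 ratings is treated as cold-start.

The prediction is the reader's mean plus a similarity-weighted average of the neighbours' mean-centered ratings. This is one common variant of the similarity-weighted average. The fallback reuses a count-damped mean with an arbitrary m = 5. All names here are hypothetical.

```python
from math import sqrt

# Hypothetical sparse layout: reader -> {book: rating}.
Ratings = dict[str, dict[str, float]]


def centered(row: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the reader's mean and their mean-centered ratings."""
    mu = sum(row.values()) / len(row)
    return mu, {book: r - mu for book, r in row.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine over sparse vectors; missing entries count as zero."""
    dot = sum(a[x] * b[x] for x in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def popular(ratings: Ratings, read: dict[str, float], n: int, damping: float = 5.0) -> list[str]:
    """Cold-start fallback: unread books ranked by a count-damped mean rating."""
    sums: dict[str, float] = {}
    counts: dict[str, int] = {}
    for row in ratings.values():
        for book, r in row.items():
            sums[book] = sums.get(book, 0.0) + r
            counts[book] = counts.get(book, 0) + 1
    if not counts:
        return []
    g = sum(sums.values()) / sum(counts.values())
    score = {b: (sums[b] + damping * g) / (counts[b] + damping) for b in sums if b not in read}
    return sorted(score, key=lambda b: score[b], reverse=True)[:n]


def recommend(ratings: Ratings, user: str, n: int = 5, k: int = 20,
              min_support: int = 2, min_history: int = 3) -> list[str]:
    """Top-n unread books for `user`, scored by positively similar neighbours."""
    row = ratings.get(user, {})
    if len(row) < min_history:
        return popular(ratings, row, n)  # too little history to find kindred readers
    mu_u, cu = centered(row)
    neighbours = sorted(
        ((cosine(cu, centered(r)[1]), v) for v, r in ratings.items() if v != user and r),
        reverse=True,
    )[:k]
    num: dict[str, float] = {}
    den: dict[str, float] = {}
    support: dict[str, int] = {}
    for sim, v in neighbours:
        if sim <= 0:
            continue  # only readers with similar taste get a vote
        _, cv = centered(ratings[v])
        for book, dev in cv.items():
            if book in row:
                continue  # drop already-read books
            num[book] = num.get(book, 0.0) + sim * dev
            den[book] = den.get(book, 0.0) + sim
            support[book] = support.get(book, 0) + 1
    # Predicted rating = reader's mean + similarity-weighted mean deviation of neighbours;
    # books rated by too few neighbours are dropped as thinly supported.
    scores = {b: mu_u + num[b] / den[b] for b in num if support[b] >= min_support}
    if not scores:
        return popular(ratings, row, n)
    return sorted(scores, key=lambda b: scores[b], reverse=True)[:n]


# Toy data (hypothetical).
demo: Ratings = {
    "ana": {"dune": 5, "emma": 2, "ubik": 4},
    "ben": {"dune": 4, "emma": 1, "ubik": 5, "ulysses": 4, "solaris": 5},
    "dee": {"dune": 5, "emma": 2, "ubik": 5, "solaris": 4},
    "cal": {"dune": 1, "emma": 5, "ulysses": 2, "persuasion": 5},
    "eve": {"dune": 2, "emma": 4, "persuasion": 4},
}
print(recommend(demo, "ana"))
```

Here ulysses reaches ana through a single neighbour only, so the support threshold filters it out. Lowering min_support to 1 lets it back in below solaris.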
Reflection