From-Scratch Build · Recommendation Systems

BookDB

A book recommendation engine wrapped in a small social reading platform: shelve what you've read, rate it, find readers with taste like yours, and get pointed at your next book from what they loved. I built both halves from scratch — neighborhood collaborative filtering, and the SQLite store that feeds it.

Python · Collaborative filtering · Cosine similarity · scikit-learn · SQLite · Recall@K eval

What it is

Your next read, from kindred readers

The core idea is collaborative filtering: people who agreed with you in the past are a good guide to what you'll like next. BookDB measures taste similarity between readers with cosine similarity on their mean-centered rating vectors, picks your nearest neighbours — your kindred readers — and recommends the books they loved that you haven't read. No genre tags required; the patterns emerge from ratings alone.
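To make "mean-centered cosine" concrete, here is a minimal sketch of finding kindred readers. It assumes ratings live in a small NumPy array with readers as rows, books as columns, 1 to 5 stars, and 0 meaning "not rated". The helper names (`mean_center`, `user_similarity`) and this `kindred_readers` signature are hypothetical and are not necessarily BookDB's actual API. NumPy stands in for scikit-learn here; `sklearn.metrics.pairwise.cosine_similarity` on the centred matrix would do the same normalisation.

```python
import numpy as np

def mean_center(R: np.ndarray) -> np.ndarray:
    """Subtract each reader's mean rating from the books they rated; unrated cells (0) stay 0."""
    rated = R > 0
    counts = rated.sum(axis=1)
    means = np.divide(R.sum(axis=1), counts, out=np.zeros(R.shape[0]), where=counts > 0)
    return np.where(rated, R - means[:, None], 0.0)

def user_similarity(R: np.ndarray) -> np.ndarray:
    """Cosine similarity between mean-centred reader rows (readers x readers)."""
    C = mean_center(R)
    norms = np.linalg.norm(C, axis=1)
    norms[norms == 0] = 1.0  # a reader with no rating spread gets similarity 0, not NaN
    return (C @ C.T) / np.outer(norms, norms)

def kindred_readers(R: np.ndarray, user: int, k: int = 2) -> list[int]:
    """Indices of the k readers most similar to `user`, excluding the reader themself."""
    sims = user_similarity(R)[user].copy()
    sims[user] = -np.inf
    return [int(i) for i in np.argsort(-sims)[:k]]

# Toy data: 4 readers x 4 books.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [5, 5, 0, 2],
])
print(kindred_readers(R, 0, k=2))
```

Mean-centring matters because a harsh rater's 3 and a generous rater's 5 can express the same enthusiasm. After subtracting each reader's mean, the cosine compares what readers liked relative to their own baseline.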

There's an item-based view too: adjusted-cosine similarity between books, so titles the same readers rate highly surface together in the "co-loved" blocks. Around the engine sits the platform (a SQLite store of users, books, ratings and shelves), because a recommender is only as good as the behaviour it learns from.
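Adjusted cosine differs from plain cosine between book columns in one respect: each rating is first centred on the *reader's* mean, not the book's. The sketch below assumes the same 0-means-unrated NumPy layout as above. `co_loved` is a hypothetical helper name. Column norms are taken over each book's whole column instead of only co-raters, which is a common simplification rather than the only definition.

```python
import numpy as np

def adjusted_cosine(R: np.ndarray) -> np.ndarray:
    """Book x book similarity: centre each rating on its reader's mean, then cosine between columns."""
    rated = R > 0
    counts = rated.sum(axis=1)
    means = np.divide(R.sum(axis=1), counts, out=np.zeros(R.shape[0]), where=counts > 0)
    C = np.where(rated, R - means[:, None], 0.0)  # unrated cells stay 0
    norms = np.linalg.norm(C, axis=0)
    norms[norms == 0] = 1.0  # a book nobody rated gets similarity 0, not NaN
    return (C.T @ C) / np.outer(norms, norms)

def co_loved(R: np.ndarray, book: int, n: int = 2) -> list[int]:
    """The n books most similar to `book` (hypothetical helper behind a 'co-loved' block)."""
    sims = adjusted_cosine(R)[book].copy()
    sims[book] = -np.inf
    return [int(j) for j in np.argsort(-sims)[:n]]

R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [5, 5, 0, 2],
])
print(co_loved(R, 0, n=1))
```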

5.9×
item-based CF recovers a held-out liked book 5.9× as often as the popularity baseline at K=10 (Recall@10 0.560 vs 0.095, leave-one-out over 200 readers).

The stack

From ratings to recommendations

A recommender engine and the platform that feeds it.

data

Reader–book ratings

A sparse user×book matrix (200×120, 14.6% dense) — who read and rated what, the raw fuel of every recommendation.
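At 200×120 the matrix has 24,000 cells, so 14.6% density works out to roughly 3,500 ratings. That is small enough to hold as a plain dense array. A larger catalogue would want `scipy.sparse`. The sketch below assumes ratings arrive as `(user_id, book_id, rating)` triples and uses 0 to mark "not rated". The function names are hypothetical.

```python
import numpy as np

def build_matrix(triples):
    """Turn (user_id, book_id, rating) triples into a user x book array; 0 marks 'not rated'."""
    users = sorted({u for u, _, _ in triples})
    books = sorted({b for _, b, _ in triples})
    u_idx = {u: i for i, u in enumerate(users)}
    b_idx = {b: j for j, b in enumerate(books)}
    R = np.zeros((len(users), len(books)))
    for u, b, r in triples:
        R[u_idx[u], b_idx[b]] = r
    return R, users, books  # keep the id lists to map row/column indices back to ids

def density(R):
    """Fraction of cells that hold a rating."""
    return np.count_nonzero(R) / R.size

triples = [("ana", "dune", 5), ("ana", "emma", 3), ("ben", "dune", 4), ("cy", "ulysses", 2)]
R, users, books = build_matrix(triples)
print(R.shape, density(R))
```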

core · users

User-based CF

Cosine similarity on mean-centered rows finds your kindred readers; their ratings, weighted by similarity, score your unread books.
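The scoring step can be sketched as a weighted average. Among the k most similar readers who rated a book, each reader's rating counts in proportion to their similarity. This is a minimal sketch. Restricting to positively similar neighbours and the default `k=20` are assumptions, and `predict_user_based` is a hypothetical name. The similarity matrix is passed in so the example stays self-contained.

```python
import numpy as np

def predict_user_based(R, sims, user, book, k=20):
    """Similarity-weighted average rating of `book` over the k most similar readers who rated it.

    R: user x book ratings, 0 = unrated. sims: user x user similarity matrix.
    Only neighbours with positive similarity are used (an assumption of this sketch).
    """
    raters = np.flatnonzero(R[:, book] > 0)
    raters = raters[raters != user]
    w = sims[user, raters]
    keep = w > 0
    raters, w = raters[keep], w[keep]
    if raters.size == 0:
        return np.nan  # no usable neighbour has rated this book
    top = np.argsort(-w)[:k]  # strongest neighbours first
    raters, w = raters[top], w[top]
    return float(w @ R[raters, book] / w.sum())

R = np.array([[0, 0], [4, 2], [2, 5]])
sims = np.array([[1.0, 0.75, 0.25], [0.75, 1.0, 0.1], [0.25, 0.1, 1.0]])
print(predict_user_based(R, sims, user=0, book=0))
```

With reader 1 at similarity 0.75 rating the book 4 and reader 2 at 0.25 rating it 2, the prediction is (0.75·4 + 0.25·2) / 1.0 = 3.5.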

core · items

Item-based CF

Adjusted-cosine between book columns ranks co-loved titles together — books the same readers rate highly.

baseline

Popularity

A count-damped mean rating, non-personalized — the bar collaborative filtering has to clear.
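The page does not spell out the damping formula. One common form, assumed here, adds `m` phantom ratings at the global mean: score = (Σ ratings + m·μ) / (n + m). A book with one 5-star rating then cannot outrank a widely loved one on luck alone. The sketch uses the same 0-means-unrated array. `popularity_scores` and the default `damping=5.0` are hypothetical.

```python
import numpy as np

def popularity_scores(R, damping=5.0):
    """Count-damped mean rating per book: pulls books with few ratings toward the global mean."""
    rated = R > 0
    n = rated.sum(axis=0)             # ratings per book
    global_mean = R[rated].mean()     # mean over all actual ratings
    return (R.sum(axis=0) + damping * global_mean) / (n + damping)

# Book 0 has a single 5; book 1 has four 4s. Global mean = 21 / 5 = 4.2.
R = np.array([[5, 4], [0, 4], [0, 4], [0, 4]])
print(popularity_scores(R, damping=5.0))
```

Raw means would put book 0 a full star ahead (5 vs 4). After damping the gap shrinks to about 0.22 (26/6 vs 37/9), reflecting how little evidence one rating is.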

platform

SQLite store + API

Users, books, ratings and shelves behind a clean Python API: recommend_for, kindred_readers, add_rating.
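Only the function names above come from the project. The schema and signatures below are assumptions, offered as one plausible shape for the store using the standard-library `sqlite3` module. The `(user_id, book_id)` primary key makes re-rating a book overwrite the old score instead of duplicating it. `connect`, `rating_triples` and the table layout are hypothetical.

```python
import sqlite3

# Hypothetical schema. SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS books   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS ratings (
    user_id INTEGER NOT NULL REFERENCES users(id),
    book_id INTEGER NOT NULL REFERENCES books(id),
    rating  INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
    PRIMARY KEY (user_id, book_id)
);
CREATE TABLE IF NOT EXISTS shelves (
    user_id INTEGER NOT NULL REFERENCES users(id),
    book_id INTEGER NOT NULL REFERENCES books(id),
    shelf   TEXT NOT NULL,
    PRIMARY KEY (user_id, book_id, shelf)
);
"""

def connect(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_rating(conn, user_id, book_id, rating):
    """Insert or overwrite one reader's rating of one book (hypothetical signature)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO ratings (user_id, book_id, rating) VALUES (?, ?, ?)",
            (user_id, book_id, rating),
        )

def rating_triples(conn):
    """All (user_id, book_id, rating) rows: the raw input for building the user x book matrix."""
    return conn.execute(
        "SELECT user_id, book_id, rating FROM ratings ORDER BY user_id, book_id"
    ).fetchall()

conn = connect()
add_rating(conn, 1, 10, 4)
add_rating(conn, 1, 10, 5)  # re-rating replaces the earlier 4
add_rating(conn, 2, 10, 3)
print(rating_triples(conn))
```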

evaluate

Recall@K

Leave-one-out HitRate@K and Recall@K on held-out reads — does the top of the list actually get picked up?
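A minimal sketch of the leave-one-out protocol follows. For each reader, it hides one book they liked, asks the recommender for a top-K list from the remaining data, and counts a hit if the hidden book comes back. With exactly one held-out book per reader, Recall@K and HitRate@K are the same number. The "liked" threshold of 4 stars, the random choice of held-out book, and the `recommend(R_train, user, k)` callback shape are all assumptions. `most_rated` is a toy stand-in recommender.

```python
import numpy as np

def leave_one_out_hit_rate(R, recommend, k=10, like_threshold=4, seed=0):
    """Hide one liked book per reader and measure how often it reappears in the top k.

    recommend(R_train, user, k) must return book indices the user has not rated in R_train.
    """
    rng = np.random.default_rng(seed)
    hits = trials = 0
    for u in range(R.shape[0]):
        liked = np.flatnonzero(R[u] >= like_threshold)
        if liked.size == 0:
            continue  # nothing to hold out for this reader
        held = rng.choice(liked)
        R_train = R.copy()
        R_train[u, held] = 0  # hide the held-out book from the model
        hits += int(held in recommend(R_train, u, k))
        trials += 1
    return hits / trials if trials else float("nan")

def most_rated(R_train, user, k):
    """Toy non-personalised recommender: most-rated unread books first."""
    counts = (R_train > 0).sum(axis=0)
    order = [int(j) for j in np.argsort(-counts, kind="stable") if R_train[user, j] == 0]
    return order[:k]

R_toy = np.array([[5, 0, 0], [5, 0, 0], [0, 5, 0]])
print(leave_one_out_hit_rate(R_toy, most_rated, k=1))
```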

Architecture

How a recommendation is made

From a reader's history to a ranked shelf, every recommendation follows the same five steps:

  1. Collect

    Pull the reader's ratings and shelves from the SQLite platform into a sparse user×book matrix.

  2. Find neighbours

    Cosine similarity (mean-centered) ranks the reader's kindred readers — or the adjusted-cosine neighbours of each book.

  3. Score

    Similarity-weighted average of how neighbours rated each book gives a predicted score for every unread title.

  4. Rank & filter

    Order by score, drop already-read books and thinly supported ones; cold-start readers with too little history fall back to the popularity baseline.

  5. Evaluate

    Leave-one-out Recall@K confirms the ranking recovers held-out reads far more often than the popularity baseline (Recall@10 of 0.560 for item-based CF vs 0.095).
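Steps 3 and 4 meet in a rank-and-filter pass. The sketch below reads "thinly supported" as "fewer than `min_support` neighbours rated this book" and "cold start" as "the reader has rated fewer than `min_history` books". Both thresholds and the name `rank_and_filter` are hypothetical. The per-book predicted scores, support counts and popularity scores are taken as inputs.

```python
import numpy as np

def rank_and_filter(scores, support, already_read, popularity, k=10, min_support=3, min_history=5):
    """Turn per-book predicted scores into a top-k shelf of book indices.

    scores: predicted score per book (NaN = no prediction); support: neighbours who rated each book;
    already_read: boolean mask of the reader's books; popularity: fallback score per book.
    """
    if already_read.sum() < min_history:
        ranked = popularity  # cold start: too little history to trust neighbours
        ok = ~already_read
    else:
        ranked = scores
        ok = ~already_read & (support >= min_support) & ~np.isnan(scores)
    candidates = np.flatnonzero(ok)
    order = candidates[np.argsort(-ranked[candidates], kind="stable")]  # best score first
    return [int(j) for j in order[:k]]

scores = np.array([4.5, np.nan, 3.0, 4.9, 4.0])
support = np.array([5, 0, 4, 1, 6])
already_read = np.array([True, False, False, False, False])
popularity = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(rank_and_filter(scores, support, already_read, popularity, min_history=1))
```

Here book 3 has the highest predicted score (4.9) but only one supporting neighbour, so it is dropped in favour of better-supported titles.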

Reflection

What rebuilding it taught me