BookDB — Built From Scratch

What it is

Your next read, from kindred readers

The core idea is collaborative filtering: people who agreed with you in the past are a good guide to what you'll like next. BookDB measures taste similarity between readers with cosine similarity on their mean-centered rating vectors, picks your nearest neighbours — your kindred readers — and recommends the books they loved that you haven't read. No genre tags required; the patterns emerge from ratings alone.
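
As a minimal sketch of that neighbour search (not BookDB's actual code), the block below mean-centers each reader's ratings, takes cosine similarity between the centered rows, and returns the closest readers. The dense NumPy array with `NaN` for unrated books, the helper names, and the signature given to `kindred_readers` are all assumptions made for illustration.

```python
import numpy as np

def mean_center_rows(R):
    """Subtract each reader's mean from their rated books; unrated cells become 0."""
    rated = ~np.isnan(R)
    counts = rated.sum(axis=1, keepdims=True)
    sums = np.where(rated, R, 0.0).sum(axis=1, keepdims=True)
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return np.where(rated, R - means, 0.0)

def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X (all-zero rows get 0)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    return unit @ unit.T

def kindred_readers(R, user, k=2):
    """Return the k most similar readers as (reader_index, similarity) pairs."""
    sims = cosine_matrix(mean_center_rows(R))[user].copy()
    sims[user] = -np.inf  # a reader is never their own neighbour
    order = np.argsort(-sims)[:k]
    return [(int(u), float(sims[u])) for u in order]

# Toy data (hypothetical): rows are readers, columns are books, NaN = unrated.
R = np.array([
    [5.0, 5.0, 1.0, 1.0],     # reader 0
    [4.0, 4.0, 2.0, np.nan],  # reader 1: same taste, milder scale
    [1.0, 2.0, 5.0, 5.0],     # reader 2: opposite taste
])
neighbours = kindred_readers(R, user=0, k=2)
```

On this toy matrix reader 1 comes out as the closest neighbour with a positive similarity, while reader 2 ends up with a negative one.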

There's an item-based view too: adjusted-cosine similarity between books, so titles that get loved together (the "co-loved" blocks) surface together. Around the engine sits the platform — a SQLite store of users, books, ratings and shelves — because a recommender is only as good as the behaviour it learns from.

Item-based CF recovers a held-out liked book 5.9× as often as the popularity baseline at K=10 (Recall@10 of 0.560 vs 0.095, leave-one-out over 200 readers).

The stack

From ratings to recommendations

A recommender engine and the platform that feeds it.

data

Reader–book ratings

A sparse user×book matrix (200×120, 14.6% dense) — who read and rated what, the raw fuel of every recommendation.

core · users

User-based CF

Cosine similarity on mean-centered rows finds your kindred readers; their ratings, weighted by similarity, score your unread books.

core · items

Item-based CF

Adjusted-cosine between book columns ranks co-loved titles together — books the same readers rate highly.
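
A rough sketch of item-item similarity follows. It uses the textbook form of adjusted cosine, which subtracts each reader's mean rating before comparing book columns; whether BookDB centers by reader mean or by book mean is not stated here, so treat that choice as an assumption. It also normalises over whole columns rather than only co-rating readers, which is a simplification. The function names are hypothetical.

```python
import numpy as np

def adjusted_cosine_items(R):
    """Book-by-book similarity after removing each reader's mean rating."""
    rated = ~np.isnan(R)
    counts = rated.sum(axis=1, keepdims=True)
    sums = np.where(rated, R, 0.0).sum(axis=1, keepdims=True)
    user_means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    C = np.where(rated, R - user_means, 0.0)      # centered ratings, 0 where unrated
    norms = np.linalg.norm(C, axis=0, keepdims=True)
    unit = np.divide(C, norms, out=np.zeros_like(C), where=norms > 0)
    return unit.T @ unit                          # shape: (n_books, n_books)

def co_loved(S, book, k=1):
    """Indices of the k books most similar to `book`, excluding itself."""
    sims = S[book].copy()
    sims[book] = -np.inf
    return [int(b) for b in np.argsort(-sims)[:k]]

# Toy data (hypothetical): books 0 and 1 are loved together, book 2 by other readers.
R = np.array([
    [5.0, 5.0, 1.0],
    [4.0, 5.0, 2.0],
    [1.0, 2.0, 5.0],
    [2.0, 1.0, 4.0],
])
S = adjusted_cosine_items(R)
```

Here `S[0, 1]` is positive and `S[0, 2]` is negative, so book 1 is the co-loved neighbour of book 0.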

baseline

Popularity

A count-damped mean rating, non-personalized — the bar collaborative filtering has to clear.
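
One common way to damp a mean by its count is to pull every book's average toward the global mean with a fixed number of pseudo-ratings. The sketch below assumes that form; the `damping` value and the function name are hypothetical, not taken from BookDB.

```python
import numpy as np

def popularity_scores(R, damping=5.0):
    """Damped mean per book: (sum + damping * global_mean) / (count + damping)."""
    rated = ~np.isnan(R)
    counts = rated.sum(axis=0)
    sums = np.where(rated, R, 0.0).sum(axis=0)
    global_mean = sums.sum() / max(counts.sum(), 1)
    return (sums + damping * global_mean) / (counts + damping)

# Toy data (hypothetical): book 0 has a single 5-star rating,
# book 1 has four ratings averaging 4.5, book 2 has three low ratings.
R = np.array([
    [5.0,    4.0, 2.0],
    [np.nan, 5.0, 2.0],
    [np.nan, 4.0, 3.0],
    [np.nan, 5.0, np.nan],
])
scores = popularity_scores(R)
ranking = [int(b) for b in np.argsort(-scores)]
```

With these numbers the well-supported book 1 outranks book 0, even though book 0 has the higher raw mean from its single rating.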

platform

SQLite store + API

Users, books, ratings and shelves behind a clean Python API: recommend_for, kindred_readers, add_rating.
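
To make the store concrete, here is a small sketch using Python's built-in `sqlite3` module. Only the four table names and the `add_rating` function name come from the description above; the columns, the 1 to 5 rating scale, and the helpers `open_store` and `ratings_for` are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema: one row per (reader, book) rating, shelves as free-text labels.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE ratings (
    user_id INTEGER NOT NULL REFERENCES users(id),
    book_id INTEGER NOT NULL REFERENCES books(id),
    rating  REAL NOT NULL CHECK (rating BETWEEN 1 AND 5),
    PRIMARY KEY (user_id, book_id)
);
CREATE TABLE shelves (
    user_id INTEGER NOT NULL REFERENCES users(id),
    book_id INTEGER NOT NULL REFERENCES books(id),
    shelf   TEXT NOT NULL,
    PRIMARY KEY (user_id, book_id, shelf)
);
"""

def open_store(path=":memory:"):
    """Open a database and create the tables (intended for a fresh database)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_rating(conn, user_id, book_id, rating):
    """Insert a rating, replacing any earlier rating of the same book by the same reader."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO ratings (user_id, book_id, rating) VALUES (?, ?, ?)",
            (user_id, book_id, rating),
        )

def ratings_for(conn, user_id):
    """Return {book_id: rating} for one reader."""
    rows = conn.execute(
        "SELECT book_id, rating FROM ratings WHERE user_id = ? ORDER BY book_id",
        (user_id,),
    )
    return dict(rows.fetchall())

conn = open_store()
add_rating(conn, user_id=1, book_id=10, rating=4)
add_rating(conn, user_id=1, book_id=10, rating=5)  # re-rating overwrites the first
add_rating(conn, user_id=1, book_id=11, rating=2)
history = ratings_for(conn, 1)
```

The composite primary key on `(user_id, book_id)` is what keeps a reader to one rating per book, so re-rating updates the signal instead of duplicating it.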

evaluate

Recall@K

Leave-one-out HitRate@K and Recall@K on held-out reads — does the top of the list actually get picked up?
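
A leave-one-out loop can be sketched as below: hide one liked book per reader, ask the recommender for a top-K list, and count how often the hidden book comes back. With exactly one held-out book per reader, HitRate@K and Recall@K are the same number. The like threshold of 4, the `recommend(train, user, k)` callback shape, and the stand-in recommender are assumptions, not BookDB's evaluation code.

```python
import numpy as np

def leave_one_out_recall(R, recommend, k=10, like_threshold=4.0, seed=0):
    """Fraction of readers whose single held-out liked book appears in their top-k."""
    rng = np.random.default_rng(seed)
    hits, trials = 0, 0
    for user in range(R.shape[0]):
        liked = np.flatnonzero(R[user] >= like_threshold)  # NaN compares False
        if liked.size == 0:
            continue  # nothing to hold out for this reader
        held_out = int(rng.choice(liked))
        train = R.copy()
        train[user, held_out] = np.nan  # hide the rating from the recommender
        top_k = recommend(train, user, k)
        hits += int(held_out in top_k)
        trials += 1
    return hits / trials if trials else 0.0

def most_rated_unread(train, user, k):
    """Stand-in recommender: unread books ordered by how many readers rated them."""
    counts = (~np.isnan(train)).sum(axis=0).astype(float)
    counts[~np.isnan(train[user])] = -np.inf  # mark already-read books
    order = np.argsort(-counts)
    return [int(b) for b in order if np.isfinite(counts[b])][:k]

# Toy data (hypothetical); the last reader has no liked book and is skipped.
R = np.array([
    [5.0,    4.0,    np.nan, 1.0],
    [4.0,    np.nan, 2.0,    5.0],
    [np.nan, 5.0,    4.0,    2.0],
    [1.0,    2.0,    np.nan, np.nan],
])
recall_at_1 = leave_one_out_recall(R, most_rated_unread, k=1)
recall_at_all = leave_one_out_recall(R, most_rated_unread, k=R.shape[1])
```

Any recommender with the same callback shape can be swapped in, which is how a personalised model and a popularity baseline get compared on identical held-out books.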

Architecture

How a recommendation is made

From a reader's history to a ranked shelf, the path is consistent:

Collect: pull the reader's ratings and shelves from the SQLite platform into a sparse user×book matrix.
Find neighbours: mean-centered cosine similarity ranks the reader's kindred readers, or adjusted cosine ranks the neighbours of each book.
Score: a similarity-weighted average of the neighbours' ratings gives a predicted score for every unread title.
Rank & filter: order by score, drop already-read books and thinly supported ones; cold-start readers fall back to popularity.
Evaluate: leave-one-out Recall@K confirms the ranking recovers held-out reads more often than the popularity baseline (0.560 vs 0.095 at K=10 for item-based CF).
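
The scoring and filtering steps above can be sketched end to end for the user-based path. This is an illustration under stated assumptions rather than BookDB's implementation: the signature given to `recommend_for`, the defaults for `k` and `min_support`, the use of positive-similarity neighbours only, and the plain mean rating used as the cold-start fallback are all hypothetical choices.

```python
import numpy as np

def recommend_for(R, user, k=3, min_support=2, top_n=5):
    """Rank a reader's unread books by similarity-weighted neighbour ratings."""
    rated = ~np.isnan(R)
    if not rated[user].any():
        # Cold start: no history, so fall back to a non-personalised ranking
        # (plain mean rating per book here, to keep the sketch short).
        counts = rated.sum(axis=0)
        sums = np.where(rated, R, 0.0).sum(axis=0)
        pop = np.divide(sums, counts, out=np.full(R.shape[1], -np.inf), where=counts > 0)
        return [int(b) for b in np.argsort(-pop) if np.isfinite(pop[b])][:top_n]

    # Mean-center each reader's row; unrated cells become 0.
    counts_u = rated.sum(axis=1, keepdims=True)
    sums_u = np.where(rated, R, 0.0).sum(axis=1, keepdims=True)
    means = np.divide(sums_u, counts_u, out=np.zeros_like(sums_u), where=counts_u > 0)
    C = np.where(rated, R - means, 0.0)

    # Cosine similarity between the target reader and everyone else.
    norms = np.linalg.norm(C, axis=1)
    denom = norms * norms[user]
    sims = np.divide(C @ C[user], denom, out=np.zeros_like(denom), where=denom > 0)
    sims[user] = -np.inf
    neighbours = [u for u in np.argsort(-sims)[:k] if sims[u] > 0]

    scores = {}
    for book in np.flatnonzero(~rated[user]):          # unread books only
        voters = [u for u in neighbours if rated[u, book]]
        if len(voters) < min_support:                  # drop thinly supported books
            continue
        w = sims[voters]
        scores[int(book)] = float(means[user, 0] + w @ C[voters, book] / w.sum())
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy data (hypothetical): reader 4 has no ratings at all.
R = np.array([
    [5.0, 4.0, 1.0, np.nan, np.nan],
    [5.0, 5.0, 1.0, 5.0, 2.0],
    [4.0, 5.0, 2.0, 4.0, 1.0],
    [1.0, 2.0, 5.0, 1.0, 5.0],
    [np.nan] * 5,
])
shelf = recommend_for(R, user=0)
cold_shelf = recommend_for(R, user=4, top_n=2)
```

For reader 0 the two like-minded neighbours both rated book 3 above their own average and book 4 below it, so book 3 is ranked first. Shrinking the neighbourhood to `k=1` with `min_support=2` returns an empty list, which shows the support filter refusing to rank a book on one reader's say-so.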

Reflection

What rebuilding it taught me

Mean-centering is what makes cosine measure "taste". Without it, similarity tracks how generously someone rates, not what they like; centering removes the harsh-vs-generous bias.
Adjusted cosine matters for items. Raw cosine made oppositely-loved books look similar; centering book columns lets anti-correlated titles take a negative similarity, and item-based Recall@10 jumped.
One noisy neighbour can wreck a list. Requiring a minimum number of neighbours to support a book stabilised user-based CF across neighbourhood sizes.
The platform is the dataset. Recommendations are downstream of behaviour — building the SQLite store that creates the signal is half the engine.
Rank, don't predict. Readers never see a rating prediction — they see an ordered list, so Recall@K, not error, is the metric that counts.
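
The first lesson is easy to check with numbers. In this small sketch (made-up ratings, not project data), two generous readers disagree on every book, yet raw cosine calls them near-identical; after subtracting each reader's mean the similarity flips to its minimum.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Both readers rate everything 4 or 5, but they prefer opposite books.
reader_a = np.array([5.0, 4.0, 5.0, 4.0])
reader_b = np.array([4.0, 5.0, 4.0, 5.0])

raw = cosine(reader_a, reader_b)
centered = cosine(reader_a - reader_a.mean(), reader_b - reader_b.mean())

print(f"raw={raw:.3f} centered={centered:.3f}")  # raw=0.976 centered=-1.000
```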
