Worked example — designing & querying a bookstore database
One end-to-end relational project: requirements → ER model → 3NF → DDL → analytical SQL → indexing → NoSQL trade-offs. All SQL is real and runs on SQLite.
This page walks the full lifecycle of a small relational database for an online bookstore, the same path the
course follows from Database Design through SQL Joins, aggregation and window
functions. We start from a plain-language brief, draw the entities and relationships, normalise the design
to third normal form, write the CREATE TABLE statements with proper keys and constraints, load a
little sample data, then build up a progression of queries — from a single join to window functions and a view.
We close with an indexing and query-plan discussion and a note on when a document or wide-column store would
serve better than SQL.
Every query below states its purpose and the result it produces against the sample data, so you can read it as a narrative or paste it into the in-browser SQLite playground and run it yourself.
Sessions this exercises
The project deliberately touches the technical spine of the syllabus:
- S2 — Data Models & DDL: relational model and
CREATE TABLE. - S4 / S7 — Normalisation & many-to-many: 3NF and the
book_authorjunction table. - S5 — Logical/Physical design & ER diagrams: the entity-relationship model in section 2.
- S6 —
WHEREandCASE: filtering and categorisation in the query progression. - S8 — SQL Joins & set theory: inner / left joins across the schema.
- S9 — Aggregation, window functions, CTEs:
GROUP BY,RANK(),WITH. - S3 / S7 — B-trees & indexes: the indexing &
EXPLAIN QUERY PLANdiscussion. - S18 — Intro to NoSQL: the closing trade-off note.
1. Requirements & ER design
From a plain-language brief to entities, attributes and relationships.
briefThe requirements
An online bookstore needs to track its catalogue and sales. From a stakeholder conversation we extract:
- The store sells books; each book has a title, ISBN-13, publication date, price and a page count, and belongs to exactly one publisher.
- A book is written by one or more authors, and an author may write many books (a true many-to-many relationship).
- Each book sits in exactly one genre (fantasy, science, history, …).
- Customers place orders; an order is made by one customer and contains one or more books, each with a quantity and the unit price charged at the time of sale.
entitiesEntities & relationships
Those nouns become six entities. The verbs become relationships, with cardinalities read as (min..max):
- publisher 1 — N book ("publishes"): a publisher has many books; a book has one publisher.
- genre 1 — N book ("classifies").
- book M — N author ("written by"): resolved by the
book_authorjunction entity. - customer 1 — N order ("places").
- order M — N book ("contains"): resolved by the
order_itemjunction entity, which carriesquantityandunit_priceas relationship attributes.
diagramThe ER diagram (described)
Read the diagram as boxes (entities) connected by lines (relationships). A crow's foot (<, shown
as * below) marks the "many" end; 1 marks the "one" end. PK = primary key,
FK = foreign key. The two junction tables sit between the entities they connect.
The two M—N relationships ("written by" and "contains") cannot be stored directly in a relational table, so each becomes its own table whose primary key is the pair of foreign keys. This is the standard resolution covered in Session 4 / 7 — Many-to-many Relationships.
2. Normalisation to 3NF
Starting from one fat spreadsheet and removing repeating groups, partial and transitive dependencies.
0NFThe un-normalised starting point
Imagine the bookstore began life as a single spreadsheet — one row per line on an order, with everything repeated inline:
-- one wide, redundant "orders" sheet (unnormalised)
order_id, order_date, customer_name, customer_email,
book_title, authors, publisher_name, publisher_city,
genre, unit_price, quantity
This is convenient to read but painful to maintain: a publisher's city is repeated on every book it published,
a customer's email is repeated on every order, and authors crams several names into one cell.
FDsThe functional dependencies
We write down what determines what. X → Y reads "X functionally determines Y":
publisher_id → publisher_name, publisher_city
genre_id → genre_name
author_id → author_name
book_id → title, isbn13, published, price, pages, publisher_id, genre_id
order_id → order_date, customer_id
(order_id, book_id) → quantity, unit_price
(book_id, author_id) → /* association only — no extra attributes */
1NFFirst normal form
1NF requires atomic values and no repeating groups. The authors cell ("Tolkien; Lewis") violates
this, so we split authorship into its own rows. After 1NF every cell holds a single value and each row is
uniquely identifiable.
2NFSecond normal form
2NF removes partial dependencies — non-key attributes that depend on only part of a composite key. In
the order-line data the key is (order_id, book_id), but order_date depends on
order_id alone and title depends on book_id alone. We split these out
into the orders and book tables; only quantity and unit_price
— which genuinely depend on the full pair — stay in order_item.
3NFThird normal form
3NF removes transitive dependencies — a non-key attribute that depends on another non-key attribute.
In book, publisher_city depends on publisher_id, which is itself a
non-key attribute of the book: book_id → publisher_id → publisher_city. We move publisher
attributes into their own publisher table and keep only the publisher_id foreign key
on the book. The same reasoning peels off genre, author and customer.
3. DDL schema
The 3NF design as runnable SQLite DDL, with primary keys, foreign keys, and CHECK / UNIQUE / NOT NULL constraints.
PRAGMA foreign_keys = ON; -- SQLite needs FKs switched on per connection CREATE TABLE publisher ( publisher_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, city TEXT ); CREATE TABLE genre ( genre_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE ); CREATE TABLE author ( author_id INTEGER PRIMARY KEY, name TEXT NOT NULL ); CREATE TABLE book ( book_id INTEGER PRIMARY KEY, title TEXT NOT NULL, isbn13 TEXT NOT NULL UNIQUE, published DATE, price NUMERIC NOT NULL CHECK (price >= 0), pages INTEGER CHECK (pages > 0), publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id), genre_id INTEGER NOT NULL REFERENCES genre(genre_id) ); CREATE TABLE book_author ( -- resolves book M—N author book_id INTEGER NOT NULL REFERENCES book(book_id), author_id INTEGER NOT NULL REFERENCES author(author_id), PRIMARY KEY (book_id, author_id) ); CREATE TABLE customer ( customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL UNIQUE, joined DATE NOT NULL DEFAULT (date('now')) ); CREATE TABLE "orders" ( -- ORDER is reserved, so quote it order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL REFERENCES customer(customer_id), order_date DATE NOT NULL DEFAULT (date('now')) ); CREATE TABLE order_item ( -- resolves order M—N book order_id INTEGER NOT NULL REFERENCES "orders"(order_id), book_id INTEGER NOT NULL REFERENCES book(book_id), quantity INTEGER NOT NULL CHECK (quantity > 0), unit_price NUMERIC NOT NULL CHECK (unit_price >= 0), PRIMARY KEY (order_id, book_id) );
order_item.unit_price is stored on the line, not read from book.price, because the
price charged is a historical fact that must not change if the catalogue price later changes. The composite
primary keys on the two junction tables both enforce uniqueness and prevent a book being listed twice on the
same order.
seedSample data
A handful of rows to make the queries concrete:
INSERT INTO publisher VALUES (1,'Allen & Unwin','London'), (2,'Anchor','New York'), (3,'Tor','New York'); INSERT INTO genre VALUES (1,'fantasy'), (2,'science'), (3,'history'); INSERT INTO author VALUES (1,'J. R. R. Tolkien'), (2,'C. S. Lewis'), (3,'Carl Sagan'), (4,'N. K. Jemisin'); -- book(id, title, isbn13, published, price, pages, publisher_id, genre_id) INSERT INTO book VALUES (1,'The Hobbit', '9780261103283','1937-09-21', 9.99, 310, 1, 1), (2,'The Fellowship', '9780261103573','1954-07-29',12.50, 423, 1, 1), (3,'The Fifth Season', '9780316229296','2015-08-04',11.00, 468, 3, 1), (4,'Cosmos', '9780345539434','1980-09-12',14.00, 365, 2, 2), (5,'The Demon-Haunted World','9780345409461','1995-02-25',13.00, 457, 2, 2); INSERT INTO book_author VALUES (1,1), (2,1), (3,4), (4,3), (5,3); INSERT INTO customer VALUES (1,'Ada Lovelace','ada@calc.io','2024-01-10'), (2,'Alan Turing','alan@halts.io','2024-03-02'); INSERT INTO "orders" VALUES (1, 1, '2024-05-01'), (2, 1, '2024-06-15'), (3, 2, '2024-06-20'); -- order_item(order_id, book_id, quantity, unit_price) INSERT INTO order_item VALUES (1,1,1, 9.99), (1,4,2,14.00), (2,3,1,11.00), (3,2,1,12.50), (3,5,1,13.00);
4. Queries, step by step
A progression from a single join to window functions and a view. Each query lists its purpose and the result against the sample data above.
4.1A two-table INNER JOIN
The most basic question: list every book with its publisher's name. The join walks the publisher_id foreign key.
SELECT b.title, p.name AS publisher FROM book b JOIN publisher p ON p.publisher_id = b.publisher_id ORDER BY b.title;
4.2A multi-table join across the M—N authorship
Now traverse the junction table to attach author names, and add the genre. Three joins chain together.
SELECT b.title, a.name AS author, g.name AS genre FROM book b JOIN book_author ba ON ba.book_id = b.book_id JOIN author a ON a.author_id = ba.author_id JOIN genre g ON g.genre_id = b.genre_id ORDER BY a.name, b.title;
4.3LEFT JOIN to find the gaps
Which customers have never ordered? A LEFT JOIN keeps every customer; the WHERE … IS NULL filter keeps only those with no matching order — the "left exclusive" region from the JOIN-types demo.
SELECT c.name, c.email FROM customer c LEFT JOIN "orders" o ON o.customer_id = c.customer_id WHERE o.order_id IS NULL;
4.4Aggregation with GROUP BY
Revenue per genre. We compute line revenue as quantity × unit_price, join through to the genre, and sum within each group. HAVING filters on the aggregate.
SELECT g.name AS genre, COUNT(DISTINCT oi.order_id) AS orders, SUM(oi.quantity) AS units, ROUND(SUM(oi.quantity * oi.unit_price), 2) AS revenue FROM order_item oi JOIN book b ON b.book_id = oi.book_id JOIN genre g ON g.genre_id = b.genre_id GROUP BY g.genre_id HAVING revenue > 0 ORDER BY revenue DESC;
| genre | orders | units | revenue |
|---|---|---|---|
| science | 1 | 3 | 41.00 |
| fantasy | 3 | 3 | 33.49 |
4.5A correlated subquery
List books priced above the average price within their own genre. The inner query is correlated — it re-runs for each outer row, referencing b.genre_id.
SELECT b.title, g.name AS genre, b.price FROM book b JOIN genre g ON g.genre_id = b.genre_id WHERE b.price > ( SELECT AVG(b2.price) FROM book b2 WHERE b2.genre_id = b.genre_id -- correlation ) ORDER BY g.name, b.price DESC;
4.6Window functions
Rank books by price within each genre and keep every row — the thing a GROUP BY cannot do. RANK() and AVG() here are window functions partitioned by genre (Session 9).
SELECT g.name AS genre, b.title, b.price, RANK() OVER (PARTITION BY b.genre_id ORDER BY b.price DESC) AS price_rank, ROUND(AVG(b.price) OVER (PARTITION BY b.genre_id), 2) AS genre_avg FROM book b JOIN genre g ON g.genre_id = b.genre_id ORDER BY g.name, price_rank;
| genre | title | price | price_rank | genre_avg |
|---|---|---|---|---|
| fantasy | The Fellowship | 12.50 | 1 | 11.16 |
| fantasy | The Fifth Season | 11.00 | 2 | 11.16 |
| fantasy | The Hobbit | 9.99 | 3 | 11.16 |
| science | Cosmos | 14.00 | 1 | 13.50 |
| science | The Demon-Haunted World | 13.00 | 2 | 13.50 |
4.7A CTE + running total
A common table expression (the WITH clause) names an intermediate result, which we then scan with a windowed running total of revenue over time.
WITH daily AS ( SELECT o.order_date, SUM(oi.quantity * oi.unit_price) AS day_revenue FROM "orders" o JOIN order_item oi ON oi.order_id = o.order_id GROUP BY o.order_date ) SELECT order_date, day_revenue, SUM(day_revenue) OVER (ORDER BY order_date) AS running_total FROM daily ORDER BY order_date;
4.8A view
Wrap the most-used join in a named VIEW so analysts query a friendly virtual table instead of rebuilding joins each time.
CREATE VIEW v_order_detail AS SELECT o.order_id, o.order_date, c.name AS customer, b.title, oi.quantity, oi.unit_price, oi.quantity * oi.unit_price AS line_total FROM "orders" o JOIN customer c ON c.customer_id = o.customer_id JOIN order_item oi ON oi.order_id = o.order_id JOIN book b ON b.book_id = oi.book_id; -- now a one-liner reuses all four joins: SELECT customer, SUM(line_total) AS spent FROM v_order_detail GROUP BY customer ORDER BY spent DESC;
Paste any of these into the SQLite playground — it runs the same engine in WebAssembly.
5. Indexing & performance
Why a B-tree index turns an O(n) scan into an O(log n) probe, and how to read a query plan.
By default SQLite stores rows in a B-tree keyed on the primary key, and reads a non-indexed lookup with a full table scan — it inspects every row. A query that filters or joins on a non-key column therefore costs O(n). Adding a B-tree index on that column lets the engine binary-search to the matching rows in O(log n), exactly the contrast shown in the index-vs-scan demo and grounded in the B-tree material from Session 3.
planReading the query plan
Ask SQLite how it intends to run query 4.4 before and after indexing:
EXPLAIN QUERY PLAN SELECT g.name, SUM(oi.quantity * oi.unit_price) FROM order_item oi JOIN book b ON b.book_id = oi.book_id JOIN genre g ON g.genre_id = b.genre_id GROUP BY g.genre_id; -- before indexing, the join on book.genre_id shows: -- SCAN b (full table scan of book) -- SEARCH oi USING INTEGER PRIMARY KEY (fast, it's the PK)
indexAdding the right index
-- speed up grouping/filtering by genre and the FK join CREATE INDEX idx_book_genre ON book(genre_id); -- composite/covering index for the common "by author" lookup CREATE INDEX idx_ba_author ON book_author(author_id, book_id); -- after CREATE INDEX, the same EXPLAIN now reports: -- SEARCH b USING INDEX idx_book_genre (genre_id=?)
INSERT / UPDATE / DELETE, costing write throughput and disk. Index the
columns you filter, join, and sort on — especially foreign keys — and measure with
EXPLAIN QUERY PLAN rather than guessing. A covering index that includes every
column a query needs lets the engine answer from the index alone, skipping the table entirely.
6. Mapping to learning outcomes
How each step of this project lands a syllabus learning objective.
| Step in this project | Session | Learning outcome |
|---|---|---|
| ER model & requirements (§1) | S5 | Understand data structures; produce a database design document. |
| Normalisation to 3NF (§2) | S4 / S7 | Understand data structures behind database systems; resolve many-to-many relationships. |
| DDL with keys & constraints (§3) | S2 | Implement databases using SQL. |
| Joins & set logic (§4.1–4.3) | S8 | Apply SQL joins and set theory to combine relations. |
| Aggregation, subqueries, windows, view (§4.4–4.8) | S9 | Aggregate results; use window functions and common table expressions. |
| Indexing & query plans (§5) | S3 / S7 | Assess implementation issues; reason about B-trees, partitioning and indexes. |
| NoSQL trade-off note (§7) | S18 | Choose a database system appropriate for different types of projects. |
7. Extensions — when to reach for NoSQL
The relational model is the right default here. Some workloads, though, point elsewhere.
This bookstore is a textbook fit for SQL: the data is highly relational, the schema is stable, and the interesting questions are joins and aggregates with strong consistency (you must not over-sell stock or double-charge a card). Where would a different model win?
- Document store (MongoDB — S18–S21): if a "book" grew a deeply nested, irregular structure (reviews, varying metadata per format, marketing blurbs) that is almost always read as a whole, storing each book as a single JSON document removes the join cost and tolerates a flexible schema. The price is duplicated data and application-enforced integrity.
- Wide-column store (Cassandra — S27–S28): a write-heavy event log such as "every page-view and click in the store" suits a partitioned, leaderless design optimised for high write throughput and horizontal scale, at the cost of joins and ad-hoc querying.
- Sharding & partitioning (S7 / S19): if order volume outgrew one machine, we would
shard the
ordersandorder_itemtables bycustomer_idso a customer's data stays co-located, while keeping small reference tables (genre, publisher) replicated to every node. Cross-shard joins become the new bottleneck — the trade-off NoSQL designs accept up front. - Analytics / star schema (S13): for heavy reporting we would copy this OLTP schema into a
denormalised star schema (a
salesfact table surrounded by date / book / customer dimensions) in a data warehouse, trading normalisation for fast scans.
8. References
Course materials and the demos this example builds on.
- Course outline — Sessions 2–10 (data models, DDL, normalisation, joins, aggregation, window functions) and Session 18 (intro to NoSQL).
- sql-lab SQLite playground — run every query on this page in your browser.
- JOIN-types demo and index-vs-scan demo — the visual intuition behind §4 and §5.
- Course syllabus (PDF) — full session and learning-objective list.
- Alex Petrov, Database Internals (O'Reilly) — B-trees, storage engines and indexing (compulsory text).
- Guy Harrison, Next Generation Databases: NoSQL, NewSQL, and Big Data (Apress) — the NoSQL trade-offs in §7 (compulsory text).