worked example · designing & querying a bookstore database

Worked example — designing & querying a bookstore database

One end-to-end relational project: requirements → ER model → 3NF → DDL → analytical SQL → indexing → NoSQL trade-offs. All SQL is real and runs on SQLite.

This page walks the full lifecycle of a small relational database for an online bookstore, the same path the course follows from Database Design through SQL Joins, aggregation and window functions. We start from a plain-language brief, draw the entities and relationships, normalise the design to third normal form, write the CREATE TABLE statements with proper keys and constraints, load a little sample data, then build up a progression of queries — from a single join to window functions and a view. We close with an indexing and query-plan discussion and a note on when a document or wide-column store would serve better than SQL.

Every query below states its purpose and the result it produces against the sample data, so you can read it as a narrative or paste it into the in-browser SQLite playground and run it yourself.

goal

Model + query a bookstore

stack

SQL · SQLite

design target

3NF

tables

6 + 1 view

sessions exercised

2–10

Sessions this exercises

The project deliberately touches the technical spine of the syllabus:

S2 — Data Models & DDL: relational model and CREATE TABLE.
S4 / S7 — Normalisation & many-to-many: 3NF and the book_author junction table.
S5 — Logical/Physical design & ER diagrams: the entity-relationship model in section 2.
S6 — WHERE and CASE: filtering and categorisation in the query progression.
S8 — SQL Joins & set theory: inner / left joins across the schema.
S9 — Aggregation, window functions, CTEs: GROUP BY, RANK(), WITH.
S3 / S7 — B-trees & indexes: the indexing & EXPLAIN QUERY PLAN discussion.
S18 — Intro to NoSQL: the closing trade-off note.

1 · requirements + ER 2 · normalisation 3 · DDL schema 4 · queries 5 · indexing 6 · outcomes 7 · extensions 8 · references

1. Requirements & ER design

From a plain-language brief to entities, attributes and relationships.

briefThe requirements

An online bookstore needs to track its catalogue and sales. From a stakeholder conversation we extract:

The store sells books; each book has a title, ISBN-13, publication date, price and a page count, and belongs to exactly one publisher.
A book is written by one or more authors, and an author may write many books (a true many-to-many relationship).
Each book sits in exactly one genre (fantasy, science, history, …).
Customers place orders; an order is made by one customer and contains one or more books, each with a quantity and the unit price charged at the time of sale.

entitiesEntities & relationships

Those nouns become six entities. The verbs become relationships, with cardinalities read as (min..max):

publisher 1 — N book ("publishes"): a publisher has many books; a book has one publisher.
genre 1 — N book ("classifies").
book M — N author ("written by"): resolved by the book_author junction entity.
customer 1 — N order ("places").
order M — N book ("contains"): resolved by the order_item junction entity, which carries quantity and unit_price as relationship attributes.

diagramThe ER diagram (described)

Read the diagram as boxes (entities) connected by lines (relationships). A crow's foot (<, shown as * below) marks the "many" end; 1 marks the "one" end. PK = primary key, FK = foreign key. The two junction tables sit between the entities they connect.

publisher genre (publisher_id PK) (genre_id PK) | 1 | 1 | | | * | * +----------------------- book -----------------------+ | book_id PK | publisher_id FK | genre_id FK | isbn13 | +----+--------------------------------------------+------+ | * | * | | book_author (book_id FK, author_id FK) order_item | PK = (book_id, author_id) | (order_id FK, book_id FK) | * | PK = (order_id, book_id) | | + quantity, unit_price author (author_id PK) | * | "orders" (order_id PK, customer_id FK) | * | 1 customer (customer_id PK)

The two M—N relationships ("written by" and "contains") cannot be stored directly in a relational table, so each becomes its own table whose primary key is the pair of foreign keys. This is the standard resolution covered in Session 4 / 7 — Many-to-many Relationships.

2. Normalisation to 3NF

Starting from one fat spreadsheet and removing repeating groups, partial and transitive dependencies.

0NFThe un-normalised starting point

Imagine the bookstore began life as a single spreadsheet — one row per line on an order, with everything repeated inline:

-- one wide, redundant "orders" sheet (unnormalised)
order_id, order_date, customer_name, customer_email,
          book_title, authors, publisher_name, publisher_city,
          genre, unit_price, quantity

This is convenient to read but painful to maintain: a publisher's city is repeated on every book it published, a customer's email is repeated on every order, and authors crams several names into one cell.

FDsThe functional dependencies

We write down what determines what. X → Y reads "X functionally determines Y":

customer_id → customer_name, customer_email
publisher_id → publisher_name, publisher_city
genre_id → genre_name
author_id → author_name
book_id → title, isbn13, published, price, pages, publisher_id, genre_id
order_id → order_date, customer_id
(order_id, book_id) → quantity, unit_price
(book_id, author_id) → /* association only — no extra attributes */

1NFFirst normal form

1NF requires atomic values and no repeating groups. The authors cell ("Tolkien; Lewis") violates this, so we split authorship into its own rows. After 1NF every cell holds a single value and each row is uniquely identifiable.

2NFSecond normal form

2NF removes partial dependencies — non-key attributes that depend on only part of a composite key. In the order-line data the key is (order_id, book_id), but order_date depends on order_id alone and title depends on book_id alone. We split these out into the orders and book tables; only quantity and unit_price — which genuinely depend on the full pair — stay in order_item.

3NFThird normal form

3NF removes transitive dependencies — a non-key attribute that depends on another non-key attribute. In book, publisher_city depends on publisher_id, which is itself a non-key attribute of the book: book_id → publisher_id → publisher_city. We move publisher attributes into their own publisher table and keep only the publisher_id foreign key on the book. The same reasoning peels off genre, author and customer.

result — 3NF Six base tables — publisher, genre, author, book, customer, "orders" — plus two junction tables book_author and order_item. Every non-key column now depends on the key, the whole key, and nothing but the key. No update anomaly remains: a publisher's city lives in exactly one place.

3. DDL schema

The 3NF design as runnable SQLite DDL, with primary keys, foreign keys, and CHECK / UNIQUE / NOT NULL constraints.

PRAGMA foreign_keys = ON;   -- SQLite needs FKs switched on per connection

CREATE TABLE publisher (
  publisher_id  INTEGER PRIMARY KEY,
  name          TEXT    NOT NULL UNIQUE,
  city          TEXT
);

CREATE TABLE genre (
  genre_id      INTEGER PRIMARY KEY,
  name          TEXT    NOT NULL UNIQUE
);

CREATE TABLE author (
  author_id     INTEGER PRIMARY KEY,
  name          TEXT    NOT NULL
);

CREATE TABLE book (
  book_id       INTEGER PRIMARY KEY,
  title         TEXT    NOT NULL,
  isbn13        TEXT    NOT NULL UNIQUE,
  published     DATE,
  price         NUMERIC NOT NULL CHECK (price >= 0),
  pages         INTEGER CHECK (pages > 0),
  publisher_id  INTEGER NOT NULL REFERENCES publisher(publisher_id),
  genre_id      INTEGER NOT NULL REFERENCES genre(genre_id)
);

CREATE TABLE book_author (        -- resolves book M—N author
  book_id       INTEGER NOT NULL REFERENCES book(book_id),
  author_id     INTEGER NOT NULL REFERENCES author(author_id),
  PRIMARY KEY (book_id, author_id)
);

CREATE TABLE customer (
  customer_id   INTEGER PRIMARY KEY,
  name          TEXT    NOT NULL,
  email         TEXT    NOT NULL UNIQUE,
  joined        DATE    NOT NULL DEFAULT (date('now'))
);

CREATE TABLE "orders" (              -- ORDER is reserved, so quote it
  order_id      INTEGER PRIMARY KEY,
  customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
  order_date    DATE    NOT NULL DEFAULT (date('now'))
);

CREATE TABLE order_item (         -- resolves order M—N book
  order_id      INTEGER NOT NULL REFERENCES "orders"(order_id),
  book_id       INTEGER NOT NULL REFERENCES book(book_id),
  quantity      INTEGER NOT NULL CHECK (quantity > 0),
  unit_price    NUMERIC NOT NULL CHECK (unit_price >= 0),
  PRIMARY KEY (order_id, book_id)
);

design notes order_item.unit_price is stored on the line, not read from book.price, because the price charged is a historical fact that must not change if the catalogue price later changes. The composite primary keys on the two junction tables both enforce uniqueness and prevent a book being listed twice on the same order.

seedSample data

A handful of rows to make the queries concrete:

INSERT INTO publisher VALUES
  (1,'Allen & Unwin','London'), (2,'Anchor','New York'), (3,'Tor','New York');

INSERT INTO genre VALUES (1,'fantasy'), (2,'science'), (3,'history');

INSERT INTO author VALUES
  (1,'J. R. R. Tolkien'), (2,'C. S. Lewis'),
  (3,'Carl Sagan'), (4,'N. K. Jemisin');

-- book(id, title, isbn13, published, price, pages, publisher_id, genre_id)
INSERT INTO book VALUES
  (1,'The Hobbit',           '9780261103283','1937-09-21', 9.99, 310, 1, 1),
  (2,'The Fellowship',       '9780261103573','1954-07-29',12.50, 423, 1, 1),
  (3,'The Fifth Season',     '9780316229296','2015-08-04',11.00, 468, 3, 1),
  (4,'Cosmos',               '9780345539434','1980-09-12',14.00, 365, 2, 2),
  (5,'The Demon-Haunted World','9780345409461','1995-02-25',13.00, 457, 2, 2);

INSERT INTO book_author VALUES
  (1,1), (2,1), (3,4), (4,3), (5,3);

INSERT INTO customer VALUES
  (1,'Ada Lovelace','ada@calc.io','2024-01-10'),
  (2,'Alan Turing','alan@halts.io','2024-03-02');

INSERT INTO "orders" VALUES
  (1, 1, '2024-05-01'), (2, 1, '2024-06-15'), (3, 2, '2024-06-20');

-- order_item(order_id, book_id, quantity, unit_price)
INSERT INTO order_item VALUES
  (1,1,1, 9.99), (1,4,2,14.00),
  (2,3,1,11.00),
  (3,2,1,12.50), (3,5,1,13.00);

4. Queries, step by step

A progression from a single join to window functions and a view. Each query lists its purpose and the result against the sample data above.

4.1A two-table INNER JOIN

The most basic question: list every book with its publisher's name. The join walks the publisher_id foreign key.

SELECT   b.title, p.name AS publisher
FROM     book b
JOIN     publisher p ON p.publisher_id = b.publisher_id
ORDER BY b.title;

purposeResolve the book → publisher FK into a readable label.

result5 rows, one per book — e.g. Cosmos → Anchor, The Hobbit → Allen & Unwin.

4.2A multi-table join across the M—N authorship

Now traverse the junction table to attach author names, and add the genre. Three joins chain together.

SELECT   b.title, a.name AS author, g.name AS genre
FROM     book b
JOIN     book_author ba ON ba.book_id  = b.book_id
JOIN     author      a  ON a.author_id = ba.author_id
JOIN     genre       g  ON g.genre_id  = b.genre_id
ORDER BY a.name, b.title;

purposeFlatten the many-to-many "written by" relationship into one row per (book, author).

result5 rows: Carl Sagan → Cosmos (science), Carl Sagan → The Demon-Haunted World (science), J. R. R. Tolkien → The Fellowship (fantasy), … → The Hobbit (fantasy), N. K. Jemisin → The Fifth Season (fantasy).

4.3LEFT JOIN to find the gaps

Which customers have never ordered? A LEFT JOIN keeps every customer; the WHERE … IS NULL filter keeps only those with no matching order — the "left exclusive" region from the JOIN-types demo.

SELECT   c.name, c.email
FROM     customer c
LEFT JOIN "orders" o ON o.customer_id = c.customer_id
WHERE    o.order_id IS NULL;

purposeDetect rows on the "one" side with no children — impossible with an inner join.

result0 rows here (both Ada and Alan have ordered). Delete order 3 and Alan appears.

4.4Aggregation with GROUP BY

Revenue per genre. We compute line revenue as quantity × unit_price, join through to the genre, and sum within each group. HAVING filters on the aggregate.

SELECT   g.name AS genre,
         COUNT(DISTINCT oi.order_id)        AS orders,
         SUM(oi.quantity)                    AS units,
         ROUND(SUM(oi.quantity * oi.unit_price), 2) AS revenue
FROM     order_item oi
JOIN     book  b ON b.book_id  = oi.book_id
JOIN     genre g ON g.genre_id = b.genre_id
GROUP BY g.genre_id
HAVING   revenue > 0
ORDER BY revenue DESC;

purposeCollapse line items into one summary row per genre.

result2 rows. science: 28.00 + 13.00 → revenue 41.00, 3 units; fantasy: 9.99 + 11.00 + 12.50 → 33.49, 3 units.

genre	orders	units	revenue
science	1	3	41.00
fantasy	3	3	33.49

4.5A correlated subquery

List books priced above the average price within their own genre. The inner query is correlated — it re-runs for each outer row, referencing b.genre_id.

SELECT   b.title, g.name AS genre, b.price
FROM     book b
JOIN     genre g ON g.genre_id = b.genre_id
WHERE    b.price > (
           SELECT AVG(b2.price)
           FROM   book b2
           WHERE  b2.genre_id = b.genre_id   -- correlation
         )
ORDER BY g.name, b.price DESC;

purposeCompare each row to a group-local aggregate without collapsing the rows.

resultFantasy avg ≈ 11.16 → The Fellowship (12.50). Science avg = 13.50 → Cosmos (14.00).

4.6Window functions

Rank books by price within each genre and keep every row — the thing a GROUP BY cannot do. RANK() and AVG() here are window functions partitioned by genre (Session 9).

SELECT   g.name AS genre, b.title, b.price,
         RANK()       OVER (PARTITION BY b.genre_id ORDER BY b.price DESC) AS price_rank,
         ROUND(AVG(b.price) OVER (PARTITION BY b.genre_id), 2)        AS genre_avg
FROM     book b
JOIN     genre g ON g.genre_id = b.genre_id
ORDER BY g.name, price_rank;

purposeAttach a per-group rank and per-group average to each detail row.

result5 rows. In fantasy: Fellowship rank 1 (12.50), Fifth Season 2 (11.00), Hobbit 3 (9.99), genre_avg 11.16. In science: Cosmos 1 (14.00), Demon-Haunted World 2 (13.00), genre_avg 13.50.

genre	title	price	price_rank	genre_avg
fantasy	The Fellowship	12.50	1	11.16
fantasy	The Fifth Season	11.00	2	11.16
fantasy	The Hobbit	9.99	3	11.16
science	Cosmos	14.00	1	13.50
science	The Demon-Haunted World	13.00	2	13.50

4.7A CTE + running total

A common table expression (the WITH clause) names an intermediate result, which we then scan with a windowed running total of revenue over time.

WITH daily AS (
  SELECT   o.order_date,
           SUM(oi.quantity * oi.unit_price) AS day_revenue
  FROM     "orders" o
  JOIN     order_item oi ON oi.order_id = o.order_id
  GROUP BY o.order_date
)
SELECT order_date, day_revenue,
       SUM(day_revenue) OVER (ORDER BY order_date) AS running_total
FROM   daily
ORDER BY order_date;

purposeStage an aggregate (the CTE) then apply a window over it — a clean two-step pipeline.

result2024-05-01: 37.99 → 37.99; 2024-06-15: 11.00 → 48.99; 2024-06-20: 25.50 → 74.49.

4.8A view

Wrap the most-used join in a named VIEW so analysts query a friendly virtual table instead of rebuilding joins each time.

CREATE VIEW v_order_detail AS
SELECT o.order_id, o.order_date, c.name AS customer,
       b.title, oi.quantity, oi.unit_price,
       oi.quantity * oi.unit_price AS line_total
FROM   "orders" o
JOIN   customer   c  ON c.customer_id = o.customer_id
JOIN   order_item oi ON oi.order_id   = o.order_id
JOIN   book       b  ON b.book_id     = oi.book_id;

-- now a one-liner reuses all four joins:
SELECT customer, SUM(line_total) AS spent
FROM   v_order_detail
GROUP BY customer
ORDER BY spent DESC;

purposeEncapsulate a recurring 4-table join; views also act as a stable interface and a coarse access-control boundary.

resultAda Lovelace 48.99, Alan Turing 25.50.

Paste any of these into the SQLite playground — it runs the same engine in WebAssembly.

5. Indexing & performance

Why a B-tree index turns an O(n) scan into an O(log n) probe, and how to read a query plan.

By default SQLite stores rows in a B-tree keyed on the primary key, and reads a non-indexed lookup with a full table scan — it inspects every row. A query that filters or joins on a non-key column therefore costs O(n). Adding a B-tree index on that column lets the engine binary-search to the matching rows in O(log n), exactly the contrast shown in the index-vs-scan demo and grounded in the B-tree material from Session 3.

planReading the query plan

Ask SQLite how it intends to run query 4.4 before and after indexing:

EXPLAIN QUERY PLAN
SELECT g.name, SUM(oi.quantity * oi.unit_price)
FROM   order_item oi
JOIN   book  b ON b.book_id  = oi.book_id
JOIN   genre g ON g.genre_id = b.genre_id
GROUP BY g.genre_id;

-- before indexing, the join on book.genre_id shows:
--   SCAN b            (full table scan of book)
--   SEARCH oi USING INTEGER PRIMARY KEY  (fast, it's the PK)

indexAdding the right index

-- speed up grouping/filtering by genre and the FK join
CREATE INDEX idx_book_genre ON book(genre_id);

-- composite/covering index for the common "by author" lookup
CREATE INDEX idx_ba_author ON book_author(author_id, book_id);

-- after CREATE INDEX, the same EXPLAIN now reports:
--   SEARCH b USING INDEX idx_book_genre (genre_id=?)

trade-offs Indexes are not free. Each one is a second B-tree that must be kept in sync on every INSERT / UPDATE / DELETE, costing write throughput and disk. Index the columns you filter, join, and sort on — especially foreign keys — and measure with EXPLAIN QUERY PLAN rather than guessing. A covering index that includes every column a query needs lets the engine answer from the index alone, skipping the table entirely.

6. Mapping to learning outcomes

How each step of this project lands a syllabus learning objective.

Step in this project	Session	Learning outcome
ER model & requirements (§1)	S5	Understand data structures; produce a database design document.
Normalisation to 3NF (§2)	S4 / S7	Understand data structures behind database systems; resolve many-to-many relationships.
DDL with keys & constraints (§3)	S2	Implement databases using SQL.
Joins & set logic (§4.1–4.3)	S8	Apply SQL joins and set theory to combine relations.
Aggregation, subqueries, windows, view (§4.4–4.8)	S9	Aggregate results; use window functions and common table expressions.
Indexing & query plans (§5)	S3 / S7	Assess implementation issues; reason about B-trees, partitioning and indexes.
NoSQL trade-off note (§7)	S18	Choose a database system appropriate for different types of projects.

7. Extensions — when to reach for NoSQL

The relational model is the right default here. Some workloads, though, point elsewhere.

This bookstore is a textbook fit for SQL: the data is highly relational, the schema is stable, and the interesting questions are joins and aggregates with strong consistency (you must not over-sell stock or double-charge a card). Where would a different model win?

Document store (MongoDB — S18–S21): if a "book" grew a deeply nested, irregular structure (reviews, varying metadata per format, marketing blurbs) that is almost always read as a whole, storing each book as a single JSON document removes the join cost and tolerates a flexible schema. The price is duplicated data and application-enforced integrity.
Wide-column store (Cassandra — S27–S28): a write-heavy event log such as "every page-view and click in the store" suits a partitioned, leaderless design optimised for high write throughput and horizontal scale, at the cost of joins and ad-hoc querying.
Sharding & partitioning (S7 / S19): if order volume outgrew one machine, we would shard the orders and order_item tables by customer_id so a customer's data stays co-located, while keeping small reference tables (genre, publisher) replicated to every node. Cross-shard joins become the new bottleneck — the trade-off NoSQL designs accept up front.
Analytics / star schema (S13): for heavy reporting we would copy this OLTP schema into a denormalised star schema (a sales fact table surrounded by date / book / customer dimensions) in a data warehouse, trading normalisation for fast scans.

rule of thumb Start relational and normalised; reach for a document, wide-column, or star-schema design only when a concrete access pattern (flexible nesting, web-scale writes, or large-scale analytics) makes the join-and-consistency guarantees of SQL the wrong trade-off.

8. References

Course materials and the demos this example builds on.

Course outline — Sessions 2–10 (data models, DDL, normalisation, joins, aggregation, window functions) and Session 18 (intro to NoSQL).
sql-lab SQLite playground — run every query on this page in your browser.
JOIN-types demo and index-vs-scan demo — the visual intuition behind §4 and §5.
Course syllabus (PDF) — full session and learning-objective list.
Alex Petrov, Database Internals (O'Reilly) — B-trees, storage engines and indexing (compulsory text).
Guy Harrison, Next Generation Databases: NoSQL, NewSQL, and Big Data (Apress) — the NoSQL trade-offs in §7 (compulsory text).