Designing & Using Databases worked example project

Worked example — designing & querying a bookstore database

One end-to-end relational project: requirements → ER model → 3NF → DDL → analytical SQL → indexing → NoSQL trade-offs. All SQL is real and runs on SQLite.

This page walks the full lifecycle of a small relational database for an online bookstore, the same path the course follows from Database Design through SQL Joins, aggregation and window functions. We start from a plain-language brief, draw the entities and relationships, normalise the design to third normal form, write the CREATE TABLE statements with proper keys and constraints, load a little sample data, then build up a progression of queries — from a single join to window functions and a view. We close with an indexing and query-plan discussion and a note on when a document or wide-column store would serve better than SQL.

Every query below states its purpose and the result it produces against the sample data, so you can read it as a narrative or paste it into the in-browser SQLite playground and run it yourself.

goal
Model + query a bookstore
stack
SQL · SQLite
design target
3NF
tables
6 + 1 view
sessions exercised
2–10

Sessions this exercises

The project deliberately touches the technical spine of the syllabus:

1. Requirements & ER design

From a plain-language brief to entities, attributes and relationships.

briefThe requirements

An online bookstore needs to track its catalogue and sales. From a stakeholder conversation we extract:

entitiesEntities & relationships

Those nouns become six entities. The verbs become relationships, with cardinalities read as (min..max):

diagramThe ER diagram (described)

Read the diagram as boxes (entities) connected by lines (relationships). A crow's foot (<, shown as * below) marks the "many" end; 1 marks the "one" end. PK = primary key, FK = foreign key. The two junction tables sit between the entities they connect.

publisher genre (publisher_id PK) (genre_id PK) | 1 | 1 | | | * | * +----------------------- book -----------------------+ | book_id PK | publisher_id FK | genre_id FK | isbn13 | +----+--------------------------------------------+------+ | * | * | | book_author (book_id FK, author_id FK) order_item | PK = (book_id, author_id) | (order_id FK, book_id FK) | * | PK = (order_id, book_id) | | + quantity, unit_price author (author_id PK) | * | "orders" (order_id PK, customer_id FK) | * | 1 customer (customer_id PK)

The two M—N relationships ("written by" and "contains") cannot be stored directly in a relational table, so each becomes its own table whose primary key is the pair of foreign keys. This is the standard resolution covered in Session 4 / 7 — Many-to-many Relationships.

2. Normalisation to 3NF

Starting from one fat spreadsheet and removing repeating groups, partial and transitive dependencies.

0NFThe un-normalised starting point

Imagine the bookstore began life as a single spreadsheet — one row per line on an order, with everything repeated inline:

-- one wide, redundant "orders" sheet (unnormalised)
order_id, order_date, customer_name, customer_email,
          book_title, authors, publisher_name, publisher_city,
          genre, unit_price, quantity

This is convenient to read but painful to maintain: a publisher's city is repeated on every book it published, a customer's email is repeated on every order, and authors crams several names into one cell.

FDsThe functional dependencies

We write down what determines what. X → Y reads "X functionally determines Y":

customer_id customer_name, customer_email
publisher_id publisher_name, publisher_city
genre_id genre_name
author_id author_name
book_id title, isbn13, published, price, pages, publisher_id, genre_id
order_id order_date, customer_id
(order_id, book_id) quantity, unit_price
(book_id, author_id)  /* association only — no extra attributes */

1NFFirst normal form

1NF requires atomic values and no repeating groups. The authors cell ("Tolkien; Lewis") violates this, so we split authorship into its own rows. After 1NF every cell holds a single value and each row is uniquely identifiable.

2NFSecond normal form

2NF removes partial dependencies — non-key attributes that depend on only part of a composite key. In the order-line data the key is (order_id, book_id), but order_date depends on order_id alone and title depends on book_id alone. We split these out into the orders and book tables; only quantity and unit_price — which genuinely depend on the full pair — stay in order_item.

3NFThird normal form

3NF removes transitive dependencies — a non-key attribute that depends on another non-key attribute. In book, publisher_city depends on publisher_id, which is itself a non-key attribute of the book: book_id → publisher_id → publisher_city. We move publisher attributes into their own publisher table and keep only the publisher_id foreign key on the book. The same reasoning peels off genre, author and customer.

result — 3NF Six base tables — publisher, genre, author, book, customer, "orders" — plus two junction tables book_author and order_item. Every non-key column now depends on the key, the whole key, and nothing but the key. No update anomaly remains: a publisher's city lives in exactly one place.

3. DDL schema

The 3NF design as runnable SQLite DDL, with primary keys, foreign keys, and CHECK / UNIQUE / NOT NULL constraints.

PRAGMA foreign_keys = ON;   -- SQLite needs FKs switched on per connection

CREATE TABLE publisher (
  publisher_id  INTEGER PRIMARY KEY,
  name          TEXT    NOT NULL UNIQUE,
  city          TEXT
);

CREATE TABLE genre (
  genre_id      INTEGER PRIMARY KEY,
  name          TEXT    NOT NULL UNIQUE
);

CREATE TABLE author (
  author_id     INTEGER PRIMARY KEY,
  name          TEXT    NOT NULL
);

CREATE TABLE book (
  book_id       INTEGER PRIMARY KEY,
  title         TEXT    NOT NULL,
  isbn13        TEXT    NOT NULL UNIQUE,
  published     DATE,
  price         NUMERIC NOT NULL CHECK (price >= 0),
  pages         INTEGER CHECK (pages > 0),
  publisher_id  INTEGER NOT NULL REFERENCES publisher(publisher_id),
  genre_id      INTEGER NOT NULL REFERENCES genre(genre_id)
);

CREATE TABLE book_author (        -- resolves book M—N author
  book_id       INTEGER NOT NULL REFERENCES book(book_id),
  author_id     INTEGER NOT NULL REFERENCES author(author_id),
  PRIMARY KEY (book_id, author_id)
);

CREATE TABLE customer (
  customer_id   INTEGER PRIMARY KEY,
  name          TEXT    NOT NULL,
  email         TEXT    NOT NULL UNIQUE,
  joined        DATE    NOT NULL DEFAULT (date('now'))
);

CREATE TABLE "orders" (              -- ORDER is reserved, so quote it
  order_id      INTEGER PRIMARY KEY,
  customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
  order_date    DATE    NOT NULL DEFAULT (date('now'))
);

CREATE TABLE order_item (         -- resolves order M—N book
  order_id      INTEGER NOT NULL REFERENCES "orders"(order_id),
  book_id       INTEGER NOT NULL REFERENCES book(book_id),
  quantity      INTEGER NOT NULL CHECK (quantity > 0),
  unit_price    NUMERIC NOT NULL CHECK (unit_price >= 0),
  PRIMARY KEY (order_id, book_id)
);
design notes order_item.unit_price is stored on the line, not read from book.price, because the price charged is a historical fact that must not change if the catalogue price later changes. The composite primary keys on the two junction tables both enforce uniqueness and prevent a book being listed twice on the same order.

seedSample data

A handful of rows to make the queries concrete:

INSERT INTO publisher VALUES
  (1,'Allen & Unwin','London'), (2,'Anchor','New York'), (3,'Tor','New York');

INSERT INTO genre VALUES (1,'fantasy'), (2,'science'), (3,'history');

INSERT INTO author VALUES
  (1,'J. R. R. Tolkien'), (2,'C. S. Lewis'),
  (3,'Carl Sagan'), (4,'N. K. Jemisin');

-- book(id, title, isbn13, published, price, pages, publisher_id, genre_id)
INSERT INTO book VALUES
  (1,'The Hobbit',           '9780261103283','1937-09-21', 9.99, 310, 1, 1),
  (2,'The Fellowship',       '9780261103573','1954-07-29',12.50, 423, 1, 1),
  (3,'The Fifth Season',     '9780316229296','2015-08-04',11.00, 468, 3, 1),
  (4,'Cosmos',               '9780345539434','1980-09-12',14.00, 365, 2, 2),
  (5,'The Demon-Haunted World','9780345409461','1995-02-25',13.00, 457, 2, 2);

INSERT INTO book_author VALUES
  (1,1), (2,1), (3,4), (4,3), (5,3);

INSERT INTO customer VALUES
  (1,'Ada Lovelace','ada@calc.io','2024-01-10'),
  (2,'Alan Turing','alan@halts.io','2024-03-02');

INSERT INTO "orders" VALUES
  (1, 1, '2024-05-01'), (2, 1, '2024-06-15'), (3, 2, '2024-06-20');

-- order_item(order_id, book_id, quantity, unit_price)
INSERT INTO order_item VALUES
  (1,1,1, 9.99), (1,4,2,14.00),
  (2,3,1,11.00),
  (3,2,1,12.50), (3,5,1,13.00);

4. Queries, step by step

A progression from a single join to window functions and a view. Each query lists its purpose and the result against the sample data above.

4.1A two-table INNER JOIN

The most basic question: list every book with its publisher's name. The join walks the publisher_id foreign key.

SELECT   b.title, p.name AS publisher
FROM     book b
JOIN     publisher p ON p.publisher_id = b.publisher_id
ORDER BY b.title;
purposeResolve the book → publisher FK into a readable label.
result5 rows, one per book — e.g. Cosmos → Anchor, The Hobbit → Allen & Unwin.

4.2A multi-table join across the M—N authorship

Now traverse the junction table to attach author names, and add the genre. Three joins chain together.

SELECT   b.title, a.name AS author, g.name AS genre
FROM     book b
JOIN     book_author ba ON ba.book_id  = b.book_id
JOIN     author      a  ON a.author_id = ba.author_id
JOIN     genre       g  ON g.genre_id  = b.genre_id
ORDER BY a.name, b.title;
purposeFlatten the many-to-many "written by" relationship into one row per (book, author).
result5 rows: Carl Sagan → Cosmos (science), Carl Sagan → The Demon-Haunted World (science), J. R. R. Tolkien → The Fellowship (fantasy), … → The Hobbit (fantasy), N. K. Jemisin → The Fifth Season (fantasy).

4.3LEFT JOIN to find the gaps

Which customers have never ordered? A LEFT JOIN keeps every customer; the WHERE … IS NULL filter keeps only those with no matching order — the "left exclusive" region from the JOIN-types demo.

SELECT   c.name, c.email
FROM     customer c
LEFT JOIN "orders" o ON o.customer_id = c.customer_id
WHERE    o.order_id IS NULL;
purposeDetect rows on the "one" side with no children — impossible with an inner join.
result0 rows here (both Ada and Alan have ordered). Delete order 3 and Alan appears.

4.4Aggregation with GROUP BY

Revenue per genre. We compute line revenue as quantity × unit_price, join through to the genre, and sum within each group. HAVING filters on the aggregate.

SELECT   g.name AS genre,
         COUNT(DISTINCT oi.order_id)        AS orders,
         SUM(oi.quantity)                    AS units,
         ROUND(SUM(oi.quantity * oi.unit_price), 2) AS revenue
FROM     order_item oi
JOIN     book  b ON b.book_id  = oi.book_id
JOIN     genre g ON g.genre_id = b.genre_id
GROUP BY g.genre_id
HAVING   revenue > 0
ORDER BY revenue DESC;
purposeCollapse line items into one summary row per genre.
result2 rows. science: 28.00 + 13.00 → revenue 41.00, 3 units; fantasy: 9.99 + 11.00 + 12.50 → 33.49, 3 units.
genreordersunitsrevenue
science1341.00
fantasy3333.49

4.5A correlated subquery

List books priced above the average price within their own genre. The inner query is correlated — it re-runs for each outer row, referencing b.genre_id.

SELECT   b.title, g.name AS genre, b.price
FROM     book b
JOIN     genre g ON g.genre_id = b.genre_id
WHERE    b.price > (
           SELECT AVG(b2.price)
           FROM   book b2
           WHERE  b2.genre_id = b.genre_id   -- correlation
         )
ORDER BY g.name, b.price DESC;
purposeCompare each row to a group-local aggregate without collapsing the rows.
resultFantasy avg ≈ 11.16 → The Fellowship (12.50). Science avg = 13.50 → Cosmos (14.00).

4.6Window functions

Rank books by price within each genre and keep every row — the thing a GROUP BY cannot do. RANK() and AVG() here are window functions partitioned by genre (Session 9).

SELECT   g.name AS genre, b.title, b.price,
         RANK()       OVER (PARTITION BY b.genre_id ORDER BY b.price DESC) AS price_rank,
         ROUND(AVG(b.price) OVER (PARTITION BY b.genre_id), 2)        AS genre_avg
FROM     book b
JOIN     genre g ON g.genre_id = b.genre_id
ORDER BY g.name, price_rank;
purposeAttach a per-group rank and per-group average to each detail row.
result5 rows. In fantasy: Fellowship rank 1 (12.50), Fifth Season 2 (11.00), Hobbit 3 (9.99), genre_avg 11.16. In science: Cosmos 1 (14.00), Demon-Haunted World 2 (13.00), genre_avg 13.50.
genretitlepriceprice_rankgenre_avg
fantasyThe Fellowship12.50111.16
fantasyThe Fifth Season11.00211.16
fantasyThe Hobbit9.99311.16
scienceCosmos14.00113.50
scienceThe Demon-Haunted World13.00213.50

4.7A CTE + running total

A common table expression (the WITH clause) names an intermediate result, which we then scan with a windowed running total of revenue over time.

WITH daily AS (
  SELECT   o.order_date,
           SUM(oi.quantity * oi.unit_price) AS day_revenue
  FROM     "orders" o
  JOIN     order_item oi ON oi.order_id = o.order_id
  GROUP BY o.order_date
)
SELECT order_date, day_revenue,
       SUM(day_revenue) OVER (ORDER BY order_date) AS running_total
FROM   daily
ORDER BY order_date;
purposeStage an aggregate (the CTE) then apply a window over it — a clean two-step pipeline.
result2024-05-01: 37.99 → 37.99; 2024-06-15: 11.00 → 48.99; 2024-06-20: 25.50 → 74.49.

4.8A view

Wrap the most-used join in a named VIEW so analysts query a friendly virtual table instead of rebuilding joins each time.

CREATE VIEW v_order_detail AS
SELECT o.order_id, o.order_date, c.name AS customer,
       b.title, oi.quantity, oi.unit_price,
       oi.quantity * oi.unit_price AS line_total
FROM   "orders" o
JOIN   customer   c  ON c.customer_id = o.customer_id
JOIN   order_item oi ON oi.order_id   = o.order_id
JOIN   book       b  ON b.book_id     = oi.book_id;

-- now a one-liner reuses all four joins:
SELECT customer, SUM(line_total) AS spent
FROM   v_order_detail
GROUP BY customer
ORDER BY spent DESC;
purposeEncapsulate a recurring 4-table join; views also act as a stable interface and a coarse access-control boundary.
resultAda Lovelace 48.99, Alan Turing 25.50.

5. Indexing & performance

Why a B-tree index turns an O(n) scan into an O(log n) probe, and how to read a query plan.

By default SQLite stores rows in a B-tree keyed on the primary key, and reads a non-indexed lookup with a full table scan — it inspects every row. A query that filters or joins on a non-key column therefore costs O(n). Adding a B-tree index on that column lets the engine binary-search to the matching rows in O(log n), exactly the contrast shown in the index-vs-scan demo and grounded in the B-tree material from Session 3.

planReading the query plan

Ask SQLite how it intends to run query 4.4 before and after indexing:

EXPLAIN QUERY PLAN
SELECT g.name, SUM(oi.quantity * oi.unit_price)
FROM   order_item oi
JOIN   book  b ON b.book_id  = oi.book_id
JOIN   genre g ON g.genre_id = b.genre_id
GROUP BY g.genre_id;

-- before indexing, the join on book.genre_id shows:
--   SCAN b            (full table scan of book)
--   SEARCH oi USING INTEGER PRIMARY KEY  (fast, it's the PK)

indexAdding the right index

-- speed up grouping/filtering by genre and the FK join
CREATE INDEX idx_book_genre ON book(genre_id);

-- composite/covering index for the common "by author" lookup
CREATE INDEX idx_ba_author ON book_author(author_id, book_id);

-- after CREATE INDEX, the same EXPLAIN now reports:
--   SEARCH b USING INDEX idx_book_genre (genre_id=?)
trade-offs Indexes are not free. Each one is a second B-tree that must be kept in sync on every INSERT / UPDATE / DELETE, costing write throughput and disk. Index the columns you filter, join, and sort on — especially foreign keys — and measure with EXPLAIN QUERY PLAN rather than guessing. A covering index that includes every column a query needs lets the engine answer from the index alone, skipping the table entirely.

6. Mapping to learning outcomes

How each step of this project lands a syllabus learning objective.

Step in this projectSessionLearning outcome
ER model & requirements (§1)S5Understand data structures; produce a database design document.
Normalisation to 3NF (§2)S4 / S7Understand data structures behind database systems; resolve many-to-many relationships.
DDL with keys & constraints (§3)S2Implement databases using SQL.
Joins & set logic (§4.1–4.3)S8Apply SQL joins and set theory to combine relations.
Aggregation, subqueries, windows, view (§4.4–4.8)S9Aggregate results; use window functions and common table expressions.
Indexing & query plans (§5)S3 / S7Assess implementation issues; reason about B-trees, partitioning and indexes.
NoSQL trade-off note (§7)S18Choose a database system appropriate for different types of projects.

7. Extensions — when to reach for NoSQL

The relational model is the right default here. Some workloads, though, point elsewhere.

This bookstore is a textbook fit for SQL: the data is highly relational, the schema is stable, and the interesting questions are joins and aggregates with strong consistency (you must not over-sell stock or double-charge a card). Where would a different model win?

rule of thumb Start relational and normalised; reach for a document, wide-column, or star-schema design only when a concrete access pattern (flexible nesting, web-scale writes, or large-scale analytics) makes the join-and-consistency guarantees of SQL the wrong trade-off.

8. References

Course materials and the demos this example builds on.