Designing and Using Databases
Bachelor in Computer Science & Artificial Intelligence (BCSAI) Β· IE University Β· a syllabus-driven outline of all 30 sessions.
Databases are central to almost all technology projects. This course provides enough knowledge to work alongside database designers and programmers in a corporate setting β making better decisions about which kind of system to use and designing a solution that fits a client's needs. It surveys the database landscape in use today, from traditional systems that drive business intelligence to advanced systems that power web-scale applications.
The course lays the foundation for working with structured databases and looks at how organizations actually use them β in Business Intelligence and enterprise systems. It has a strong practical focus: you will produce design documentation, draw diagrams, and implement databases using SQL, NoSQL and Big Data technologies (Spark). The aim is to give future leaders the skills to implement new, innovative systems.
The 30 live sessions move deliberately from theory (data models, storage internals, normalisation) to practice (a four-part SQL tutorial track, MongoDB and Spark labs, a Cassandra lab) and finally to scale (distributed and Big-Data systems). Two threads run throughout: the technical design and implementation of databases, and their organizational impact. Both compulsory texts β Petrov's Database Internals and Harrison's Next Generation Databases β are mapped onto the relevant blocks below.
Contact: mbeattie@faculty.ie.edu Β· office hours on request. The professor is a technology specialist and entrepreneur with a background in mobile payments, audio-visual software and personal-fitness start-ups, and coaches growing tech companies.
Learning objectives
The course has two central themes: the technical design & implementation of databases, and the organizational impact of these systems.
The design part focuses on understanding system requirements, building cost scenarios, and analyzing systems using Entity-Relationship diagrams and Data Flow Diagrams. The implementation part looks at how databases are used in corporate environments. By the end of the course you will be able to:
- Understand the data structures behind database systems
- Produce a database design document
- Implement databases using SQL, NoSQL and Spark
- Assess security and implementation issues
- Understand how web developers use databases
- Create simple database-backed web projects
- Understand the importance of Business Intelligence platforms
- Choose a database system appropriate for different types of projects
- Work with teams of programmers and designers and participate in decision-making
Methodology & assessment
IE's teaching method is collaborative, active and applied β a mix of lectures, discussion, in-class exercises, group work and individual study, totalling 150 hours.
Learning activities 150 h total
Estimated hours: lectures 42 Β· discussions 26 Β· exercises/async/field 26 Β· group work 38 Β· individual study 18.
Assessment weighting graded /10
A minimum of 3.5/10 (35%) in both the Midterm and Final exam is required to pass; the overall passing grade is 5.0. Below the minimum sends you to the June re-sit regardless of overall grade.
Components β deliverable & how it is evaluated
Deliverable: comprehensive written exam in Session 30 covering the whole course. Evaluated on: correctness of SQL/query answers, data-model design reasoning and conceptual understanding. Pass rule: a minimum 3.5/10 is mandatory or you go to the June re-sit.
Deliverable: the Midterm (Session 15) plus shorter in-term checks. Evaluated on: mastery of design and SQL fundamentals from Blocks 1β4. Pass rule: the Midterm also carries the 3.5/10 minimum.
Deliverable: individually-submitted exercises and assignments (e.g. SQL tutorials, design documents, lab write-ups). Evaluated on: technical accuracy, clarity of documentation and independent problem-solving.
Deliverable: the collaborative database project built over the term across SQL, NoSQL and Spark. Evaluated on: design quality, implementation completeness and teamwork.
Deliverable: the final project presentation in Session 29. Evaluated on: the end-to-end solution demoed, design decisions justified and quality of communication.
Deliverable: active, prepared engagement in lectures, discussions and in-class exercises. Evaluated on: consistency and quality of contribution across the 30 live sessions.
Program β all 30 sessions
Every session is live and in-person. Grouped into seven thematic blocks, each opening with an overview and its learning outcomes; SQL-heavy sessions cross-link the playground demos.
Foundations & Data Models
Sessions 1β2Sets the scene: why data systems matter, where they appear, and the fundamental trade-offs behind choosing one. Introduces the major data models β relational, document and graph β and the languages used to query and define them, giving you the vocabulary the rest of the course builds on.
- Explain why databases underpin modern technology projects
- Compare relational, document and graph data models
- Distinguish declarative query languages from imperative code
- Recognise the role of DDL in defining a schema
-
1
Course Introduction
Frame why databases sit at the centre of nearly every technology project and where you encounter them today.
- Learning & having fun β how the course runs: collaborative, applied, project-driven.
- Why databases are so important β they provide durable, queryable, concurrent, consistent storage that applications cannot get from flat files.
- Where we see databases today β from banking and e-commerce to social feeds, IoT and ML feature stores.
- Trade-offs β no system is best at everything; you balance consistency, availability, latency, cost and developer effort.
Key idea There is no universal database. Every choice trades something β consistency vs availability, read speed vs write speed, flexibility vs structure β so the skill is matching the system to the workload. -
2
Data Models and Query Languages
Compare the major data models and the languages used to query them.
- Relational vs document model β tables with rows/columns and joins, versus self-contained nested documents (JSON-like) that favour locality.
- Query languages for data β declarative (SQL: say what you want) vs imperative (code the how); declarative lets the optimiser pick the plan.
- Graph-like data models β nodes and edges for highly-connected data (social networks, recommendations) where joins would be expensive.
- Data Definition Language (DDL) β the SQL subset that creates and alters schema objects: tables, columns, types, keys, constraints.
Core concept Β· DDL DDL defines structure (CREATE,ALTER,DROP); DML manipulates data (INSERT,UPDATE,DELETE,SELECT).CREATE TABLE customer ( id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE, joined DATE DEFAULT CURRENT_DATE );
Database Design & Storage Internals
Sessions 3β5, 7The design backbone of the course. You learn how data is physically stored and indexed (hash indexes, SSTables, LSM-trees, B-trees), then how to model it well through normalisation and relationships, and finally how to turn requirements into documented logical and physical designs with DFDs and ER diagrams. Engine-level concerns β partitioning, indexing and replication β show how designs scale.
- Explain how B-trees and LSM-trees trade read vs write performance
- Normalise a schema to remove redundancy and anomalies
- Model many-to-many relationships with junction tables
- Produce logical/physical designs, DFDs and ER diagrams
- Reason about partitioning, indexing and replication for scale
-
3
Database Design I β Storage Structures
Key elements of the data structures designers must understand to reason about efficiency.
- Hash indexes β an in-memory map from key to file offset; O(1) point lookups but no range scans.
- SSTables β Sorted String Tables: immutable on-disk files kept sorted by key, enabling efficient merges and range reads.
- LSM-Trees β Log-Structured Merge trees buffer writes in memory then flush sorted runs; write-optimised (Cassandra, RocksDB).
- B-Trees β balanced, page-based trees that keep keys sorted for fast point and range queries; the default for most relational engines.
Core concept Β· indexing An index is an auxiliary structure that speeds reads at the cost of slower writes and extra space. B-trees favour balanced read/write; LSM-trees favour heavy write throughput. -
4
Database Design II β Next-level Efficiencies
Move from single-table structures to relationships and engine-level scaling concerns.
- Normalisation β organising columns/tables so each fact is stored once, removing update/insert/delete anomalies.
- Many-to-many relationships β resolved with a junction (associative) table holding two foreign keys.
- Database engines β the pluggable storage layer (e.g. InnoDB, MyISAM) that decides on-disk layout and transaction support.
- Partitioning & indexes β splitting a large table by key range/hash so each partition is smaller and faster to scan.
- Replication β copying data across nodes for availability and read scaling (leaderβfollower or multi-leader).
Core concept Β· normalisation 1NF: atomic values, no repeating groups. 2NF: no partial dependency on part of a composite key. 3NF: no transitive dependency β non-key columns depend only on the key. Higher forms reduce redundancy at the cost of more joins. -
5
Database Design III β The Final Steps
Turn requirements into logical and physical designs documented with standard diagrams.
- Logical vs physical design β logical captures entities/relationships independent of any engine; physical adds tables, types, indexes and storage choices.
- Data Flow Diagrams (DFDs) β show how data moves between processes, stores and external entities.
- Entity-Relationship (ER) diagrams β model entities, attributes and cardinalities (1:1, 1:N, M:N) before they become tables.
- SQL Tutorial I β the SELECT statement β the first hands-on SQL: projecting columns, ordering, limiting.
SELECT name, joined FROM customer ORDER BY joined DESC LIMIT 10;
-
7
Relational Data Models
Consolidate relational design and revisit how engine choices interact with the schema.
- Normalisation (revisited) β applying the normal forms to a fuller schema and weighing when to denormalise for speed.
- Many-to-many relationships β junction tables in practice, including extra attributes on the relationship itself.
- Database engines β how the chosen engine affects transactions, locking and indexing options.
- Partitioning & indexes β choosing partition keys and composite indexes to match query patterns.
Key idea Normalisation removes redundancy; denormalisation deliberately re-introduces it to avoid expensive joins. The right point on that spectrum depends on the read/write mix.
SQL Deep Dive
Sessions 6, 8β12A four-part SQL tutorial track that takes you from filtering single tables to combining them with set-theoretic joins, then to aggregation, window functions and CTEs, temporal queries, and finally getting data in and out at scale. This is the most hands-on, playground-linked block.
- Filter and categorise rows with WHERE and CASE
- Choose the correct JOIN and justify it with set theory
- Aggregate, window and modularise queries with CTEs
- Query temporal data and time-series
- Bulk-load and export data with correct encodings
-
6
SQL Tutorial II β Filtering & Categorization
Filter and shape result sets with conditional logic.
- WHERE clauses β row-level predicates with
AND/OR/NOT,IN,BETWEEN,LIKEandNULLhandling. - CASE clauses β inline conditional expressions to bucket or relabel values in the result set.
SELECT name, CASE WHEN spend > 100 THEN 'VIP' ELSE 'std' END AS tier FROM customer WHERE joined >= '2025-01-01';
- WHERE clauses β row-level predicates with
-
8
SQL Joins and Set Theory
Ground JOINs in the set theory that defines their results.
- SQL joins β INNER (matches only), LEFT/RIGHT (keep one side), FULL (keep both), CROSS (cartesian product).
- Set theory β joins and
UNION/INTERSECT/EXCEPTmap directly onto set union, intersection and difference. - De Morgan's laws & logical equivalence β Β¬(Aβ§B)=Β¬Aβ¨Β¬B; used to rewrite and simplify WHERE predicates correctly.
Core concept Β· joins An INNER JOIN returns the intersection of matching rows; a LEFT JOIN returns all left rows plus matches (NULLs where none). Picture each join as a region of a Venn diagram.SELECT c.name, o.total FROM customer c LEFT JOIN orders o ON o.customer_id = c.id;
-
9
SQL Tutorial III β Aggregation & Windows
Summarise and analyse data beyond simple row selection.
- Aggregating results β
GROUP BYwithCOUNT/SUM/AVG/MIN/MAXandHAVINGto filter groups. - Window functions β compute running totals, ranks and moving averages without collapsing rows, using
OVER (PARTITION BY β¦ ORDER BY β¦). - Common Table Expressions (CTEs) β named
WITHsubqueries that make complex queries readable and support recursion.
SELECT name, spend, RANK() OVER (ORDER BY spend DESC) AS rnk FROM customer;
- Aggregating results β
-
10
SQL Tutorial IV β Time & Time-series
Work with temporal data and the databases purpose-built for it.
- Date & time functions β truncating, extracting parts, computing intervals and handling time zones.
- Time-series analysis in SQL β bucketing by interval, gap-filling and rolling aggregates over ordered timestamps.
- Time-series databases β engines (e.g. InfluxDB, TimescaleDB) optimised for append-heavy, timestamp-keyed data.
Key idea Time-series workloads are append-mostly and queried by range, so they favour storage sorted by time and downsampling/retention policies over general-purpose indexing. -
11
Advanced SQL Topics
Bridge SQL to analytics and application workflows.
- Exploratory Data Analysis β profiling distributions, nulls and outliers with SQL before modelling.
- Machine Learning datasets β shaping query output into feature tables for training.
- Loading from a database to an application β cursors, connection pooling and parameterised queries from app code.
Key idea The database is increasingly the feature store for ML β clean, well-modelled SQL output is what feeds models, so query quality is data quality. -
12
Data Outside the Database
Get data in and out at scale and handle file formats correctly.
- Storing & modifying data β transactional
INSERT/UPDATE/DELETEvs bulk paths, and when each is appropriate. - Bulk data load operations β tools like
COPY/LOAD DATAthat bypass per-row overhead for large imports. - SQLDump, file types & encodings β logical dumps for backup/migration; CSV/JSON/Parquet trade-offs; UTF-8 and delimiter pitfalls.
Key idea Most real-world data pain is at the boundary β encodings, delimiters and types. Getting import/export right prevents silent corruption. - Storing & modifying data β transactional
Data Warehousing, Web Apps & Business Intelligence
Sessions 13β17Shifts from transactional (OLTP) thinking to analytical (OLAP): dimensional modelling with facts and dimensions, star vs snowflake schemas, and the place of databases in data-engineering pipelines. You then connect databases to dynamic web sites and build a Business-Intelligence data application, with the Midterm sitting at the block's centre.
- Design star and snowflake schemas from facts and dimensions
- Place databases within an enterprise data pipeline
- Connect a database to a dynamic, database-driven web site
- Explain the role of BI in enterprise IT and build a BI app
-
13
Data Warehouse and Data Mart Design
Design analytical schemas and place databases within engineering pipelines.
- Facts & dimensions β facts hold measurable events (sales, clicks); dimensions hold the descriptive context (time, product, customer).
- Star vs snowflake schema β star keeps dimensions denormalised in one table each (fast, simple); snowflake normalises them (less redundancy, more joins).
- Enterprise data architectures β warehouses, data marts, lakes and the patterns that connect them.
- Databases in data-engineering pipelines β where ETL/ELT sources and lands data.
Core concept Β· OLTP vs OLAP OLTP is many small read/write transactions (normalised, app-facing); OLAP is large analytical scans (dimensional, report-facing). Warehouses are OLAP-shaped. -
14
Database Applications I β Web Applications
Connect databases to dynamic, database-driven web sites.
- Dynamic, database-driven web sites β how a request becomes a query and a query becomes a rendered page.
- Common stacks β LAMP (Linux/Apache/MySQL/PHP) and Flask-based Python stacks; where the database sits in each.
Key idea The web app is just a thin layer over the database: most user actions map to a CRUD operation, so a clean schema makes the app simpler. -
15
Midterm Exam
Comprehensive assessment of design and SQL fundamentals from Blocks 1β4. A minimum 3.5/10 is required to pass the subject.
-
16
Database Applications II β Business Intelligence
Understand where BI fits in enterprise IT and start building a data app.
- Role of BI in enterprise IT β turning warehoused data into dashboards and decisions for non-technical users.
- Demo of BI platforms β a tour of how tools sit on top of the warehouse.
- Building a data application with Dash β a Python framework for interactive, database-backed analytics apps.
-
17
Database Applications III β Business Intelligence (cont.)
Continue the BI application and reflect on top-down vs bottom-up data design.
- Building a data application with Dash β extending the app with interactivity and live queries.
- Designing data: top-down vs bottom-up β starting from business questions (top-down) vs from available source data (bottom-up).
NoSQL & Document Databases
Sessions 18β21Introduces the NoSQL family and focuses on document databases (MongoDB). You learn document operations, partitioning/sharding, the MongoDB Query Language, and how to prototype a full document-backed web app on the MEAN stack β contrasting flexible schemas against the relational rigour of Blocks 2β3.
- Explain when NoSQL beats relational and at what cost
- Perform CRUD and queries on document collections (MQL)
- Reason about partitioning and sharding for scale
- Prototype a Node.js + MongoDB web application
-
18
Introduction to NoSQL
Survey the NoSQL landscape and its core data structures.
- What is NoSQL & when to use it β non-relational stores (key-value, document, column-family, graph) that trade strict schema/joins for scale and flexibility.
- Relevant NoSQL data structures β a sample implementation of the most common structures and their access patterns.
Core concept Β· CAP theorem A distributed store can guarantee at most two of Consistency, Availability, Partition-tolerance. Since network partitions are unavoidable, real systems choose between consistency and availability under partition. -
19
Document Databases
Work with document collections and the basics of MongoDB.
- Operations on document collections β insert/find/update/delete over BSON documents; querying nested fields and arrays.
- Partitioning & sharding β splitting a collection across nodes by shard key to scale writes and storage horizontally.
- Query processing using JavaScript β aggregation pipelines and expressions evaluated in the engine.
- MongoDB & PyMongo β driving MongoDB from Python.
// orders over 100, newest first db.orders.find( { total: { $gt: 100 } } ).sort({ created: -1 }) -
20
MongoDB Lab 1
Hands-on lab answering real questions against real datasets.
- Answering questions with MQL β writing MongoDB Query Language queries (find, filter, project, aggregate) on real-world datasets.
Lab outcome Translate analytical questions into MQL aggregation pipelines and interpret the results β the document-store analogue of the SQL tutorials. -
21
Designing Web-Apps with MongoDB
Prototype a full document-backed web application.
- Prototyping with Node.js & MongoDB β wiring a JavaScript server to a document store end-to-end.
- The MEAN stack β MongoDB Β· Express Β· Angular Β· Node.js β an all-JavaScript web stack.
Key idea A document model lets the same JSON shape flow from database to API to front-end, which is why all-JS stacks like MEAN pair naturally with MongoDB.
Big Data & Distributed Processing with Spark
Sessions 22β26Moves to web-scale processing: the Hadoop ecosystem and MapReduce, then Apache Spark's architecture, cluster tuning, and batch-processing labs, finishing with SparkSQL and Spark ML. The theme is distributing computation across a cluster rather than scaling a single machine.
- Explain MapReduce and massively-parallel processing
- Describe Spark's architecture and tune a cluster
- Run Spark batch jobs to transform large datasets
- Combine SQL and ML on Spark (SparkSQL, Spark ML)
-
22
Introduction to Big Data & the Hadoop Ecosystem
Understand distributed processing paradigms and the foundations of Hadoop.
- Big Data technologies & frameworks β the "3 Vs" (volume, velocity, variety) and the tools built to handle them.
- Distributed & massively-parallel processing β splitting work across many commodity machines instead of one large server.
- Hadoop & MapReduce β HDFS for distributed storage; MapReduce for distributed computation.
Core concept Β· MapReduce A two-phase model: Map transforms each input record into key-value pairs; Reduce aggregates all values sharing a key. The framework handles distribution, shuffling and fault tolerance. -
23
Spark Basics
Learn Spark's architecture and how to configure and tune a cluster.
- Spark architecture & concepts β driver, executors and RDDs/DataFrames; lazy evaluation with a DAG of transformations.
- Cluster configuration & tuning β executor memory/cores, partitions and shuffle settings.
- Basic Spark usage demo β loading and transforming data interactively.
Key idea Spark keeps data in memory across stages and only computes when an action is called (lazy evaluation), making it far faster than disk-bound MapReduce for iterative work. -
24
Spark Batch Processing β Lab I
Apply Spark batch-processing functions in a hands-on lab.
- Batch processing functions β map, filter, groupBy, join and aggregate over distributed DataFrames.
-
25
Spark Batch Processing β Lab II
Continue and extend the Spark batch-processing exercises.
- Batch processing functions (cont.) β more complex pipelines, joins across datasets and performance considerations.
-
26
SparkSQL and Spark ML
Combine SQL and machine learning on Spark and review the latest trends.
- SparkSQL β running SQL queries over distributed DataFrames with the Catalyst optimiser.
- Spark ML β distributed machine-learning pipelines (features, models, evaluation) at scale.
- Latest trends & top tips β where the ecosystem is heading and practical advice.
Key idea SparkSQL lets you express distributed computation in familiar SQL while the optimiser plans execution across the cluster β bridging the SQL of Block 3 with Big-Data scale.
Distributed Databases & Final Assessment
Sessions 27β30Caps the course with a wide-column distributed database (Cassandra): its ring architecture, tunable replication, CQL, and probabilistic structures like Bloom filters. The block closes with the final project presentations and the comprehensive final exam.
- Describe Cassandra's ring, tokens and replication model
- Model and query data with CQL and collections
- Explain Bloom filters and tunable consistency
- Present and defend an end-to-end database solution
-
27
Distributed Databases: Cassandra
Master Cassandra's data model, distributed architecture and query language.
- Cassandra data model β wide-column, query-first design: you model tables around the queries you need, not normal forms.
- Distributed architecture β masterless peer-to-peer cluster with no single point of failure.
- Nodes, rings, tokens, replication β data is hashed onto a token ring; the replication factor decides how many nodes hold each row.
- CQL β a SQL-like query language for Cassandra (no joins; partition-key driven).
- Data types & collections β scalars plus set/list/map collection columns.
- Secondary indexes β querying on non-key columns, with their performance caveats.
- Materialised & probabilistic structures Β· Bloom filters β a Bloom filter is a space-efficient set test: "definitely not present" or "possibly present", used to skip disk reads.
Core concept Β· tunable consistency Cassandra trades CAP per query: you pick a consistency level (e.g.ONE,QUORUM,ALL) so each read/write balances latency against how many replicas must agree. -
28
Cassandra Lab
Hands-on lab applying the Cassandra concepts from Session 27.
- Cassandra Lab β creating keyspaces/tables, writing CQL, and observing replication and consistency in practice.
-
29
Final Project Presentations
Present the group project, demonstrating an end-to-end database solution and justifying the design choices made across SQL, NoSQL and Spark.
-
30
Final Exam
Comprehensive final assessment (30% of the grade). A minimum 3.5/10 is required; below it sends you to the June re-sit regardless of overall grade.
Key concepts β glossary
A quick reference to the core ideas that recur across the course, from storage internals to distributed systems.
CREATE, ALTER); Data Manipulation Language reads and changes data (SELECT, INSERT, UPDATE).UNION, INTERSECT and EXCEPT combine result sets as set union, intersection and difference.GROUP BY with functions like SUM/COUNT/AVG, filtered by HAVING, to summarise many rows into groups.OVER a partition of rows (ranks, running totals) without collapsing them.WITH subquery that improves readability and enables recursion.QUORUM) to balance latency against how many replicas must agree (Cassandra).Bibliography
Compulsory reading β both titles are available digitally and are mapped to the blocks where they are most relevant.
- Alex Petrov. Database Internals. O'Reilly. ISBN 978149204034 (Digital) The reference for storage engines and the single-node internals at the heart of Blocks 1β2: B-trees, LSM-trees/SSTables, hash indexes, transactions and how engines store and retrieve data on disk. β Sessions 2β7
- Guy Harrison. Next Generation Databases: NoSQL, NewSQL, and Big Data. Apress. ISBN 9781484213292 (Digital) The companion for the distributed and non-relational half of the course: NoSQL and document stores, Big-Data frameworks and the architecture of systems like Cassandra. β Sessions 18β19, 22, 27
Behaviour, attendance and ethics are governed by the University's Code of Conduct, Attendance Policy and Ethics Code; the Program Director may provide further indications.