Designing & Using Databases course outline

Designing and Using Databases

Bachelor in Computer Science & Artificial Intelligence (BCSAI) Β· IE University Β· a syllabus-driven outline of all 30 sessions.

Databases are central to almost all technology projects. This course provides enough knowledge to work alongside database designers and programmers in a corporate setting β€” making better decisions about which kind of system to use and designing a solution that fits a client's needs. It surveys the database landscape in use today, from traditional systems that drive business intelligence to advanced systems that power web-scale applications.

The course lays the foundation for working with structured databases and looks at how organizations actually use them β€” in Business Intelligence and enterprise systems. It has a strong practical focus: you will produce design documentation, draw diagrams, and implement databases using SQL, NoSQL and Big Data technologies (Spark). The aim is to give future leaders the skills to implement new, innovative systems.

The 30 live sessions move deliberately from theory (data models, storage internals, normalisation) to practice (a four-part SQL tutorial track, MongoDB and Spark labs, a Cassandra lab) and finally to scale (distributed and Big-Data systems). Two threads run throughout: the technical design and implementation of databases, and their organizational impact. Both compulsory texts β€” Petrov's Database Internals and Harrison's Next Generation Databases β€” are mapped onto the relevant blocks below.

Program
BCSAI
Course code
DUD-CSAI.2.M.A
Area
Computer Science
Sessions
30
Credits
6.0 ECTS
Workload
150 h
Academic year
25–26
Degree course
Second
Semester
2ΒΊ
Category
Compulsory
Language
English
Professor
Mike Beattie

Contact: mbeattie@faculty.ie.edu Β· office hours on request. The professor is a technology specialist and entrepreneur with a background in mobile payments, audio-visual software and personal-fitness start-ups, and coaches growing tech companies.

This is the structured companion to the in-browser SQL playground β€” the demos there (schema viewer, JOIN Venn, index-vs-scan, and a real SQLite-in-WASM query editor) line up with the SQL sessions below. Look for the playground tag on a session to jump in.

Learning objectives

The course has two central themes: the technical design & implementation of databases, and the organizational impact of these systems.

The design part focuses on understanding system requirements, building cost scenarios, and analyzing systems using Entity-Relationship diagrams and Data Flow Diagrams. The implementation part looks at how databases are used in corporate environments. By the end of the course you will be able to:

Methodology & assessment

IE's teaching method is collaborative, active and applied β€” a mix of lectures, discussion, in-class exercises, group work and individual study, totalling 150 hours.

Learning activities 150 h total

Lectures28.0%
Group work25.3%
Discussions17.3%
Exercises / async / field17.3%
Individual studying12.0%

Estimated hours: lectures 42 Β· discussions 26 Β· exercises/async/field 26 Β· group work 38 Β· individual study 18.

Assessment weighting graded /10

Final Exam30%
Individual work20%
Intermediate tests20%
Group Presentation10%
Group Work10%
Class Participation10%

A minimum of 3.5/10 (35%) in both the Midterm and Final exam is required to pass; the overall passing grade is 5.0. Below the minimum sends you to the June re-sit regardless of overall grade.

Components β€” deliverable & how it is evaluated

Final Exam30%

Deliverable: comprehensive written exam in Session 30 covering the whole course. Evaluated on: correctness of SQL/query answers, data-model design reasoning and conceptual understanding. Pass rule: a minimum 3.5/10 is mandatory or you go to the June re-sit.

Intermediate tests20%

Deliverable: the Midterm (Session 15) plus shorter in-term checks. Evaluated on: mastery of design and SQL fundamentals from Blocks 1–4. Pass rule: the Midterm also carries the 3.5/10 minimum.

Individual work20%

Deliverable: individually-submitted exercises and assignments (e.g. SQL tutorials, design documents, lab write-ups). Evaluated on: technical accuracy, clarity of documentation and independent problem-solving.

Group Work10%

Deliverable: the collaborative database project built over the term across SQL, NoSQL and Spark. Evaluated on: design quality, implementation completeness and teamwork.

Group Presentation10%

Deliverable: the final project presentation in Session 29. Evaluated on: the end-to-end solution demoed, design decisions justified and quality of communication.

Class Participation10%

Deliverable: active, prepared engagement in lectures, discussions and in-class exercises. Evaluated on: consistency and quality of contribution across the 30 live sessions.

Pass, attendance & re-sit rules. The overall passing grade is 5.0/10, with a mandatory 3.5/10 minimum in both the Midterm and the Final exam. Students must meet the 80% attendance rule β€” falling below it fails both the ordinary and extraordinary calls for the year and forces a re-enrolment. Each student has four exam chances over two academic years. The June/July re-sit is a single comprehensive exam (physical presence required at the enrolled campus, no date/format changes), graded only on that exam and capped at 8.0/10. Grade appeals require attending the post-exam review session first.
AI policy. Generative AI is encouraged to build an informed, critical perspective β€” but minimum-effort prompts give low-quality results, and you must cross-check and acknowledge any AI use. Unacknowledged use violates academic-honesty policies; the syllabus provides a suggested acknowledgement format.

Program β€” all 30 sessions

Every session is live and in-person. Grouped into seven thematic blocks, each opening with an overview and its learning outcomes; SQL-heavy sessions cross-link the playground demos.

Block 1

Foundations & Data Models

Sessions 1–2

Sets the scene: why data systems matter, where they appear, and the fundamental trade-offs behind choosing one. Introduces the major data models β€” relational, document and graph β€” and the languages used to query and define them, giving you the vocabulary the rest of the course builds on.

Learning outcomes
  • Explain why databases underpin modern technology projects
  • Compare relational, document and graph data models
  • Distinguish declarative query languages from imperative code
  • Recognise the role of DDL in defining a schema
  • 1

    Course Introduction

    Frame why databases sit at the centre of nearly every technology project and where you encounter them today.

    • Learning & having fun β€” how the course runs: collaborative, applied, project-driven.
    • Why databases are so important β€” they provide durable, queryable, concurrent, consistent storage that applications cannot get from flat files.
    • Where we see databases today β€” from banking and e-commerce to social feeds, IoT and ML feature stores.
    • Trade-offs β€” no system is best at everything; you balance consistency, availability, latency, cost and developer effort.
    Key idea There is no universal database. Every choice trades something β€” consistency vs availability, read speed vs write speed, flexibility vs structure β€” so the skill is matching the system to the workload.
  • 2

    Data Models and Query Languages

    Compare the major data models and the languages used to query them.

    • Relational vs document model β€” tables with rows/columns and joins, versus self-contained nested documents (JSON-like) that favour locality.
    • Query languages for data β€” declarative (SQL: say what you want) vs imperative (code the how); declarative lets the optimiser pick the plan.
    • Graph-like data models β€” nodes and edges for highly-connected data (social networks, recommendations) where joins would be expensive.
    • Data Definition Language (DDL) β€” the SQL subset that creates and alters schema objects: tables, columns, types, keys, constraints.
    Core concept Β· DDL DDL defines structure (CREATE, ALTER, DROP); DML manipulates data (INSERT, UPDATE, DELETE, SELECT).
    CREATE TABLE customer (
      id     INTEGER PRIMARY KEY,
      email  TEXT NOT NULL UNIQUE,
      joined DATE DEFAULT CURRENT_DATE
    );
    Database Internals β€” Ch. 1
Block 2

Database Design & Storage Internals

Sessions 3–5, 7

The design backbone of the course. You learn how data is physically stored and indexed (hash indexes, SSTables, LSM-trees, B-trees), then how to model it well through normalisation and relationships, and finally how to turn requirements into documented logical and physical designs with DFDs and ER diagrams. Engine-level concerns β€” partitioning, indexing and replication β€” show how designs scale.

Learning outcomes
  • Explain how B-trees and LSM-trees trade read vs write performance
  • Normalise a schema to remove redundancy and anomalies
  • Model many-to-many relationships with junction tables
  • Produce logical/physical designs, DFDs and ER diagrams
  • Reason about partitioning, indexing and replication for scale
  • 3

    Database Design I β€” Storage Structures

    Key elements of the data structures designers must understand to reason about efficiency.

    • Hash indexes β€” an in-memory map from key to file offset; O(1) point lookups but no range scans.
    • SSTables β€” Sorted String Tables: immutable on-disk files kept sorted by key, enabling efficient merges and range reads.
    • LSM-Trees β€” Log-Structured Merge trees buffer writes in memory then flush sorted runs; write-optimised (Cassandra, RocksDB).
    • B-Trees β€” balanced, page-based trees that keep keys sorted for fast point and range queries; the default for most relational engines.
    Core concept Β· indexing An index is an auxiliary structure that speeds reads at the cost of slower writes and extra space. B-trees favour balanced read/write; LSM-trees favour heavy write throughput.
    Database Internals β€” Pt. Iplayground Β· index lookup
  • 4

    Database Design II β€” Next-level Efficiencies

    Move from single-table structures to relationships and engine-level scaling concerns.

    • Normalisation β€” organising columns/tables so each fact is stored once, removing update/insert/delete anomalies.
    • Many-to-many relationships β€” resolved with a junction (associative) table holding two foreign keys.
    • Database engines β€” the pluggable storage layer (e.g. InnoDB, MyISAM) that decides on-disk layout and transaction support.
    • Partitioning & indexes β€” splitting a large table by key range/hash so each partition is smaller and faster to scan.
    • Replication β€” copying data across nodes for availability and read scaling (leader–follower or multi-leader).
    Core concept Β· normalisation 1NF: atomic values, no repeating groups. 2NF: no partial dependency on part of a composite key. 3NF: no transitive dependency β€” non-key columns depend only on the key. Higher forms reduce redundancy at the cost of more joins.
    Database Internals β€” Pt. IIplayground Β· schema
  • 5

    Database Design III β€” The Final Steps

    Turn requirements into logical and physical designs documented with standard diagrams.

    • Logical vs physical design β€” logical captures entities/relationships independent of any engine; physical adds tables, types, indexes and storage choices.
    • Data Flow Diagrams (DFDs) β€” show how data moves between processes, stores and external entities.
    • Entity-Relationship (ER) diagrams β€” model entities, attributes and cardinalities (1:1, 1:N, M:N) before they become tables.
    • SQL Tutorial I β€” the SELECT statement β€” the first hands-on SQL: projecting columns, ordering, limiting.
    SELECT name, joined
    FROM customer
    ORDER BY joined DESC
    LIMIT 10;
    playground Β· query editor
  • 7

    Relational Data Models

    Consolidate relational design and revisit how engine choices interact with the schema.

    • Normalisation (revisited) β€” applying the normal forms to a fuller schema and weighing when to denormalise for speed.
    • Many-to-many relationships β€” junction tables in practice, including extra attributes on the relationship itself.
    • Database engines β€” how the chosen engine affects transactions, locking and indexing options.
    • Partitioning & indexes β€” choosing partition keys and composite indexes to match query patterns.
    Key idea Normalisation removes redundancy; denormalisation deliberately re-introduces it to avoid expensive joins. The right point on that spectrum depends on the read/write mix.
    playground Β· schema
Block 3

SQL Deep Dive

Sessions 6, 8–12

A four-part SQL tutorial track that takes you from filtering single tables to combining them with set-theoretic joins, then to aggregation, window functions and CTEs, temporal queries, and finally getting data in and out at scale. This is the most hands-on, playground-linked block.

Learning outcomes
  • Filter and categorise rows with WHERE and CASE
  • Choose the correct JOIN and justify it with set theory
  • Aggregate, window and modularise queries with CTEs
  • Query temporal data and time-series
  • Bulk-load and export data with correct encodings
  • 6

    SQL Tutorial II β€” Filtering & Categorization

    Filter and shape result sets with conditional logic.

    • WHERE clauses β€” row-level predicates with AND/OR/NOT, IN, BETWEEN, LIKE and NULL handling.
    • CASE clauses β€” inline conditional expressions to bucket or relabel values in the result set.
    SELECT name,
      CASE WHEN spend > 100 THEN 'VIP' ELSE 'std' END AS tier
    FROM customer
    WHERE joined >= '2025-01-01';
    playground Β· query editor
  • 8

    SQL Joins and Set Theory

    Ground JOINs in the set theory that defines their results.

    • SQL joins β€” INNER (matches only), LEFT/RIGHT (keep one side), FULL (keep both), CROSS (cartesian product).
    • Set theory β€” joins and UNION/INTERSECT/EXCEPT map directly onto set union, intersection and difference.
    • De Morgan's laws & logical equivalence β€” Β¬(A∧B)=Β¬A∨¬B; used to rewrite and simplify WHERE predicates correctly.
    Core concept Β· joins An INNER JOIN returns the intersection of matching rows; a LEFT JOIN returns all left rows plus matches (NULLs where none). Picture each join as a region of a Venn diagram.
    SELECT c.name, o.total
    FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.id;
    playground Β· JOIN Venn
  • 9

    SQL Tutorial III β€” Aggregation & Windows

    Summarise and analyse data beyond simple row selection.

    • Aggregating results β€” GROUP BY with COUNT/SUM/AVG/MIN/MAX and HAVING to filter groups.
    • Window functions β€” compute running totals, ranks and moving averages without collapsing rows, using OVER (PARTITION BY … ORDER BY …).
    • Common Table Expressions (CTEs) β€” named WITH subqueries that make complex queries readable and support recursion.
    SELECT name, spend,
      RANK() OVER (ORDER BY spend DESC) AS rnk
    FROM customer;
    playground Β· query editor
  • 10

    SQL Tutorial IV β€” Time & Time-series

    Work with temporal data and the databases purpose-built for it.

    • Date & time functions β€” truncating, extracting parts, computing intervals and handling time zones.
    • Time-series analysis in SQL β€” bucketing by interval, gap-filling and rolling aggregates over ordered timestamps.
    • Time-series databases β€” engines (e.g. InfluxDB, TimescaleDB) optimised for append-heavy, timestamp-keyed data.
    Key idea Time-series workloads are append-mostly and queried by range, so they favour storage sorted by time and downsampling/retention policies over general-purpose indexing.
    playground Β· query editor
  • 11

    Advanced SQL Topics

    Bridge SQL to analytics and application workflows.

    • Exploratory Data Analysis β€” profiling distributions, nulls and outliers with SQL before modelling.
    • Machine Learning datasets β€” shaping query output into feature tables for training.
    • Loading from a database to an application β€” cursors, connection pooling and parameterised queries from app code.
    Key idea The database is increasingly the feature store for ML β€” clean, well-modelled SQL output is what feeds models, so query quality is data quality.
  • 12

    Data Outside the Database

    Get data in and out at scale and handle file formats correctly.

    • Storing & modifying data β€” transactional INSERT/UPDATE/DELETE vs bulk paths, and when each is appropriate.
    • Bulk data load operations β€” tools like COPY/LOAD DATA that bypass per-row overhead for large imports.
    • SQLDump, file types & encodings β€” logical dumps for backup/migration; CSV/JSON/Parquet trade-offs; UTF-8 and delimiter pitfalls.
    Key idea Most real-world data pain is at the boundary β€” encodings, delimiters and types. Getting import/export right prevents silent corruption.
Block 4

Data Warehousing, Web Apps & Business Intelligence

Sessions 13–17

Shifts from transactional (OLTP) thinking to analytical (OLAP): dimensional modelling with facts and dimensions, star vs snowflake schemas, and the place of databases in data-engineering pipelines. You then connect databases to dynamic web sites and build a Business-Intelligence data application, with the Midterm sitting at the block's centre.

Learning outcomes
  • Design star and snowflake schemas from facts and dimensions
  • Place databases within an enterprise data pipeline
  • Connect a database to a dynamic, database-driven web site
  • Explain the role of BI in enterprise IT and build a BI app
  • 13

    Data Warehouse and Data Mart Design

    Design analytical schemas and place databases within engineering pipelines.

    • Facts & dimensions β€” facts hold measurable events (sales, clicks); dimensions hold the descriptive context (time, product, customer).
    • Star vs snowflake schema β€” star keeps dimensions denormalised in one table each (fast, simple); snowflake normalises them (less redundancy, more joins).
    • Enterprise data architectures β€” warehouses, data marts, lakes and the patterns that connect them.
    • Databases in data-engineering pipelines β€” where ETL/ELT sources and lands data.
    Core concept Β· OLTP vs OLAP OLTP is many small read/write transactions (normalised, app-facing); OLAP is large analytical scans (dimensional, report-facing). Warehouses are OLAP-shaped.
  • 14

    Database Applications I β€” Web Applications

    Connect databases to dynamic, database-driven web sites.

    • Dynamic, database-driven web sites β€” how a request becomes a query and a query becomes a rendered page.
    • Common stacks β€” LAMP (Linux/Apache/MySQL/PHP) and Flask-based Python stacks; where the database sits in each.
    Key idea The web app is just a thin layer over the database: most user actions map to a CRUD operation, so a clean schema makes the app simpler.
  • 15

    Midterm Exam

    Comprehensive assessment of design and SQL fundamentals from Blocks 1–4. A minimum 3.5/10 is required to pass the subject.

    assessment Β· exam Β· 3.5/10 min
  • 16

    Database Applications II β€” Business Intelligence

    Understand where BI fits in enterprise IT and start building a data app.

    • Role of BI in enterprise IT β€” turning warehoused data into dashboards and decisions for non-technical users.
    • Demo of BI platforms β€” a tour of how tools sit on top of the warehouse.
    • Building a data application with Dash β€” a Python framework for interactive, database-backed analytics apps.
    demo Β· BI platform
  • 17

    Database Applications III β€” Business Intelligence (cont.)

    Continue the BI application and reflect on top-down vs bottom-up data design.

    • Building a data application with Dash β€” extending the app with interactivity and live queries.
    • Designing data: top-down vs bottom-up β€” starting from business questions (top-down) vs from available source data (bottom-up).
Block 5

NoSQL & Document Databases

Sessions 18–21

Introduces the NoSQL family and focuses on document databases (MongoDB). You learn document operations, partitioning/sharding, the MongoDB Query Language, and how to prototype a full document-backed web app on the MEAN stack β€” contrasting flexible schemas against the relational rigour of Blocks 2–3.

Learning outcomes
  • Explain when NoSQL beats relational and at what cost
  • Perform CRUD and queries on document collections (MQL)
  • Reason about partitioning and sharding for scale
  • Prototype a Node.js + MongoDB web application
  • 18

    Introduction to NoSQL

    Survey the NoSQL landscape and its core data structures.

    • What is NoSQL & when to use it β€” non-relational stores (key-value, document, column-family, graph) that trade strict schema/joins for scale and flexibility.
    • Relevant NoSQL data structures β€” a sample implementation of the most common structures and their access patterns.
    Core concept Β· CAP theorem A distributed store can guarantee at most two of Consistency, Availability, Partition-tolerance. Since network partitions are unavoidable, real systems choose between consistency and availability under partition.
    Next Generation Databases β€” Pt. I
  • 19

    Document Databases

    Work with document collections and the basics of MongoDB.

    • Operations on document collections β€” insert/find/update/delete over BSON documents; querying nested fields and arrays.
    • Partitioning & sharding β€” splitting a collection across nodes by shard key to scale writes and storage horizontally.
    • Query processing using JavaScript β€” aggregation pipelines and expressions evaluated in the engine.
    • MongoDB & PyMongo β€” driving MongoDB from Python.
    // orders over 100, newest first
    db.orders.find(
      { total: { $gt: 100 } }
    ).sort({ created: -1 })
    Next Generation Databases β€” Pt. II
  • 20

    MongoDB Lab 1

    Hands-on lab answering real questions against real datasets.

    • Answering questions with MQL β€” writing MongoDB Query Language queries (find, filter, project, aggregate) on real-world datasets.
    Lab outcome Translate analytical questions into MQL aggregation pipelines and interpret the results β€” the document-store analogue of the SQL tutorials.
    lab Β· MongoDB
  • 21

    Designing Web-Apps with MongoDB

    Prototype a full document-backed web application.

    • Prototyping with Node.js & MongoDB β€” wiring a JavaScript server to a document store end-to-end.
    • The MEAN stack β€” MongoDB Β· Express Β· Angular Β· Node.js β€” an all-JavaScript web stack.
    Key idea A document model lets the same JSON shape flow from database to API to front-end, which is why all-JS stacks like MEAN pair naturally with MongoDB.
Block 6

Big Data & Distributed Processing with Spark

Sessions 22–26

Moves to web-scale processing: the Hadoop ecosystem and MapReduce, then Apache Spark's architecture, cluster tuning, and batch-processing labs, finishing with SparkSQL and Spark ML. The theme is distributing computation across a cluster rather than scaling a single machine.

Learning outcomes
  • Explain MapReduce and massively-parallel processing
  • Describe Spark's architecture and tune a cluster
  • Run Spark batch jobs to transform large datasets
  • Combine SQL and ML on Spark (SparkSQL, Spark ML)
  • 22

    Introduction to Big Data & the Hadoop Ecosystem

    Understand distributed processing paradigms and the foundations of Hadoop.

    • Big Data technologies & frameworks β€” the "3 Vs" (volume, velocity, variety) and the tools built to handle them.
    • Distributed & massively-parallel processing β€” splitting work across many commodity machines instead of one large server.
    • Hadoop & MapReduce β€” HDFS for distributed storage; MapReduce for distributed computation.
    Core concept Β· MapReduce A two-phase model: Map transforms each input record into key-value pairs; Reduce aggregates all values sharing a key. The framework handles distribution, shuffling and fault tolerance.
    Next Generation Databases β€” Big Data
  • 23

    Spark Basics

    Learn Spark's architecture and how to configure and tune a cluster.

    • Spark architecture & concepts β€” driver, executors and RDDs/DataFrames; lazy evaluation with a DAG of transformations.
    • Cluster configuration & tuning β€” executor memory/cores, partitions and shuffle settings.
    • Basic Spark usage demo β€” loading and transforming data interactively.
    Key idea Spark keeps data in memory across stages and only computes when an action is called (lazy evaluation), making it far faster than disk-bound MapReduce for iterative work.
  • 24

    Spark Batch Processing β€” Lab I

    Apply Spark batch-processing functions in a hands-on lab.

    • Batch processing functions β€” map, filter, groupBy, join and aggregate over distributed DataFrames.
    lab Β· Spark
  • 25

    Spark Batch Processing β€” Lab II

    Continue and extend the Spark batch-processing exercises.

    • Batch processing functions (cont.) β€” more complex pipelines, joins across datasets and performance considerations.
    lab Β· Spark
  • 26

    SparkSQL and Spark ML

    Combine SQL and machine learning on Spark and review the latest trends.

    • SparkSQL β€” running SQL queries over distributed DataFrames with the Catalyst optimiser.
    • Spark ML β€” distributed machine-learning pipelines (features, models, evaluation) at scale.
    • Latest trends & top tips β€” where the ecosystem is heading and practical advice.
    Key idea SparkSQL lets you express distributed computation in familiar SQL while the optimiser plans execution across the cluster β€” bridging the SQL of Block 3 with Big-Data scale.
Block 7

Distributed Databases & Final Assessment

Sessions 27–30

Caps the course with a wide-column distributed database (Cassandra): its ring architecture, tunable replication, CQL, and probabilistic structures like Bloom filters. The block closes with the final project presentations and the comprehensive final exam.

Learning outcomes
  • Describe Cassandra's ring, tokens and replication model
  • Model and query data with CQL and collections
  • Explain Bloom filters and tunable consistency
  • Present and defend an end-to-end database solution
  • 27

    Distributed Databases: Cassandra

    Master Cassandra's data model, distributed architecture and query language.

    • Cassandra data model β€” wide-column, query-first design: you model tables around the queries you need, not normal forms.
    • Distributed architecture β€” masterless peer-to-peer cluster with no single point of failure.
    • Nodes, rings, tokens, replication β€” data is hashed onto a token ring; the replication factor decides how many nodes hold each row.
    • CQL β€” a SQL-like query language for Cassandra (no joins; partition-key driven).
    • Data types & collections β€” scalars plus set/list/map collection columns.
    • Secondary indexes β€” querying on non-key columns, with their performance caveats.
    • Materialised & probabilistic structures Β· Bloom filters β€” a Bloom filter is a space-efficient set test: "definitely not present" or "possibly present", used to skip disk reads.
    Core concept Β· tunable consistency Cassandra trades CAP per query: you pick a consistency level (e.g. ONE, QUORUM, ALL) so each read/write balances latency against how many replicas must agree.
    Next Generation Databases β€” Cassandra
  • 28

    Cassandra Lab

    Hands-on lab applying the Cassandra concepts from Session 27.

    • Cassandra Lab β€” creating keyspaces/tables, writing CQL, and observing replication and consistency in practice.
    lab Β· Cassandra
  • 29

    Final Project Presentations

    Present the group project, demonstrating an end-to-end database solution and justifying the design choices made across SQL, NoSQL and Spark.

    assessment Β· presentation Β· 10%
  • 30

    Final Exam

    Comprehensive final assessment (30% of the grade). A minimum 3.5/10 is required; below it sends you to the June re-sit regardless of overall grade.

    assessment Β· exam Β· 30% Β· 3.5/10 min

Key concepts β€” glossary

A quick reference to the core ideas that recur across the course, from storage internals to distributed systems.

Data model S2
The abstract shape of data β€” relational (tables), document (nested records), graph (nodes/edges) β€” that determines how it is stored and queried.
DDL / DML S2
Data Definition Language defines schema objects (CREATE, ALTER); Data Manipulation Language reads and changes data (SELECT, INSERT, UPDATE).
Index S3
An auxiliary structure (often a B-tree) that speeds lookups at the cost of slower writes and extra storage.
B-tree S3
A balanced, page-oriented tree keeping keys sorted for fast point and range queries; the default index in relational engines.
LSM-tree & SSTable S3
A write-optimised design that buffers writes in memory and flushes immutable sorted files (SSTables), merging them in the background.
Normalisation S4
Organising a schema (1NF→3NF and beyond) so each fact is stored once, eliminating insert/update/delete anomalies.
Denormalisation S7
Deliberately re-introducing redundancy to avoid expensive joins and speed up reads.
Primary / foreign key S4
A primary key uniquely identifies a row; a foreign key references a primary key in another table to enforce relationships.
Many-to-many S4
A relationship resolved with a junction (associative) table holding two foreign keys.
ER diagram / DFD S5
ER diagrams model entities, attributes and cardinalities; Data Flow Diagrams show how data moves between processes and stores.
ACID S4
Transaction guarantees: Atomicity, Consistency, Isolation, Durability β€” the contract that keeps relational data correct under concurrency and failure.
JOIN S8
Combining rows from tables on a condition: INNER (matches only), LEFT/RIGHT (keep one side), FULL (keep both), CROSS (cartesian).
Set operations S8
UNION, INTERSECT and EXCEPT combine result sets as set union, intersection and difference.
Aggregation S9
GROUP BY with functions like SUM/COUNT/AVG, filtered by HAVING, to summarise many rows into groups.
Window function S9
A function computed OVER a partition of rows (ranks, running totals) without collapsing them.
CTE S9
A Common Table Expression β€” a named WITH subquery that improves readability and enables recursion.
OLTP vs OLAP S13
OLTP handles many small transactions (app-facing); OLAP runs large analytical scans (report-facing); warehouses are OLAP-shaped.
Star / snowflake schema S13
Dimensional models around a central fact table; star keeps dimensions denormalised, snowflake normalises them.
Facts & dimensions S13
Facts record measurable events; dimensions provide descriptive context (time, product, customer) for slicing them.
NoSQL S18
Non-relational stores (key-value, document, column-family, graph) trading strict schema and joins for scale and flexibility.
CAP theorem S18
A distributed store can guarantee at most two of Consistency, Availability and Partition-tolerance; partitions force a choice between C and A.
Sharding / partitioning S19
Splitting data across nodes by a key to scale storage and throughput horizontally.
Replication S4
Keeping copies of data on multiple nodes for availability and read scaling (leader–follower or multi-leader).
MapReduce S22
A distributed model: Map emits key-value pairs per record; Reduce aggregates values per key, with the framework handling distribution.
Spark RDD / DataFrame S23
Spark's distributed datasets evaluated lazily as a DAG, kept in memory across stages for fast iterative processing.
Tunable consistency S27
Choosing per-query consistency levels (e.g. QUORUM) to balance latency against how many replicas must agree (Cassandra).
Bloom filter S27
A space-efficient probabilistic set test answering "definitely not present" or "possibly present" β€” used to skip unnecessary disk reads.

Bibliography

Compulsory reading β€” both titles are available digitally and are mapped to the blocks where they are most relevant.

Behaviour, attendance and ethics are governed by the University's Code of Conduct, Attendance Policy and Ethics Code; the Program Director may provide further indications.