ml-lab · course outline — AI: Machine Learning Foundations

AI: Machine Learning Foundations

Bachelor in Computer Science & Artificial Intelligence (BCSAI) · IE University · a 30-session, syllabus-driven map of the course — every module and every session, with the interactive ml-lab demos cross-linked where they bring a concept to life.

Artificial Intelligence has moved into the mainstream driven by advances in cloud computing, big data, open-source software, and improved algorithms — fundamentally altering how we work, live, and manage businesses. Machine Learning is the cornerstone of that shift: systems that are not directly programmed to solve a problem, but instead build their own program from examples or from trial-and-error experience.

This course introduces the field and builds a framework for informed analysis of the opportunities and challenges of applying ML in business. It blends a theoretical/conceptual approach with a hands-on technical understanding of every stage of an ML project, implemented in Python (pandas, matplotlib, scikit-learn, TensorFlow, PyTorch) at a basic-to-intermediate level, always keeping the business perspective in view.

Formally, supervised learning fits a function $f_\theta : \mathcal{X} \to \mathcal{Y}$ by choosing parameters $\theta$ that minimise an empirical risk $\hat{R}(\theta) = \frac{1}{n}\sum_{i=1}^{n} L\!\left(y_i, f_\theta(x_i)\right)$ over a training set of $n$ examples — a single idea that recurs, in different guises, through almost every session below.

Program

BCSAI — Computer Science & AI

Course code

AIMLF-CSAI.2.M.B

Area

Computer Science

Sessions

30

Credits

6.0 ECTS

Academic year

2025–26

Degree course

Second

Semester

2nd

Learning objectives

The main objective is to introduce students to ML and build a framework for analysing the opportunities and challenges of applying it in business. Specifically, the course aims to:

Contextual understanding. Acquire a contextual understanding of ML, its history and evolution, to make relevant predictions about its future trajectory.
Strategic impact. Understand the profound, strategic changes ML introduces in technological and business environments, and appreciate ML as a key source of competitive advantage for firms.
Solution design. Analyse the features and components of ML solutions, understand the approaches for designing and implementing them, and assess the challenges, difficulties and risks in their successful deployment.
Application fit. Evaluate the appropriateness of a business application for prediction, optimization, natural language processing, robotics, computer vision and other emergent areas.
Staying current. Know how to stay continuously updated on new trends and advances in the ML field.

By the end, students should use Python and its ecosystem to access and analyse data from several sources, build predictive models with supervised and unsupervised techniques, perform feature engineering, and tune and compare models against metrics such as accuracy and speed. The professor's stated personal goal: that you walk away with a strong conceptual grasp of the technologies behind ML, the ability to build a business case for ML in your organisation, and a strong desire to go deeper.

Methodology & assessment

The course progresses from the most basic ML concepts to increasingly difficult problems, contrasting methods to explain when each is most appropriate. It is lecture- and example-based with group in-class discussions, following a tailored path for future AI Managers: from theory, to the tools and techniques that make implementation possible, to deployable business solutions.

Six teaching elements run through the term: lectures (theory, with on-time comprehension checks), examples / tutorials / cases (preparatory for assignments, with what-if interactive analysis), discussions (some announced in advance, requiring preparation), assignments (implement an algorithm in Python / Jupyter and analyse results), exams (one formal test plus practice "test-exams"), and group work (a final project presentation). Individual assignments are delivered in dynamically formed small groups; the final group project uses preference-based grouping. GenAI tools are allowed — with acknowledgement — for research, coding and exam practice, but never during the final exam.

Learning-activity weighting

Exercises · async · field work 53.3%

Lectures 16.7%

Individual studying 13.3%

Group work 10.0%

Discussions 6.7%

Total dedication ≈ 150 hours (80 h exercises · 25 h lectures · 20 h individual study · 15 h group work · 10 h discussions).

Assessment weighting

Final exam 30%

Individual assignments 30%

Participation 20%

Group assignment 12%

Group presentation 8%

One formal test/exam plus practice "test-exams". 80% attendance required. GenAI tools are allowed for assignments and exam practice with acknowledgement — but not during the final exam.

Assessment components in detail

Final exam · 30% · test/exam · 180 pts max (MLF_28)

Deliverable: a concept-summary review examination written in class without GenAI. Evaluation: most questions are drawn from the practice "test-exam" pool. It is formally "not a final exam" in the continuous track — its score is added to the other items and carries no minimum passing grade.

Individual assignments · 30% · 3 × 70 pts (MLF_07 · 18 · 22) + 40 pts practice (MLF_23)

Deliverable: Python Jupyter notebooks implementing an algorithm end-to-end, with a written summary and analysis of results; review criteria are published per task. Evaluation: done jointly in dynamically formed small groups (no pair repeats); collaboration is enforced inside the group only. Some may run as a "bake-off" scored on held-out test data.

Participation · 20% · 4 pts / session × 30 = 120 pts

Deliverable: prepared, professional-level engagement in every session — questions, discussion and group interaction. Evaluation: a fixed 4 points per session; "come prepared as if it were a meeting in your company."

Group assignment · 12% · part of the 120-pt group track

Deliverable: the final group project (MLF_22 onward), with preference-based grouping. Evaluation: applies the full pipeline to a business problem with explainability and interpretability in focus.

Group presentation · 8% · 50 pts (MLF_29–30)

Deliverable: a presentation of the group project with discussion and Q&A. Evaluation: communication quality and depth of the work shown.

Roughly 30% of the grade comes from the final exam, 30% from individual assignments, and the remaining 40% from participation plus group work and presentation. Pass & attendance rules: 80% attendance is required; students below it fail both the ordinary and extraordinary (June/July re-sit) calls and must re-enrol. The re-sit is a single comprehensive exam (continuous evaluation is ignored), with a passing grade of 5 and a maximum of 8.0. Four calls are allowed across two academic years; failing more than 18 ECTS after re-sits means leaving the program.

Program — 30 sessions

The schedule below mirrors the syllabus exactly: knowledge blocks integrate theory first, then practice/tutorials, then assignments of assorted complexity. Filled timeline dots mark hands-on practice/assignment sessions. Where a concept maps to a live demo, a demo ↗ tag links straight to it. Each session below carries the core formula or definition it rests on, plus a short key-idea takeaway. (The schedule is tentative; pace adapts to the group and to recent advances in the field.)

I · Foundations & the ML pipeline
II · Core algorithms & training
III · Practice, integration & tuning
IV · Evaluation & deployment
V · Neural networks & deep learning
VI · Frontiers, applications & assessment

I · Foundations & the machine-learning pipeline · sessions 1–4

The opening block frames what intelligence and learning mean computationally, then walks the lifecycle of a real ML project from scoping through data preparation to feature engineering. The message: most of the work — and most of the risk — lives before any model is trained.

By the end of Module I you should be able to

Explain how learning systems differ from explicitly programmed ones, and why some problems are intractable.
Lay out the stages of an ML project and the roles a cross-functional team needs.
Diagnose and repair common data issues — missing values, outliers, leakage — and encode variables correctly.
Construct, transform and select features, and reason about dimensionality.

SESSION 1 · MLF_01

Introduction

The basic concepts about AI and its application in ML are introduced.

Intelligence & knowledge representation: what it means for a system to "know" and act on knowledge.
Basic algorithms & intractable problems: why brute-force search fails as problems scale.
Heuristics: approximate strategies that trade guaranteed optimality for tractable runtime.

Core definition · empirical risk $\hat{R}(\theta) = \frac{1}{n}\sum_{i=1}^{n} L\!\left(y_i, f_\theta(x_i)\right)$ Learning = picking parameters $\theta$ that minimise average loss $L$ over the data, instead of hand-coding rules.

Key idea: a learner replaces an explicit program with an optimisation problem over examples.

ML Engineering · Ch.1 Introduction

Reading: Burkov Ch.1 sets terminology and the "what is ML engineering" framing for the whole course.
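
As a minimal sketch of the empirical-risk definition in this session, the snippet below evaluates $\hat{R}(\theta)$ for a one-parameter model under squared loss. The toy data, the model `f_theta` and all helper names are illustrative assumptions, not part of the syllabus.

```python
def squared_loss(y, y_hat):
    # L(y, y_hat) for a single example.
    return (y - y_hat) ** 2


def f_theta(theta, x):
    # Hypothetical one-parameter model: a line through the origin.
    return theta * x


def empirical_risk(theta, xs, ys, loss=squared_loss):
    # Average loss over the n training examples.
    n = len(xs)
    return sum(loss(y, f_theta(theta, x)) for x, y in zip(xs, ys)) / n


# Toy data generated by y = 2x (an assumption for illustration).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

risk_at_1 = empirical_risk(1.0, xs, ys)  # (1 + 4 + 9) / 3
risk_at_2 = empirical_risk(2.0, xs, ys)  # perfect fit, zero risk
```

Learning, in this framing, amounts to searching for the `theta` with the lowest value of `empirical_risk`.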

SESSION 2 · MLF_02

The anatomy of a machine-learning project

Exploration of the general pipeline for ML projects.

Scope & goals: translating a business question into a measurable prediction target.
Domain knowledge & teams: assembling the multidisciplinary skills a project needs.

Framing · expected value of a model $\text{value} = \mathbb{E}\big[\text{benefit}(\hat{y}, y)\big] - \text{cost}_{\text{build}} - \text{cost}_{\text{run}}$ A project is worth doing only when expected decision value exceeds the cost of building and operating the model.

Key idea: define success and feasibility before touching data — most ML projects fail in the framing, not the math.

ML Engineering · Ch.2 Before the Project Starts

Reading: Burkov Ch.2 — prioritising projects, estimating complexity, and team/role planning.

SESSION 3 · MLF_03

The importance of data — collection & preparation

Analysis of typical data issues and how to fix them.

Missing values & imputation: replacing gaps (e.g. with the mean/median) without distorting signal.
Outliers & leakage: spotting anomalous points and stopping target information from leaking into features.
Transformation: one-hot encoding and related methods to make data model-ready.

Core formula · standardisation (z-score) $z = \dfrac{x - \mu}{\sigma}$ Rescales each feature to zero mean and unit variance so no feature dominates by units alone.

Key idea: data leakage — letting the model peek at the answer — is the most common cause of "too good" results that collapse in production.

ML Engineering · Ch.3 Data Collection & Preparation feature scaling demo ↗

Reading: Burkov Ch.3 on data quality, leakage and partitioning. Demo: see scaling change a model's decision boundary live.
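
The z-score formula and the leakage warning in this session can be combined in one small sketch: estimate $\mu$ and $\sigma$ on the training split only, then reuse them on the test split. The data and function names are hypothetical, and only the standard library is used.

```python
import statistics


def fit_scaler(train_values):
    # Estimate mu and sigma on the training split only,
    # so no information from the test split leaks in.
    mu = statistics.fmean(train_values)
    sigma = statistics.pstdev(train_values)
    return mu, sigma


def transform(values, mu, sigma):
    # z = (x - mu) / sigma, applied element-wise.
    return [(x - mu) / sigma for x in values]


train = [10.0, 20.0, 30.0, 40.0]
test = [25.0, 50.0]

mu, sigma = fit_scaler(train)
train_z = transform(train, mu, sigma)
test_z = transform(test, mu, sigma)  # reuses the training statistics
```

After the transform, the training feature has zero mean and unit variance; the test feature generally does not, which is expected.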

SESSION 4 · MLF_04

Feature engineering

The concept of a feature, and how to build and select features.

Extraction: deriving informative variables from raw data.
Selection: keeping the features that matter (filter, wrapper, embedded methods).
Dimensionality: complexity analysis and reduction to fight the curse of dimensionality.

Core formula · PCA objective $\max_{\|w\|=1} \operatorname{Var}(Xw) = \max_{\|w\|=1} w^\top \Sigma\, w$ (for centred $X$) Principal components are the directions $w$ of maximum variance: the leading eigenvectors of the covariance matrix $\Sigma$.

Key idea: good features often beat fancy models; reducing dimensions can improve both speed and generalization.

Assignment · Data Preparation ML Engineering · Ch.4 Feature Engineering PCA demo ↗

Assignment (Data Preparation): clean, encode and feature-engineer a real dataset in a notebook. Reading: Burkov Ch.4. Demo: watch variance concentrate along principal axes.
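
To make the PCA objective concrete, here is a hedged sketch that finds the first principal direction of a tiny 2-D dataset by power iteration on the covariance matrix. Power iteration is one of several ways to obtain the leading eigenvector and is chosen here only because it needs no external library; the data points and function names are assumptions.

```python
import math


def covariance_matrix(points):
    # Sample covariance of 2-D points (each point is one observation).
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def matvec(S, w):
    return [S[0][0] * w[0] + S[0][1] * w[1],
            S[1][0] * w[0] + S[1][1] * w[1]]


def leading_eigenvector(S, iters=200):
    # Power iteration: repeatedly apply S and renormalise to unit length.
    w = [1.0, 0.0]
    for _ in range(iters):
        v = matvec(S, w)
        norm = math.hypot(v[0], v[1])
        w = [v[0] / norm, v[1] / norm]
    return w


def variance_along(S, w):
    # w^T S w, the variance of the data projected onto w.
    Sw = matvec(S, w)
    return w[0] * Sw[0] + w[1] * Sw[1]


# Hypothetical points lying close to the line y = x.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
S = covariance_matrix(points)
w1 = leading_eigenvector(S)
var_pc1 = variance_along(S, w1)
explained = var_pc1 / (S[0][0] + S[1][1])  # share of total variance
```

In this toy case nearly all of the variance falls along the first component, which is why dropping the second dimension would lose little information.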

II · Core algorithms & supervised training · sessions 5–7

With clean data in hand, the course turns to the algorithm zoo: how to organise ML methods into families, how supervised models are actually trained by minimising a loss, and how to run a first end-to-end case from exploration to validation.

By the end of Module II you should be able to

Classify algorithms along the main axes (supervised/unsupervised, parametric/non-parametric, instance/model-based).
Describe how loss functions and gradient descent drive training.
Distinguish regression from classification and choose appropriate performance metrics.
Recognise overfitting and reason about generalization on held-out data.

SESSION 5 · MLF_05

Fundamental algorithms

A taxonomy of the main families of ML algorithms and the principles behind them.

Supervised vs unsupervised; parametric vs non-parametric.
Instance-based vs model-based; manual feature extraction vs representational methods.
Reflex vs state/variable-based models.
General principles: loss functions and gradient descent.

Core formula · gradient-descent update $\theta_{t+1} = \theta_t - \eta\, \nabla_\theta L(\theta_t)$ Step downhill along the loss gradient with learning rate $\eta$ — the engine behind most model training.

Key idea: nearly every algorithm is "a model class + a loss + an optimiser"; the taxonomy just varies those three choices.

gradient descent demo ↗ k-NN demo ↗

Demos: gradient descent shows the update rule converging; k-NN illustrates a non-parametric, instance-based learner with no training step.
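
The update rule $\theta_{t+1} = \theta_t - \eta\,\nabla_\theta L(\theta_t)$ can be sketched on a one-parameter least-squares problem. The model, data, learning rate and step count below are illustrative assumptions.

```python
def grad_mse(theta, xs, ys):
    # Derivative of (1/n) * sum((theta * x - y)^2) with respect to theta.
    n = len(xs)
    return sum(2 * (theta * x - y) * x for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, theta0=0.0, eta=0.05, steps=200):
    theta = theta0
    for _ in range(steps):
        # Step downhill along the gradient with learning rate eta.
        theta = theta - eta * grad_mse(theta, xs, ys)
    return theta


# Toy data generated by y = 2x, so the minimiser is theta = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
theta_hat = gradient_descent(xs, ys)
```

On this particular data the iterates shrink the error by a constant factor each step; a learning rate above roughly 0.21 would make them diverge instead, which illustrates why $\eta$ is a key hyperparameter.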

SESSION 6 · MLF_06

Anatomy of an ML algorithm — supervised model training, the basics

General architecture and training methods for supervised models.

Task analysis: regression (continuous target) vs classification (discrete label).
Performance metrics appropriate to each task.
Overfitting & generalization: fitting signal, not noise.

Core formulas · MSE & cross-entropy $\text{MSE} = \frac{1}{n}\sum_i (y_i-\hat{y}_i)^2 \qquad \mathcal{L}_{\text{CE}} = -\frac{1}{n}\sum_i\big[y_i\log\hat{p}_i + (1-y_i)\log(1-\hat{p}_i)\big]$ MSE for regression; binary cross-entropy for classification — the two workhorse loss functions.

Key idea: the loss you choose is your definition of "good" — pick it to match the task and the cost of errors.

ML Engineering · Ch.5–6 Supervised Model Training logistic regression demo ↗ train/test & overfitting demo ↗

Reading: Burkov Ch.5–6 (Supervised Model Training, Parts 1–2). Demos: logistic regression fits a probability boundary; train/test split exposes overfitting.
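
Both loss functions from this session fit in a few lines of plain Python. This is a sketch for intuition rather than a production implementation; the clipping constant `eps` and the example values are assumptions.

```python
import math


def mse(ys, y_hats):
    # Mean squared error for regression.
    return sum((y - yh) ** 2 for y, yh in zip(ys, y_hats)) / len(ys)


def binary_cross_entropy(ys, ps, eps=1e-12):
    # Clip probabilities away from 0 and 1 so log() stays finite.
    total = 0.0
    for y, p in zip(ys, ps):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(ys)


reg_loss = mse([3.0, 5.0], [2.0, 7.0])  # (1 + 4) / 2

# Same labels, confident predictions that are right vs wrong.
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
```

Cross-entropy penalises confident mistakes far more heavily than confident correct answers are rewarded, which is the behaviour the comparison above is meant to show.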

SESSION 7 · MLF_07

A first basic practice

An end-to-end practical case, from EDA to model training and validation.

Exploratory data analysis through to validation.
Comparing classical algorithms (k-NN, trees, linear models) on the same task.

Core formula · accuracy $\text{Accuracy} = \dfrac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}}$ The fraction of correct predictions — a first, simple yardstick for comparing models.

Key idea: a clean, reproducible pipeline (EDA → split → fit → validate) matters more than any single algorithm.

Assignment (70 pts) decision tree demo ↗ k-NN demo ↗

Assignment (70 pts): deliver a notebook that runs the full classical-ML pipeline and compares models; may be run as a bake-off on held-out data. Demos: trees and k-NN as contrasting model families.

III · Unsupervised learning, practice & integration · sessions 8–12

This block adds learning without labels (clustering, dimensionality reduction), confronts the messy realities of real data (imbalance, sampling bias, uncertainty), and shows how to compose and tune models into production-grade pipelines.

By the end of Module III you should be able to

Apply PCA and clustering, and measure unsupervised results with suitable metrics.
Handle imbalanced classes and sampling bias, and distinguish epistemic from aleatoric uncertainty.
Build pipelines, combine models with ensembles, and tune hyper-parameters systematically.
Carry a business application end-to-end and explain the strategy behind it.

SESSION 8 · MLF_08

Unsupervised learning

General architecture and algorithmic approach to learning without labels.

PCA & clustering: the two main unsupervised workhorses.
Metrics & uses: how to judge structure with no ground-truth labels.
Similarity & entropy: functions that quantify distance and disorder.

Core formula · k-means objective $\min_{\{C_k\}} \sum_{k=1}^{K}\sum_{x\in C_k} \lVert x-\mu_k\rVert^2$ Partition points into $K$ clusters so each is close to its centroid $\mu_k$ — minimising within-cluster variance.

Key idea: without labels, "good" means compact, well-separated structure — defined entirely by a chosen similarity measure.

k-means demo ↗ PCA demo ↗

Demos: k-means iterates centroid assignment live; PCA reduces dimensions while preserving variance.
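
The k-means objective is usually minimised by alternating two steps: assign each point to its nearest centroid, then move each centroid to the mean of its members. The sketch below does this on 1-D points; the data, the initial centroids and the empty-cluster rule are assumptions made for illustration.

```python
def assign(points, centroids):
    # Index of the nearest centroid for each point.
    return [min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)
            for x in points]


def update(points, labels, centroids):
    new = []
    for k in range(len(centroids)):
        members = [x for x, lab in zip(points, labels) if lab == k]
        # Assumption: keep the old centroid if a cluster ends up empty.
        new.append(sum(members) / len(members) if members else centroids[k])
    return new


def objective(points, labels, centroids):
    # Within-cluster sum of squared distances.
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(points, labels))


def kmeans(points, centroids, iters=10):
    history = []
    for _ in range(iters):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
        history.append(objective(points, labels, centroids))
    return centroids, labels, history


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, labels, history = kmeans(points, centroids=[0.0, 5.0])
```

The recorded `history` never increases from one iteration to the next, which is the sense in which each step improves the objective. The result still depends on the starting centroids.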

SESSION 9 · MLF_09

A more advanced practice

Tackling the messier realities of real-world data.

Imbalanced classes: resampling, class weights and threshold tuning.
Sampling biases: when the training sample misrepresents the population.
Uncertainty: epistemic (reducible, model) vs aleatoric (irreducible, data noise).

Core metric · F1 (for imbalance) $F_1 = 2\cdot\dfrac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}$ Harmonic mean of precision and recall — far more honest than accuracy when classes are skewed.

Key idea: on imbalanced data, accuracy lies; choose metrics and sampling that reflect the cost of each error type.

confusion matrix demo ↗

Demo: the confusion matrix shows how precision and recall trade off as the decision threshold moves.
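
A short sketch shows why accuracy misleads on skewed classes: a classifier that always predicts the majority class scores high accuracy and zero $F_1$. The 5%/95% split is a made-up example, and returning 0 when there are no true positives is a convention assumed here.

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f1(tp, tn, fp, fn):
    # Convention assumed here: F1 is 0 when there are no true positives.
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced labels: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100

counts = confusion_counts(y_true, always_negative)
acc = accuracy(*counts)
f1_score = f1(*counts)
```

Here `acc` is 0.95 even though the classifier never finds a single positive case, while `f1_score` is 0.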

SESSION 10 · MLF_10

Assignment review

Dissection and discussion of sample solutions.

Sample-solution walk-through: what a strong submission looks like.
Common difficulties: the recurring traps and how to avoid them.

Key idea: reviewing graded work against a reference solution is one of the fastest ways to internalise good practice.

SESSION 11 · MLF_11

ML integration — pipelines, ensemble methods, hyper-tuning

Composing and optimising models for production.

Ensembles & pipelines: chaining transforms and combining many models.
Hyper-parameter tuning: grid/random search and cross-validated selection.
AutoML & Bayesian optimization of parameters.

Core idea · ensemble averaging $\hat{y} = \frac{1}{M}\sum_{m=1}^{M} f_m(x)$ Averaging $M$ diverse models reduces variance — the principle behind bagging and random forests.

Key idea: combining weak, diverse learners — and tuning them as one pipeline — usually beats hand-optimising a single model.

bias–variance demo ↗

Demo: bias–variance shows why ensembles help — averaging cuts variance without adding much bias.
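
The variance-reduction claim behind ensemble averaging can be checked with a small simulation. This sketch assumes each model's error is independent noise around the same signal, which is an idealisation; real ensemble members are correlated, so the reduction in practice is smaller. The `noisy_model` function is hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible


def noisy_model(x):
    # Hypothetical weak learner: true signal 2*x plus independent unit noise.
    return 2 * x + random.gauss(0, 1)


def ensemble_predict(x, M):
    # y_hat = (1/M) * sum of M model predictions.
    return sum(noisy_model(x) for _ in range(M)) / M


single = [noisy_model(1.0) for _ in range(2000)]
ensemble = [ensemble_predict(1.0, M=10) for _ in range(2000)]

var_single = statistics.pvariance(single)
var_ensemble = statistics.pvariance(ensemble)
```

Under the independence assumption the ensemble variance is about one tenth of the single-model variance for `M=10`, while the mean prediction stays close to the true value of 2.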

SESSION 12 · MLF_12

A first practical overview of typical applications

An end-to-end practical case of a business application.

Main concepts in play for a realistic deployment.
Implementation strategy for a successful rollout.

Key idea: a working business application is a chain of decisions — data, model, metric, deployment — each of which can sink the whole.

IV · Model evaluation & deployment · sessions 13–15

How do we know a model is actually good, and how do we keep it good once it leaves the notebook? This block covers rigorous validation, the metric landscape, and the realities of deployment and drift.

By the end of Module IV you should be able to

Choose and run cross-validation strategies and read the bias–variance tradeoff.
Select metrics across supervised/unsupervised tasks and multi-class/multi-label settings.
Describe the model life-cycle and detect data and concept drift after deployment.

SESSION 13 · MLF_13

Model evaluation

An extended analysis of validation methods.

Cross-validation & leave-one-out: reusing data to estimate generalization.
Consistency: stable performance across folds.
Bias–variance tradeoff: underfit vs overfit.

Core decomposition · bias–variance $\mathbb{E}\big[(y-\hat{f}(x))^2\big] = \text{Bias}^2 + \text{Var} + \sigma^2$ Expected error splits into bias, variance, and irreducible noise $\sigma^2$ — the lens for diagnosing models.

Key idea: k-fold CV averages over many train/test splits, giving a less optimistic, lower-variance estimate of true performance.

ML Engineering · Ch.7 Model Evaluation bias–variance demo ↗ train/test demo ↗

Reading: Burkov Ch.7 (Model Evaluation). Demos: bias–variance and train/test split make the decomposition tangible.
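
The rotation of folds in k-fold cross-validation can be sketched with a small index generator. This version does not shuffle and uses only the standard library; the function name is an assumption, and library implementations offer more options.

```python
def k_fold_indices(n, k):
    # Yield (train_idx, val_idx) pairs; fold sizes differ by at most one.
    indices = list(range(n))
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(10, 3))
fold_sizes = [len(val) for _, val in folds]
```

Each example is used for validation exactly once and for training in the remaining folds; averaging the per-fold scores gives the cross-validated estimate.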

SESSION 14 · MLF_14

Performance metrics

Review and comparison of performance metrics across paradigms.

Supervised vs unsupervised metrics.
Accuracy, F-beta, sensitivity & specificity.
Multi-class & multi-label generalization.

Core formula · F-beta score $F_\beta = (1+\beta^2)\cdot\dfrac{\text{precision}\cdot\text{recall}}{\beta^2\,\text{precision}+\text{recall}}$ $\beta > 1$ weights recall more heavily than precision, $\beta < 1$ does the reverse, and $\beta = 1$ gives $F_1$; sensitivity = recall = $\tfrac{\text{TP}}{\text{TP}+\text{FN}}$, specificity = $\tfrac{\text{TN}}{\text{TN}+\text{FP}}$.

Key idea: there is no universal metric — the right one encodes the relative cost of false positives vs false negatives for your problem.

confusion matrix demo ↗ ROC / AUC demo ↗

Demos: confusion matrix for threshold-level metrics; ROC/AUC for threshold-independent ranking quality.
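
The metrics in this session all derive from the four confusion-matrix counts. The sketch below computes them for a hypothetical set of counts and shows how the choice of $\beta$ shifts the score when precision and recall differ.

```python
def f_beta(precision, recall, beta):
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Hypothetical confusion-matrix counts.
tp, tn, fp, fn = 40, 45, 5, 10

precision = tp / (tp + fp)    # 40 / 45
sensitivity = tp / (tp + fn)  # recall: 40 / 50
specificity = tn / (tn + fp)  # 45 / 50

f1 = f_beta(precision, sensitivity, 1)
f2 = f_beta(precision, sensitivity, 2)        # leans towards recall
f_half = f_beta(precision, sensitivity, 0.5)  # leans towards precision
```

Because recall is lower than precision in this example, the recall-weighted `f2` comes out below `f1`, and the precision-weighted `f_half` comes out above it.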

SESSION 15 · MLF_15

Model deployment & maintenance — an introduction / overview

Taking a model beyond the notebook and keeping it healthy.

Model life-cycle: from serving to retraining and retirement.
Data drift & concept drift: when the world shifts under a trained model.

Core idea · drift detection $D_{\text{KL}}(P_{\text{train}} \,\|\, P_{\text{live}}) > \tau$ Flag drift when the live data distribution diverges from training beyond a threshold $\tau$.

Key idea: a deployed model is a living system — performance decays as distributions drift, so monitoring is part of the design.

ML Engineering · Ch.8–9 Deployment, Serving, Monitoring & Maintenance

Reading: Burkov Ch.8 (Model Deployment) & Ch.9 (Serving, Monitoring & Maintenance).
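
The drift rule $D_{\text{KL}}(P_{\text{train}} \,\|\, P_{\text{live}}) > \tau$ can be sketched for a single feature summarised as histogram proportions. The bin proportions, the threshold `tau=0.1` and the clipping constant are assumptions; in practice the threshold would be calibrated on historical data.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    # D_KL(P || Q) for two discrete distributions over the same bins.
    # q is clipped at eps so an empty live bin does not divide by zero.
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def has_drifted(p_train, p_live, tau=0.1):
    # Flag drift when the divergence exceeds the threshold tau.
    return kl_divergence(p_train, p_live) > tau


# Hypothetical histogram proportions for one feature.
p_train = [0.5, 0.3, 0.2]
p_same = [0.5, 0.3, 0.2]
p_shifted = [0.1, 0.3, 0.6]

drift_same = has_drifted(p_train, p_same)
drift_shifted = has_drifted(p_train, p_shifted)
```

An identical live distribution gives a divergence of zero, while the shifted one is flagged. Note that KL divergence is not symmetric, so the order of the arguments matters.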

V · Neural networks & deep learning · sessions 16–19

The deep-learning block builds the neural network from the perceptron up: forward and backward passes, optimisation and regularization, representation learning, and a first look at agents that learn from reward.

By the end of Module V you should be able to

Explain forward/backward propagation and how automatic differentiation computes gradients.
Apply activation functions, optimizers and regularizers like dropout.
Reason about representation learning and well-known architectures.
State the RL problem and the Bellman optimality equation at an introductory level.

SESSION 16 · MLF_16

Neural networks & deep learning

Introduction to the perceptron and multi-layer perceptron.

Mathematical foundation of a neural network.
Forward & backward propagation; automatic differentiation.
Gradient descent variants (momentum, Adam) and key hyperparameters.
Dropout and other regularization techniques.

Core formulas · neuron & sigmoid $a = \sigma(w^\top x + b),\qquad \sigma(z) = \dfrac{1}{1+e^{-z}}$ A neuron is a weighted sum passed through a non-linearity; backprop applies the chain rule to get $\partial L/\partial w$.

Key idea: depth + non-linear activations let networks compose simple functions into rich ones; backprop makes training them tractable.

gradient descent demo ↗

Demo: gradient descent is exactly the optimiser that trains these networks, one weight update at a time.
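
For a single sigmoid neuron, backpropagation reduces to one application of the chain rule, which can be verified numerically. The sketch assumes a squared-error loss $L = \tfrac{1}{2}(a - y)^2$ and made-up weights and inputs; frameworks compute the same quantity by automatic differentiation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    # a = sigma(w^T x + b)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def loss(w, b, x, y):
    # Assumed loss for this sketch: half squared error.
    return 0.5 * (forward(w, b, x) - y) ** 2


def grad_w(w, b, x, y):
    # Chain rule: dL/dw = (a - y) * a * (1 - a) * x
    a = forward(w, b, x)
    delta = (a - y) * a * (1 - a)
    return [delta * xi for xi in x]


w, b, x, y = [0.5, -0.3], 0.1, [1.0, 2.0], 1.0
analytic = grad_w(w, b, x, y)

# Central finite differences as an independent check of the gradient.
h = 1e-6
numeric = []
for j in range(len(w)):
    w_plus, w_minus = w[:], w[:]
    w_plus[j] += h
    w_minus[j] -= h
    numeric.append((loss(w_plus, b, x, y) - loss(w_minus, b, x, y)) / (2 * h))
```

The analytic and numeric gradients agree to many decimal places, which is the standard sanity check for a hand-written backward pass.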

SESSION 17 · MLF_17

Neural networks — representation learning

How networks learn useful representations of data.

Representation-learning through practical examples.
Representational models and learned embeddings.
Well-known architectures (CNNs, RNNs, transformers) and their uses.

Core idea · learned embedding $h = g_\phi(x) \in \mathbb{R}^d,\quad d \ll \dim(x)$ Hidden layers map raw input into a compact feature space $h$ where the task becomes easy — features are learned, not hand-built.

Key idea: deep learning's power is automatic feature engineering — the network discovers the representation that PCA only approximates linearly.

PCA demo ↗

Demo: PCA as a linear baseline for the non-linear representations a network learns.

SESSION 18 · MLF_18

Practice recap

How to code a neural network from the building blocks.

Tensors & gradient tapes: the core abstractions in TensorFlow/PyTorch.
Activations & optimizers: assembling a trainable model.
Design principles for layer/width/depth choices.

Core formula · ReLU activation $\text{ReLU}(z) = \max(0, z)$ The default hidden activation: cheap to compute and non-saturating for positive inputs, which helps gradients flow through deep nets.

Key idea: a "gradient tape" records operations so the framework can autodiff the loss — you specify the forward pass, it gives you the gradients.

Assignment (70 pts)

Assignment (70 pts): implement and train a neural network in a notebook (tensors, optimizer, activations) and analyse the results.

SESSION 19 · MLF_19

Reinforcement learning — an introductory overview

Intelligent agents that learn from the state-action-reward paradigm.

Temporal differences and the Bellman optimality equation.
From value-function iteration to policy-gradient algorithms.
A practical example case of use.

Core formula · Bellman optimality $Q^*(s,a) = \mathbb{E}\big[r + \gamma \max_{a'} Q^*(s',a')\big]$ The optimal action-value is the immediate reward plus the discounted ($\gamma$) best future value.

Key idea: RL replaces a labelled dataset with a reward signal — the agent learns by acting and observing consequences over time.
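
The Bellman optimality equation can be solved by repeated substitution when the environment is small and fully known. The two-state deterministic MDP below is entirely hypothetical, as are the discount factor and the number of sweeps; it is a sketch of Q-value iteration, not of a learning agent that samples experience.

```python
GAMMA = 0.5

# Hypothetical deterministic MDP: (state, action) -> (next_state, reward).
MDP = {
    (0, "stay"): (0, 0.0),
    (0, "move"): (1, 1.0),
    (1, "stay"): (1, 2.0),
    (1, "move"): (0, 0.0),
}
ACTIONS = ("stay", "move")


def q_value_iteration(sweeps=100):
    Q = {sa: 0.0 for sa in MDP}
    for _ in range(sweeps):
        # Bellman backup: Q(s, a) = r + gamma * max_a' Q(s', a')
        Q = {
            (s, a): r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
            for (s, a), (s2, r) in MDP.items()
        }
    return Q


Q = q_value_iteration()
# Greedy policy: pick the action with the highest Q-value in each state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in (0, 1)}
```

The values converge to $Q^*(1,\text{stay}) = 4$ and $Q^*(0,\text{move}) = 3$, so the greedy policy moves from state 0 to state 1 and then stays there.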

VI · Frontiers, applications & assessment · sessions 20–30

The closing block surveys the research frontier — sequential models, transfer and contrastive learning, generative models — then consolidates best practice, examines real-world risks, and runs the final assessment and group presentations.

By the end of Module VI you should be able to

Model sequential/time-series data and explain the move from classical methods to deep models.
Use transfer, fine-tuning and contrastive learning to reuse knowledge across tasks.
Reason about generative models (GANs, diffusion) and about explainability and risk.
Deliver and present an end-to-end ML business project.

SESSION 20 · MLF_20

Sequential modeling — from time-series analysis to deep NN models

Modeling data where order and time matter.

Sequential feature engineering: lags, windows, seasonality.
Basic techniques and their evolution toward deep models.
Example-based comparison of approaches.

Core idea · recurrent state $h_t = f(h_{t-1}, x_t)$ A hidden state carries information forward in time — the basis of AR models, RNNs, LSTMs and beyond.

Key idea: sequence models exploit temporal dependence that i.i.d. methods throw away — order is information.
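
Two of the ideas in this session fit in a short sketch: lag features, which turn a series into supervised rows, and a recurrent state $h_t = f(h_{t-1}, x_t)$. Exponential smoothing is used here as the simplest possible choice of $f$; that choice, the smoothing factor and the toy series are assumptions.

```python
def ema(xs, alpha=0.5, h0=0.0):
    # Recurrent state h_t = f(h_{t-1}, x_t), with f = exponential smoothing.
    h, states = h0, []
    for x in xs:
        h = alpha * x + (1 - alpha) * h
        states.append(h)
    return states


def lag_features(xs, n_lags):
    # Each row pairs the previous n_lags values (inputs) with x_t (target).
    rows = []
    for t in range(n_lags, len(xs)):
        rows.append((xs[t - n_lags:t], xs[t]))
    return rows


series = [1.0, 2.0, 3.0, 4.0]
states = ema(series)
rows = lag_features(series, n_lags=2)
```

The lag rows can be fed to any tabular model from earlier modules, while the recurrent form is the pattern that RNNs generalise by learning $f$ from data.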

SESSION 21 · MLF_21

Transfer learning & contrastive learning — theoretical basis and principles

Reusing learned knowledge across tasks.

Shallow training: principles and concepts.
Fine-tuning a pretrained model on a new task.
Multi-task & transfer perspectives.

Core idea · fine-tuning $\theta_{\text{new}} = \theta_{\text{pretrained}} - \eta\,\nabla_\theta L_{\text{target}}(\theta_{\text{pretrained}})$ Start from weights learned on a large source task and adapt them with a few such gradient steps on the target task.

Key idea: representations learned on huge datasets transfer — fine-tuning beats training from scratch when target data is scarce.

SESSION 22 · MLF_22

Summarizing ML best practices — some problems and solutions

Applying the solutions explored to different types of problems.

Explainability & interpretability: understanding why a model predicts what it does.
End-to-end case analysis.
ML/DL business-function application.

Core idea · feature attribution $\hat{f}(x) \approx \phi_0 + \sum_{j} \phi_j$ Explainability methods (e.g. Shapley values) attribute a prediction to additive per-feature contributions $\phi_j$.

Key idea: in business, an unexplainable model is often an unusable one — interpretability is a deployment requirement, not a luxury.

Group Assignment (70 pts)

Group assignment (70 pts): launch the final group project — apply the full pipeline to a business problem with explainability in focus.

SESSION 23 · MLF_23

Deep generative models

Exploration of some of the novel architectures in this field.

Contrastive learning & cross-embeddings and their applications.
Text-to-image and text-to-video.
GANs & diffusion models — a brief analysis.

Core formula · GAN minimax objective $\min_G \max_D \; \mathbb{E}_{x}[\log D(x)] + \mathbb{E}_{z}[\log(1-D(G(z)))]$ A generator $G$ and discriminator $D$ play an adversarial game; diffusion models instead learn to reverse a noising process.

Key idea: generative models learn the data distribution itself, enabling synthesis — a different goal from the discriminative models of earlier modules.

Examination practice (40 pts)

Examination practice (40 pts): a "test-exam" assignment — scored on number attempted, average result and improvement; primes you for the formal exam format.

SESSION 24 · MLF_24

Advanced practice

Advanced practice recap consolidating the deep-learning material.

Recap & reinforcement of advanced practical techniques.

Key idea: deliberate, repeated practice on harder cases is what converts familiarity into fluency.

SESSION 25 · MLF_25

Applied machine learning — the industry view

A high-level exploration of business cases and applications.

Applications across industries (finance, healthcare, education, more).

Key idea: the same handful of algorithms recur across industries — what changes is the data, the metric, and the cost of being wrong.

SESSION 26 · MLF_26

Extended practice — case review

Dissection and discussion of sample solutions.

Sample-solution walk-through.
Main difficulties and how to overcome them.

Key idea: a second structured review, now on advanced material, to lock in the deep-learning workflow before assessment.

SESSION 27 · MLF_27

Real-world business applications — challenges, risks & steps ahead

Where the field is heading and what could go wrong.

Current research and expected breakthroughs.
Risks: from adversarial attacks to mesa-optimization misalignment.

Core idea · adversarial perturbation $x_{\text{adv}} = x + \epsilon\,\operatorname{sign}\!\big(\nabla_x L(x,y)\big)$ A tiny, gradient-aligned perturbation can flip a model's prediction — a core safety/security risk.

Key idea: capability and risk grow together; responsible deployment means anticipating attacks, drift and misalignment.

ML Engineering · Ch.10 Conclusion

Reading: Burkov Ch.10 (Conclusion) — where ML engineering is heading.
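
The adversarial-perturbation formula from this session can be sketched on a logistic-regression classifier, where the input gradient of binary cross-entropy has the closed form $(p - y)\,w$. The weights, the input and the budget `eps` are hypothetical, and `eps` is large relative to the toy inputs so that the effect is visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w, b, x, y):
    # For binary cross-entropy on a logistic model, dL/dx = (p - y) * w.
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    # x_adv = x + eps * sign(grad_x L(x, y))
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


# Hypothetical trained weights and a correctly classified input (label 1).
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.3)
p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)
```

The clean input is classified as positive, while the perturbed input, which differs by at most `eps` in each coordinate, is pushed across the 0.5 decision threshold.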

SESSION 28 · MLF_28

Test / exam

Concept-summary review examination.

Not a final exam — its score is added to the other evaluation items.
No minimum passing grade required; most questions reused from the practice test-exams.

Key idea: the formal test rewards consistent practice — questions are drawn largely from the test-exams you have already rehearsed.

Test / Exam (180 pts)

Test / exam (180 pts): written in class, no GenAI; the single largest point item, feeding the 30% final-exam weight.

SESSION 29 · MLF_29

Group projects — presentations (A)

Presentations, discussion and Q&A of group projects.

First round of group-project presentations.

Key idea: presenting work clearly is itself a graded ML-manager skill — the syllabus weights communication explicitly.

Group Presentation (50 pts)

Group presentation (50 pts): present the project with live discussion and Q&A; feeds the 8% presentation weight.

SESSION 30 · MLF_30

Group projects — presentations (B) · wrap-up

Final presentations and course wrap-up.

Second round of group-project presentations, discussion and Q&A.
Course wrap-up and synthesis.

Key idea: the wrap-up ties the 30 sessions back to the single thread — turning data into reliable, accountable decisions.

Key concepts — glossary

A compact reference for the terms and formulas that recur across the program. Each entry pairs a one-line definition with the symbol or expression used in the sessions above.

Supervised learning: Fitting $f_\theta:\mathcal{X}\to\mathcal{Y}$ from labelled pairs $(x_i,y_i)$ to predict labels on new inputs.
Unsupervised learning: Finding structure (clusters, low-dim factors) in data with no labels.
Reinforcement learning: An agent learns a policy by maximising cumulative reward through interaction with an environment.
Loss function $L$: A scalar measuring prediction error; training minimises its average over the data.
Empirical risk: The average loss on the training set, $\frac{1}{n}\sum_i L(y_i,f_\theta(x_i))$ — the quantity optimisers descend.
Gradient descent: Iterative update $\theta \leftarrow \theta - \eta\,\nabla_\theta \hat{R}(\theta)$ that moves parameters downhill on the empirical risk.
Learning rate $\eta$: Step size in gradient descent; too large diverges, too small crawls.
Overfitting: Modelling noise in the training data so that test performance suffers.
Generalization: How well a model performs on unseen data drawn from the same distribution.
Bias–variance tradeoff: Expected squared error decomposes into squared bias (underfit), variance (overfit) and irreducible noise.
Cross-validation: Estimating generalization by rotating which data folds serve as train vs validation.
Regularization: Penalising model complexity (e.g. $L_2$, dropout) to curb variance and overfitting.
Feature engineering: Creating, transforming and selecting input variables to make patterns learnable.
One-hot encoding: Representing a categorical value as a binary indicator vector.
Data leakage: Target information sneaking into features, inflating offline scores that then collapse live.
PCA: Linear dimensionality reduction onto directions of maximum variance (top eigenvectors of the data covariance matrix $\Sigma$).
k-means: Clustering that minimises within-cluster squared distance to $K$ centroids.
Cross-entropy $\mathcal{L}_{\text{CE}}$: The standard classification loss, $-\sum_k y_k\log\hat{p}_k$ summed over classes $k$, measuring probabilistic mismatch.
Precision / recall: Correctness among predicted positives vs coverage of actual positives; combined by $F_\beta$.
ROC / AUC: Curve of true- vs false-positive rate across thresholds; AUC summarises ranking quality.
Ensemble: Combining multiple models (bagging, boosting, stacking) to reduce error.
Hyper-parameter: A setting fixed before training (e.g. depth, $\eta$, $K$), tuned by search or Bayesian optimization.
Backpropagation: The chain rule applied through a network to compute loss gradients efficiently.
Activation function: Element-wise non-linearity (sigmoid, ReLU) that gives networks expressive power.
Concept / data drift: Post-deployment shift in the input distribution (data drift) or in the relationship between inputs and target (concept drift), degrading a live model.
Transfer learning: Reusing a model pretrained on one task by fine-tuning it on a related target task.
Generative model: A model of the data distribution that can synthesise new samples (GANs, diffusion).
Epistemic vs aleatoric: Reducible model uncertainty vs irreducible noise inherent in the data.
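Several of the entries above (empirical risk, gradient descent, learning rate) fit together in a few lines of code. The sketch below is an illustration under stated assumptions, not course material: it uses a squared loss, a linear model, synthetic data and three arbitrary values of $\eta$ to show the "too large diverges, too small crawls" behaviour. The function names are hypothetical.

```python
import numpy as np

def empirical_risk(theta, X, y):
    """Average squared loss over the training set (the 1/2 simplifies the gradient)."""
    residual = X @ theta - y
    return 0.5 * np.mean(residual ** 2)

def gradient_descent(X, y, eta, steps):
    """Repeat theta <- theta - eta * gradient of the empirical risk."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)
        theta = theta - eta * grad
        history.append(empirical_risk(theta, X, y))
    return theta, history

# Synthetic data (assumption): y = 1 + 3x plus a little Gaussian noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
true_theta = np.array([1.0, 3.0])
y = X @ true_theta + 0.1 * rng.normal(size=200)

theta_good, risk_good = gradient_descent(X, y, eta=0.1, steps=500)
theta_small, risk_small = gradient_descent(X, y, eta=0.0001, steps=500)
theta_large, risk_large = gradient_descent(X, y, eta=3.0, steps=50)

print("eta=0.1    final risk:", risk_good[-1], "theta:", theta_good)
print("eta=0.0001 final risk:", risk_small[-1])   # still far from the minimum
print("eta=3.0    final risk:", risk_large[-1])   # grows instead of shrinking
```

With the moderate step size the parameters settle near the values used to generate the data; the tiny step barely reduces the risk in the same number of updates, and the oversized step makes the risk grow at every iteration.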

Bibliography

The program follows one applied text as its backbone, with a concise companion for quick reference. Each entry is annotated with what it offers and which sessions draw on it.

Compulsory

Andriy Burkov (2020). Machine Learning Engineering. True Positive Inc. ISBN 978-1-9995795-7-9 (Digital).

Billed as the most complete applied AI book available, it is filled with best practices and design patterns for building reliable ML solutions that scale. Burkov holds a Ph.D. in AI and led a machine-learning team at Gartner; the book draws on 15 years of solving problems with AI.

Maps to: Ch.1 → S1 · Ch.2 → S2 · Ch.3 → S3 · Ch.4 → S4 · Ch.5–6 → S6 · Ch.7 → S13 · Ch.8–9 → S15 · Ch.10 → S27.

Recommended

Andriy Burkov (2019). The Hundred-Page Machine Learning Book. ISBN 1-9995795-0-X (Digital).

A successful effort to reduce all of machine learning to 100 pages — well-chosen topics across theory and practice, a solid introduction for practitioners.

Maps to: a concise companion for the algorithm and training material of Modules II–V (S5–S19) — useful quick reference before assignments.

Professor José Manuel Rey González · jreyg@faculty.ie.edu · office hours on request. Full details in the syllabus PDF.