modeling-lab · course outline — simulating & modeling to understand change

Simulating and Modeling to Understand Change

A first-year BCSAI course on using computers to substitute for physical experimentation — generating randomness, running Monte Carlo and discrete-event simulations, and building regression and classification models to understand and predict the behavior of complex systems.

Simulation and modeling substitute computation for physical experimentation: computers explore a phenomenon so that we can understand a system and predict its behavior without testing it in the real world. Simulations can be more realistic than traditional experiments and are usually faster and cheaper to run, which is why the technique appears across mathematics, physics, engineering, psychology and biology, wherever complex behavior emerges from many smaller interacting elements. The course covers Monte Carlo simulation, discrete-event simulation, model building, and regression and classification models, and discusses how each is applied to real-life problems. Students learn and practice statistics and programming; by the end they can conduct a full simulation study and model real-life scenarios. Most sessions below link to a runnable demo on the interactive demos page.

Program: BCSAI (Bachelor in Computer Science & AI)
Course code: SMUC-N-CSAI.1.M.A
Area: Mathematics
Sessions: 30 (live, in-person)
Credits: 6.0 ECTS · 150 h
Academic year: 25–26 · 1st course
Semester: 2nd

Learning objectives

By the end of the course students will know how to conduct a simulation study and model real-life scenarios across these core competencies.

Random number generation: Generate and validate pseudo-random sequences (LCG, period, randomness tests).

Random variable generation: Sample discrete and continuous distributions for stochastic simulation.

Monte Carlo simulation: Estimate quantities and run inference by repeated random sampling.

Discrete-event simulation: Model systems as sequences of discrete events with SimPy.

Regression models: Build, interpret and validate linear and polynomial regressions.

Classification models: Apply logistic regression and evaluate classifiers (confusion matrix, ROC).

Methodology & assessment

IE's teaching method is collaborative, active and applied. Learning is distributed across a range of activities (150 h total) and graded through continuous evaluation plus two exams.

Learning activities (est. hours · share of the 150 h)

Group work · 40 h · 26.7%
Lectures · 30 h · 20.0%
Discussions · 30 h · 20.0%
Individual studying · 30 h · 20.0%
Exercises (async, field) · 20 h · 13.3%

Total: 100% · 150 hours. These weights describe how a student is expected to budget effort across the term — they are activity time, not grade weight.

Assessment weighting

Final exam · 30%
Group work (4 labs) · 25%
Midterm exam · 20%
Quizzes (per module) · 15%
Class participation · 10%

4 labs delivered via Turnitin · one quiz after each of the 7 modules · 80% attendance required to pass.

What each component asks of you

Class participation — 10%. Active contribution to in-class activities, discussions and labs. Deliverable: sustained engagement; evaluated on quality of reasoning and ability to connect concepts to real-world contexts, not mere attendance.
Group work — 4 labs · 25%. Guided, applied labs (Monte Carlo, DES, linear regression, logistic regression) completed in groups formed in Session 1. Deliverable: each lab submitted via Turnitin before the next synchronous class; evaluated on correct application of session concepts, code, and interpretation. Late = grade of zero unless a prior emergency/illness exception is arranged before the due date.
Quizzes — 15%. One (or more) quiz after each of the 7 modules. Evaluated on recall and understanding of that module's material; spreads assessment continuously across the term.
Midterm exam — 20% (Session 14). Covers all content to date. Format detailed by the professor well in advance.
Final exam — 30% (Session 30). Comprehensive, covering all course content. Format detailed well in advance.

Pass & attendance rules. Attendance is mandatory: a student attending fewer than 80% of sessions is graded FAIL for both the ordinary and the extraordinary (June/July) calls and must re-enroll. Each student has four allowed calls across two academic years. The June/July re-sit is a single comprehensive exam (continuous evaluation is not carried over), with a minimum pass of 5 and a maximum attainable grade of 8.0. Grade disputes must be resolved before the final exam, after attending the review session. GenAI may be used for assistance but inappropriate use is academic misconduct; acknowledging AI use does not affect the grade.

Program — 30 sessions across 7 modules

Each in-person session is listed with its title, objective, topic breakdown and, where applicable, its core method or formula; readings are annotated in the Bibliography. Demo tags link to the matching interactive demo.

I Introduction to Simulating and Modeling to Understand Change Sessions 1–2

Sets the conceptual stage: what a system is, what we mean by a model and a simulation, and the methodology that links them. Establishes the vocabulary the rest of the course assumes and forms the working groups used for all four labs.

Learning outcomes

Define system, model and simulation and explain how they relate.
Distinguish deterministic vs. stochastic and static vs. dynamic models.
Describe the stages of a simulation study and when simulation beats physical experimentation.

Course Presentation live

Get to know each other, walk through the most important aspects of the course and syllabus, and define the working groups for the rest of the term.

Knowing each other (30 min) — introductions and expectations.
Course specifications (50 min) — objectives, methodology, assessment, attendance, and formation of lab groups.

Key idea: groups formed today carry through all four labs (25% of the grade), so collaboration norms matter from day one.

Introduction to Simulation and Modeling live

Introductory theory of simulation and modeling: define what a system is and study practical examples.

Systems, models and simulation (20 min) — core terminology: state, entities, attributes, events.
Why simulate? (40 min) — advantages over physical experiments; types of simulation (static/dynamic, deterministic/stochastic, continuous/discrete); stages and methodology of a study.
What is a model? (40 min) — predictive statistics, types of models, and the modeling methodology.

Key idea: a model is a simplified representation of a system; a simulation runs that model forward in (virtual) time to observe behavior we cannot or should not test in reality.

demo · modeling change

Demo: visualizes how a simple growth model evolves a system's state over time — a first concrete instance of "running a model forward."

II Random Numbers and Random Variables Generation Sessions 3–5

Randomness is the raw material of stochastic simulation. This module builds it from scratch: generate pseudo-random numbers, test that they behave randomly, then transform them into samples from any distribution you need.

Learning outcomes

Generate uniform pseudo-random numbers with a Linear Congruential Generator and reason about its period.
Test a generator for uniformity and independence before trusting it.
Sample from discrete and continuous distributions via the inverse-transform method in Python.

Generating Random Numbers live

Essential to simulating stochastic behavior: learn the properties of random numbers and how to generate them.

Random number properties (20 min) — desired uniformity on [0,1) and independence.
Random number generation (30 min) — pseudo-random sequences, seeds and reproducibility.
In-class exercises (30 min) — the Linear Congruential Method.

Core method · LCG: X_{n+1} = (a·X_n + c) mod m, with U_n = X_n/m. Example: a=5, c=3, m=16, X₀=7 → 6, 1, 8, 11, 10, 5, … The period can be at most m; these parameters reach it, returning to 7 after 16 steps.
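A minimal Python sketch of the generator, assuming nothing beyond the recurrence above; the names `lcg` and `lcg_period` are illustrative, not course-provided helpers. It reproduces the worked example and contrasts it with a deliberately poor parameter choice whose sequence collapses to a fixed point.

```python
def lcg(a, c, m, seed):
    """Yield X_1, X_2, ... from the recurrence X_{n+1} = (a*X_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def lcg_period(a, c, m, seed):
    """Length of the cycle the sequence eventually enters (at most m)."""
    first_seen = {}          # state -> step at which it first appeared
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - first_seen[x]


gen = lcg(a=5, c=3, m=16, seed=7)
states = [next(gen) for _ in range(6)]
print(states)                       # [6, 1, 8, 11, 10, 5]
print([x / 16 for x in states])     # U_n = X_n / m, values in [0, 1)
print(lcg_period(5, 3, 16, 7))      # 16: full period
print(lcg_period(4, 2, 16, 7))      # 1: 7 -> 14 -> 10 -> 10 -> ...
```

Changing a single parameter is enough to shrink the period from 16 to 1, which is why generator parameters are chosen carefully rather than at random.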

demo · LCG

Demo: drive a, c, m, X₀ yourself and watch the sequence repeat — a hands-on feel for why parameter choice controls the period.

Tests of Randomness live

Before trusting Python's generators, learn the tests of randomness and apply them to Python's random functions.

Testing uniformity (20 min) — chi-square / Kolmogorov–Smirnov goodness-of-fit against the uniform.
Testing independence (30 min) — runs test and autocorrelation to detect serial patterns.
In-class exercises (30 min) — applying the tests to Python's random functions.

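Core method · chi-square: bin the n numbers into k equal-width intervals, each with expected count Eᵢ = n/k, and compute χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ; a value above the critical value of the χ² distribution with k − 1 degrees of freedom leads us to reject the uniformity assumption.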
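A standard-library sketch of the chi-square uniformity test. Two assumptions to flag: the bins are k = 10 equal-width intervals, and the 5% critical value for 9 degrees of freedom (about 16.92) is hard-coded from standard tables rather than computed; `scipy.stats.chi2.ppf(0.95, 9)` would give it directly if SciPy is available. The helper name `chi_square_uniform` is hypothetical.

```python
import random

def chi_square_uniform(samples, k=10):
    """Chi-square statistic for H0: samples ~ Uniform[0, 1), with k equal-width bins."""
    observed = [0] * k
    for u in samples:
        observed[min(int(u * k), k - 1)] += 1   # bin index; guard against u == 1.0
    expected = len(samples) / k                  # E_i = n / k for every bin
    return sum((o - expected) ** 2 / expected for o in observed)

# 95th percentile of the chi-square distribution with k - 1 = 9 degrees of freedom,
# taken from standard tables (assumption: k = 10 bins, 5% significance level).
CRITICAL_9DF_5PCT = 16.919

rng = random.Random(42)
good = [rng.random() for _ in range(10_000)]
skewed = [rng.random() ** 2 for _ in range(10_000)]   # squaring piles values near 0

for name, data in [("random.random()", good), ("random.random() ** 2", skewed)]:
    stat = chi_square_uniform(data)
    verdict = "reject" if stat > CRITICAL_9DF_5PCT else "do not reject"
    print(f"{name}: chi2 = {stat:.1f} -> {verdict} uniformity")
```

Even a good generator is rejected about 5% of the time at this level, so a single rejection is a prompt to investigate rather than proof of a flaw.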

demo · LCG

Demo: reuse the LCG and inspect the spread of its outputs as a visual uniformity check.

Simulating Discrete and Continuous Random Variables live

Simulate variables following different distributions to reproduce many kinds of scenarios in Python.

Simulating discrete RVs (30 min) — Bernoulli, binomial, Poisson from uniforms.
Simulating continuous RVs (30 min) — exponential, normal via transforms.
In-class exercises (20 min).

Core method — inverse transform: if U ~ Uniform(0,1) then X = F⁻¹(U) follows the target CDF F. Example (exponential): X = −ln(1−U)/λ.
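A sketch of the inverse-transform method for one continuous and one discrete distribution, using only Python's standard library. The rates λ = 2 and λ = 3 and the function names are arbitrary illustrative choices. For the discrete case the sketch walks up the cumulative probabilities until they reach U, which is the discrete analogue of computing F⁻¹(U).

```python
import math
import random

def exponential_inverse(u, lam):
    """Continuous case: X = F^-1(U) = -ln(1 - U) / lam for Exponential(lam)."""
    return -math.log(1.0 - u) / lam

def poisson_inverse(u, lam):
    """Discrete case: return the smallest k with F(k) >= u for Poisson(lam)."""
    k = 0
    p = math.exp(-lam)          # P(X = 0)
    cdf = p
    while u > cdf and p > 0.0:  # p > 0 guards against floating-point stalls in the tail
        k += 1
        p *= lam / k            # P(X = k) = P(X = k - 1) * lam / k
        cdf += p
    return k

rng = random.Random(0)
n = 100_000
exp_draws = [exponential_inverse(rng.random(), lam=2.0) for _ in range(n)]
pois_draws = [poisson_inverse(rng.random(), lam=3.0) for _ in range(n)]

print(f"exponential mean {sum(exp_draws) / n:.3f} (theory 1/lam = 0.5)")
print(f"Poisson mean     {sum(pois_draws) / n:.3f} (theory lam = 3.0)")
```

A Bernoulli(p) draw is the simplest special case of the same idea: return 1 when U < p and 0 otherwise.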

demo · inverse transform

Demo: map uniform draws through F⁻¹ and see the target distribution's shape emerge.

III Monte Carlo Simulation Sessions 6–8

Monte Carlo turns repeated random sampling into numerical answers — for integrals, probabilities and statistical inference. The module moves from estimating a constant (π) to using sampling distributions to reason about unknown quantities, and closes with the first graded lab.

Learning outcomes

List and apply the steps of a Monte Carlo study and quantify its error.
Estimate π and other quantities by random sampling.
Use Monte Carlo to build sampling distributions and make inferences about unknowns.

Monte Carlo Simulation I live

Monte Carlo is one of the best-known simulation tools: cover its history, work through the steps of a simulation study, and estimate π.

Monte Carlo history (10 min) — Ulam, von Neumann and the Manhattan Project.
Methods vs. simulation (10 min) — the broad numerical method vs. simulating a stochastic system.
Steps in a study (15 min) — define, sample, evaluate, aggregate, assess error.
Use case (45 min) — the π experiment.

Core method · estimating π: drop N random points in the unit square; the fraction landing inside the quarter circle of radius 1 approximates its area, π/4, so π ≈ 4·(points inside)/N. The error shrinks like 1/√N.
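A direct transcription of the π experiment into plain Python; the seed, sample sizes and the name `estimate_pi` are arbitrary choices. Each 100-fold increase in N should shrink the typical error roughly 10-fold, in line with the 1/√N rate, although any single run can be luckier or unluckier.

```python
import math
import random

def estimate_pi(n, seed=None):
    """Monte Carlo estimate of pi from n uniform points in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:   # inside the quarter circle of radius 1
            inside += 1
    return 4 * inside / n          # fraction inside approximates pi / 4

for n in (100, 10_000, 1_000_000):
    est = estimate_pi(n, seed=1)
    print(f"N = {n:>9,}: estimate {est:.5f}, error {abs(est - math.pi):.5f}")
```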

demo · Monte Carlo π

Demo: add samples and watch the estimate converge toward 3.14159 with the tell-tale 1/√N noise.

Monte Carlo Simulation II live

One of the most valuable applications of Monte Carlo: making inferences about real-world scenarios.

What is inferential statistics? (20 min) — estimating population quantities from samples.
Monte Carlo for inference (20 min) — simulating sampling distributions when formulas are hard.
Use case (20 min) — the taxi (German-tank) problem: estimate a maximum from observed labels.
Use case (20 min) — the sampling problem experiment.

Key idea: by resampling many times we approximate the sampling distribution of an estimator, then read off its bias, variance and confidence intervals directly — no closed-form needed.
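A Monte Carlo sketch of the taxi problem under stated assumptions: taxis are numbered 1 to 500, we observe 10 distinct numbers drawn at random without replacement, and we compare the sample maximum with the classic corrected estimator max + max/k − 1. The fleet size, sample size and 20,000 replications are illustrative choices; the replications approximate each estimator's sampling distribution, from which bias, spread and a central 95% range are read off directly.

```python
import random
import statistics

def simulate_estimators(n_taxis=500, k=10, reps=20_000, seed=0):
    """Sampling distributions of two estimators of the largest taxi number."""
    rng = random.Random(seed)
    naive, corrected = [], []
    for _ in range(reps):
        sample = rng.sample(range(1, n_taxis + 1), k)  # k distinct labels
        m = max(sample)
        naive.append(m)                  # sample maximum: can never overshoot
        corrected.append(m + m / k - 1)  # unbiased correction for the gap above m
    return naive, corrected

TRUE_N = 500
naive, corrected = simulate_estimators(n_taxis=TRUE_N)
for name, est in [("max", naive), ("max + max/k - 1", corrected)]:
    cuts = statistics.quantiles(est, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    print(f"{name:>16}: bias {statistics.fmean(est) - TRUE_N:+6.1f}, "
          f"sd {statistics.stdev(est):5.1f}, "
          f"95% range [{cuts[0]:.0f}, {cuts[-1]:.0f}]")
```

The sample maximum is visibly biased low, while the corrected estimator centers on the true fleet size; no formula for either sampling distribution was needed.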

demo · sampling distribution

Demo: repeatedly resample and watch a sampling distribution form, illustrating how Monte Carlo replaces analytic derivations.

Monte Carlo Simulation Lab lab

Groups complete a guided lab applying Monte Carlo simulation.

Lab explanation and delivery instructions (10 min).
Guided lab completion (70 min).

Lab 1 · Turnitin

Lab 1 of 4 (part of the 25%): apply Monte Carlo to a stochastic scenario; submit via Turnitin before the next synchronous class.

IV Discrete Events Simulation Sessions 9–11

Where Monte Carlo typically estimates static quantities by repeated sampling, Discrete-Event Simulation (DES) follows a system through time, jumping from one event to the next: arrivals, service completions, departures. The module introduces the DES framework, implements it with SimPy, and applies it to a hospital queue.

Learning outcomes

Explain DES vocabulary: entities, events, the event list and the simulation clock.
Build a queue model in Python with SimPy and run it.
Extract and interpret performance metrics (waiting time, utilization, queue length).

Discrete Events Simulation I live

Unlike Monte Carlo, DES models system behavior as a sequence of discrete events over time. Learn the basics and implement DES in Python with SimPy.

Intro to DES (30 min) — terminology and framework: events, the future-event list, the clock that jumps event-to-event.
The SimPy package (50 min) — processes, resources and environments.

Core model — M/M/1 queue: Poisson arrivals at rate λ, exponential service at rate μ, utilization ρ = λ/μ. Mean number in system L = ρ/(1−ρ) (stable only when ρ < 1).
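To make the future-event list and the simulation clock visible, here is an M/M/1 model written with only Python's standard library rather than SimPy (SimPy manages the same kind of event list for you through its environment). The function name and the rates λ = 0.8, μ = 1 are illustrative assumptions. The sketch accumulates the area under the number-in-system curve between events and compares the time average with L = ρ/(1−ρ).

```python
import heapq
import random

def mm1_time_avg_in_system(lam, mu, horizon, seed=0):
    """Time-average number in system for an M/M/1 queue, by event-list simulation."""
    rng = random.Random(seed)
    clock = 0.0
    in_system = 0                    # customers waiting plus the one in service
    area = 0.0                       # integral of in_system over simulated time
    future_events = [(rng.expovariate(lam), "arrival")]   # heap ordered by time
    while True:
        time, kind = heapq.heappop(future_events)
        if time > horizon:
            area += in_system * (horizon - clock)
            return area / horizon
        area += in_system * (time - clock)   # state was constant since last event
        clock = time                         # the clock jumps straight to the event
        if kind == "arrival":
            in_system += 1
            heapq.heappush(future_events, (clock + rng.expovariate(lam), "arrival"))
            if in_system == 1:               # server was idle, so service starts now
                heapq.heappush(future_events, (clock + rng.expovariate(mu), "departure"))
        else:                                # departure
            in_system -= 1
            if in_system > 0:                # next customer in line starts service
                heapq.heappush(future_events, (clock + rng.expovariate(mu), "departure"))

lam, mu = 0.8, 1.0
rho = lam / mu
sim_L = mm1_time_avg_in_system(lam, mu, horizon=200_000)
print(f"simulated L = {sim_L:.2f}, theory rho/(1 - rho) = {rho / (1 - rho):.2f}")
```

Setting λ ≥ μ in this sketch makes the number in system drift upward without bound, the same behavior the demo shows as ρ crosses 1.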

demo · M/M/1 queue

Demo: tune λ and μ and watch the queue stabilize or blow up as ρ crosses 1.

Discrete Events Simulation II live

Extract information from a DES using SimPy.

Simulating a hospital queue in Python (40 min) — patients as entities, doctors as resources.
Interpreting SimPy results (40 min) — average wait, server utilization, max queue length.

Key idea: a single DES run is one random realization; average over many replications (and discard a warm-up period) before trusting the reported metrics.
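A sketch of the replication-and-warm-up discipline for a single-doctor version of the hospital queue. Instead of SimPy it uses Lindley's recursion, a standard shortcut that gives each patient's waiting time in a first-come-first-served single-server queue; the arrival and service rates, run lengths and number of replications are hypothetical. The spread across replications yields an approximate (normal-based) 95% confidence interval that can be checked against the M/M/1 result Wq = λ/(μ(μ−λ)).

```python
import random
import statistics

def mean_wait_one_run(lam, mu, n_patients, warmup, rng):
    """Average queue wait in one run, via W_{n+1} = max(0, W_n + S_n - A_{n+1}).

    S_n is patient n's service time and A_{n+1} the gap to the next arrival.
    The first `warmup` patients are discarded: the run starts from an empty room.
    """
    wait = 0.0
    kept = []
    for i in range(n_patients):
        if i >= warmup:
            kept.append(wait)
        service = rng.expovariate(mu)
        gap = rng.expovariate(lam)
        wait = max(0.0, wait + service - gap)
    return statistics.fmean(kept)

def replicate(lam, mu, reps=30, n_patients=20_000, warmup=2_000, seed=0):
    """Independent replications -> mean of run averages and a 95% CI half-width."""
    rng = random.Random(seed)
    runs = [mean_wait_one_run(lam, mu, n_patients, warmup, rng) for _ in range(reps)]
    half_width = 1.96 * statistics.stdev(runs) / len(runs) ** 0.5
    return statistics.fmean(runs), half_width

lam, mu = 0.8, 1.0   # hypothetical: 0.8 arrivals and 1 service completion per minute
mean_wait, hw = replicate(lam, mu)
print(f"mean wait {mean_wait:.2f} +/- {hw:.2f} min "
      f"(M/M/1 theory: {lam / (mu * (mu - lam)):.2f})")
```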

demo · M/M/1 queue

Demo: the same queue model surfaces the metrics SimPy would report for the hospital example.

Discrete Events Simulation Lab lab

Groups complete a guided lab applying discrete events simulation.

Lab explanation and delivery instructions (10 min).
Guided lab completion (70 min).

Lab 2 · Turnitin

Lab 2 of 4 (part of the 25%): model and analyze a queueing system in SimPy; submit via Turnitin before the next class.

V Model Building Sessions 12–13

A bridge from simulation to statistical modeling: what distinguishes a model from a bare function, how to read and design one, and how variation, covariation and cross-validation tell us whether a model will generalize.

Learning outcomes

Distinguish a function from a model and identify a model's basic elements.
Quantify variation and covariation between variables.
Explain why cross-validation guards against over-optimistic in-sample fit.

Model Building I live

Mathematical models help us analyze and predict. Learn the basic elements of a model and how to read and design one.

Functions vs. models (10 min) — a model adds an error term to a deterministic function.
Reading models (30 min) — parameters, inputs, outputs and assumptions.
Model design (40 min) — choosing form and variables for a question.

Key idea: a statistical model is Y = f(X) + ε — structure plus irreducible noise. Modeling is choosing f and understanding ε.

Model Building II live

Find patterns within and between variables, and learn why cross-validation is so important in stochastic models.

Variation analysis (20 min) — variance as spread of a single variable.
Covariation analysis (20 min) — covariance and correlation between variables.
Cross validation (40 min) — hold-out and k-fold estimates of out-of-sample error.

Core method — k-fold CV: split data into k folds, train on k−1 and test on the held-out fold, rotate, then average the errors to estimate generalization.
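A plain-Python sketch of k-fold cross-validation on simulated data (a straight line plus Gaussian noise with standard deviation 2, an assumption chosen for illustration). The helper `k_fold_mse` and its convention that a fitting function returns a prediction function are hypothetical, not a library API. The constant model ignores x, so its held-out error stays near the variance of y, while the line's error should approach the noise variance of 4.

```python
import random
import statistics

def k_fold_mse(xs, ys, fit, k=5, seed=0):
    """Average held-out MSE over k folds; fit(train_x, train_y) returns a predictor."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k disjoint, roughly equal folds
    fold_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(statistics.fmean((ys[i] - predict(xs[i])) ** 2 for i in test))
    return statistics.fmean(fold_errors)           # average the k held-out errors

def fit_mean(x, y):
    """Baseline model: always predict the training mean of y."""
    m = statistics.fmean(y)
    return lambda _x: m

def fit_line(x, y):
    """Simple OLS line: slope = Sxy / Sxx, intercept = ybar - slope * xbar."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda new_x: intercept + slope * new_x

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 3 * x + rng.gauss(0, 2) for x in xs]

cv_const = k_fold_mse(xs, ys, fit_mean)
cv_line = k_fold_mse(xs, ys, fit_line)
print(f"5-fold CV MSE, constant model: {cv_const:.2f}")
print(f"5-fold CV MSE, straight line:  {cv_line:.2f}")
```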

demo · cross-validation

Demo: contrast training error with cross-validated error to see overfitting appear as model complexity grows.

VI Regression Models Sessions 14–23

The course's largest module. Opens with the midterm, then develops linear regression end to end: from a single predictor through multiple predictors, residual diagnostics, categorical and interaction terms, polynomial effects, and finally variable selection and cross-validation — closing with the two-session linear regression lab.

Learning outcomes

Fit and interpret simple and multiple linear regressions and their key statistics (slope, R², p-values).
Diagnose a model through its residuals and check the assumptions needed to generalize.
Incorporate categorical variables, interactions and polynomial terms appropriately.
Select variables and apply cross-validation to balance under- and over-fitting.

Midterm Exam exam

Test covering all course content to date (Modules I–V). Format detailed by the professor well in advance.

Weight: 20% of the final grade. There is no make-up for a missed exam session unless the academic director grants an exception.

Simple Linear Regression I live

Define what a regression problem is and explain the most basic aspects of Simple Linear Regression, the most popular technique to solve it.

Simple LR introduction (20 min) — regression vs. classification; predicting a continuous target.
Theoretical demonstration (20 min) — deriving the line that minimizes squared error.

Core method — OLS: fit ŷ = β₀ + β₁x by minimizing Σ(yᵢ − ŷᵢ)²; this gives β₁ = cov(x,y)/var(x) and β₀ = ȳ − β₁x̄.
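The closed-form estimates translate directly into code. This sketch uses a five-point toy dataset (made up for illustration) and also checks a property that follows from the normal equations when the model has an intercept: the residuals sum to zero, up to floating-point rounding.

```python
def ols_simple(x, y):
    """Closed-form OLS for y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = cov_xy / var_x        # slope = cov(x, y) / var(x)
    b0 = y_bar - b1 * x_bar    # the line passes through (x_bar, y_bar)
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_simple(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"y_hat = {b0:.2f} + {b1:.2f} x")   # y_hat = 0.05 + 1.99 x
print(f"sum of residuals: {sum(residuals):.1e}")
```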

demo · least squares

Demo: drag points and watch the least-squares line and its residuals update live.

Simple Linear Regression II live

The most important statistics for interpreting the results of a Simple LR.

Simple LR with simulated data (20 min).
Theoretical interpretation (20 min) — what each statistic means.
Practical interpretation (20 min) — reading the output table.
Implementation in R (20 min).

Key statistics: R² is the fraction of the target's variance explained by the model; the slope's t-statistic and p-value test whether the slope differs from zero, i.e. whether the data show evidence of a linear relationship.
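A sketch of how the two headline statistics are computed for a simple regression, reusing a made-up five-point dataset. The p-value is the two-sided tail probability of a t distribution with n − 2 degrees of freedom; the standard library has no t distribution, so the sketch stops at the t-statistic (SciPy's `scipy.stats.t.sf` is one way to finish the calculation). The helper name `slr_summary` is hypothetical.

```python
import math

def slr_summary(x, y):
    """R^2 and the slope's t-statistic (H0: slope = 0) for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    tss = sum((yi - y_bar) ** 2 for yi in y)                     # total variation
    r2 = 1 - rss / tss
    sigma2 = rss / (n - 2)            # residual variance, n - 2 degrees of freedom
    se_b1 = math.sqrt(sigma2 / sxx)   # standard error of the slope
    return r2, b1 / se_b1

r2, t_stat = slr_summary([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"R^2 = {r2:.4f}, slope t = {t_stat:.1f} on 3 df")
```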

demo · least squares

Demo: see how R² responds as the cloud of points tightens around the line.

Multiple Linear Regression live

Include more than one variable in a linear regression model.

MLR implementation in Python (40 min).
Interpreting MLR results (40 min) — and the multicollinearity problem.

Core form: ŷ = β₀ + β₁x₁ + … + βₚxₚ. Each βⱼ is the effect of xⱼ holding the others fixed; high VIF signals collinear predictors.
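A NumPy sketch (assuming NumPy is available) that fits a multiple regression by least squares and computes each predictor's VIF as 1/(1 − R_j²), where R_j² comes from regressing predictor j on the others. The data are simulated, with x₂ deliberately built as a noisy copy of x₁; `fit_ols` and `vif` are illustrative names (statsmodels offers a ready-made `variance_inflation_factor` in `statsmodels.stats.outliers_influence`). Expect large VIFs for x₁ and x₂ and a VIF near 1 for the independent x₃.

```python
import numpy as np

def fit_ols(X, y):
    """OLS coefficients [b0, b1, ..., bp] from least squares on the design [1, X]."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        fitted = np.column_stack([np.ones(len(others)), others]) @ fit_ols(others, target)
        r2 = 1 - np.sum((target - fitted) ** 2) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)    # nearly a copy of x1: collinear
x3 = rng.normal(size=n)                      # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(scale=0.5, size=n)

beta = fit_ols(X, y)
print("b0..b3:", beta.round(2))
print("VIF x1, x2, x3:", vif(X).round(1))
```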

demo · least squares

Demo: the single-predictor least-squares demo illustrates the same minimization principle that multiple regression extends to several predictors.

Residuals and Assumptions live

Analyzing residuals is essential to judging model quality and checking the assumptions needed to generalize results.

The error component (40 min) — interpreting residuals eᵢ = yᵢ − ŷᵢ.
Assumptions (40 min) — linearity, independence, homoscedasticity, normal errors.

Key idea: a good model leaves residuals that look like pure noise — patterned residuals (curvature, fanning) flag a violated assumption and an untrustworthy inference.

Categorical Variables and Interaction Effects live

Beyond continuous main effects, include and interpret categorical variables and interactions in LR models.

Categorical variables (40 min) — dummy variables vs. factors and how to read their coefficients.
Interaction effects (40 min) — when the effect of one variable depends on another.

Key idea: a dummy shifts the intercept per group; an interaction term β·x₁x₂ lets the slope itself change across groups.

Polynomial Regression live

When slope and intercept are not enough, use second-, third- and higher-order effects to improve predictions.

Identifying polynomial effects (20 min) — curvature visible in the residuals.
Implementing in Python (30 min).
Interpreting results (30 min).

Core form: ŷ = β₀ + β₁x + β₂x² + … + βₖxᵏ — still linear in the coefficients, so OLS applies; higher order risks overfitting.
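A NumPy sketch on simulated data whose true curve is quadratic (an assumption for illustration). It uses `numpy.polynomial.Polynomial.fit`, which rescales x internally for numerical stability. Training error can only fall as the degree grows, because each model nests the previous one; a fresh test sample from the same curve is what exposes overfitting.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def sample(n):
    """Draw n points from a quadratic trend plus unit-variance noise."""
    x = rng.uniform(-3, 3, size=n)
    y = 1 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=1.0, size=n)
    return x, y

x_train, y_train = sample(30)
x_test, y_test = sample(1000)

train_mse, test_mse = {}, {}
for degree in (1, 2, 5, 12):
    p = Polynomial.fit(x_train, y_train, deg=degree)   # OLS on 1, x, ..., x^degree
    train_mse[degree] = np.mean((y_train - p(x_train)) ** 2)
    test_mse[degree] = np.mean((y_test - p(x_test)) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse[degree]:.3f}, "
          f"test MSE {test_mse[degree]:.3f}")
```

Typically the test error is lowest near the true degree and grows again for the high-degree fits, even though their training error keeps shrinking.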

demo · polynomial fit

Demo: raise the polynomial degree and watch the curve flex from underfit to overfit.

Variable Selection and Cross Validation live

Navigate between overfitting and underfitting: select optimal variables and apply cross-validation for the best fit.

Variable selection (40 min) — ANOVA, best-subset regression, stepwise selection.
Cross validation in a LR model (40 min).

Key idea: adding variables never worsens in-sample fit (R² cannot decrease), but beyond some point it hurts out-of-sample error; selection plus CV find the model that generalizes, not the one that memorizes.

demo · overfitting & CV

Demo: the U-shaped cross-validation error curve makes the bias–variance trade-off concrete.

Linear Regression Lab lab

Groups complete a guided lab applying linear regression.

Lab explanation and delivery instructions (10 min).
Guided lab completion (70 min).

Lab 3 · Turnitin

Lab 3 of 4 (part of the 25%): build and validate a linear regression model; continues in Session 23.

Linear Regression Lab (continuation) lab

Groups continue the guided linear regression lab.

Guided lab completion (80 min).

Lab 3 · Turnitin

Lab 3 continued: finalize and submit via Turnitin before the next synchronous class.

VII Classification Models Sessions 24–30

When the target is categorical rather than continuous, regression logic breaks down. This module derives logistic regression from linear regression via the logit transformation, implements it in Python, validates classifiers with the confusion matrix and ROC curve, runs the final lab, and ends with the comprehensive final exam.

Learning outcomes

Explain why linear regression fails for categorical targets and how the logit fixes it.
Fit and interpret a logistic regression (coefficients as log-odds, odds ratios).
Choose a cutoff and evaluate a classifier with the confusion matrix and ROC/AUC.

From Linear Regression to Logistic Regression I live

When the target has discrete nominal values, LR logic no longer applies — understand the mathematics behind logistic regression.

Linear vs. logistic regression (20 min) — predicting a probability, not a level.
Mathematics behind logistic regression (30 min) — the sigmoid and maximum likelihood.

Core form · sigmoid: p = 1 / (1 + e^{−(β₀+β₁x)}) squashes any linear score into (0, 1), giving a valid probability.

demo · logit curve

Demo: shift β₀, β₁ and watch the S-curve slide and steepen.

From Linear Regression to Logistic Regression II live

The logit transformation and how to read the key statistics of a logistic model.

The logit transformation (40 min) — linearizing the model in log-odds.
Interpreting key statistics (40 min).

Core method · logit: ln(p/(1−p)) = β₀ + β₁x, so a one-unit rise in x multiplies the odds by e^{β₁} (the odds ratio).
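A small standard-library check of the sigmoid and logit formulas, with hypothetical coefficients β₀ = −3 and β₁ = 0.8. The sigmoid is written in a numerically stable form that avoids overflowing `math.exp` for large negative scores; the last line confirms that a one-unit step in x multiplies the odds by e^{β₁} regardless of the starting x.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)              # safe: z < 0, so e is in (0, 1)
    return e / (1.0 + e)

def logit(p):
    """Log-odds ln(p / (1 - p)), the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

b0, b1 = -3.0, 0.8               # hypothetical coefficients

def odds(x):
    p = sigmoid(b0 + b1 * x)
    return p / (1.0 - p)

for x in (0, 2, 4, 6):
    p = sigmoid(b0 + b1 * x)
    print(f"x = {x}: p = {p:.3f}, log-odds = {logit(p):+.2f}, odds = {odds(x):.3f}")

print(odds(3) / odds(2), odds(6) / odds(5), math.exp(b1))  # all approximately e^{0.8}
```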

demo · logit curve

Demo: connects the visible S-curve to the linear log-odds scale behind it.

Logistic Regression Implementation in Python live

Implement a logistic regression model in Python and interpret its results.

Implementation in Python (40 min) — fitting with scikit-learn / statsmodels.
Results interpretation (40 min) — coefficients, odds ratios, significance.

Key idea: coefficients are reported on the log-odds scale; exponentiate them to get odds ratios that stakeholders can read directly.
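For intuition about what a fitting routine does, this NumPy sketch maximizes the logistic log-likelihood with plain gradient ascent on simulated data; the true coefficients, learning rate and iteration count are assumptions. It is a teaching substitute, not either library's algorithm: statsmodels' `Logit` uses Newton-type optimization by default, and scikit-learn's `LogisticRegression` applies L2 regularization by default. Exponentiating the fitted slope turns it into the odds ratio.

```python
import numpy as np

def fit_logistic(X, y, lr=1.0, n_iter=2000):
    """Maximum-likelihood logistic regression by plain gradient ascent (teaching sketch)."""
    design = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))     # current predicted probabilities
        beta += lr * (design.T @ (y - p)) / len(y)     # gradient of the mean log-likelihood
    return beta

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
true_beta = np.array([-0.5, 1.2])                      # hypothetical true coefficients
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x)))
y = (rng.random(n) < p_true).astype(float)            # Bernoulli(p_true) outcomes

beta = fit_logistic(x.reshape(-1, 1), y)
print("log-odds coefficients (b0, b1):", beta.round(2))
print("odds ratio for x, e^b1:", round(float(np.exp(beta[1])), 2))
```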

demo · logit curve

Demo: mirrors the fitted-probability curve that Python produces from real data.

Logistic Regression Validation live

Classification targets need different validation tools — learn the statistics that validate classification models.

Confusion matrix and cutoff point (40 min) — TP/FP/TN/FN, accuracy, precision, recall.
ROC curves (40 min) — trade-off across all cutoffs; AUC as a summary.

Core metrics: precision = TP/(TP+FP), recall = TP/(TP+FN); the ROC plots recall vs. false-positive rate, and AUC = probability the model ranks a random positive above a random negative.
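A standard-library sketch of these validation metrics on ten made-up scores, small enough to check by hand. The AUC is computed straight from its ranking definition by comparing every positive with every negative (ties count one half), which is fine for small data; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently. The helper names are hypothetical.

```python
def confusion_counts(y_true, scores, cutoff):
    """TP, FP, TN, FN when we predict positive for score >= cutoff."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= cutoff
        if predicted and label == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def auc_by_ranking(y_true, scores):
    """P(random positive scores above random negative), ties counting 1/2."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.55, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]  # hypothetical outputs

for cutoff in (0.5, 0.25):
    tp, fp, tn, fn = confusion_counts(y_true, scores, cutoff)
    precision, recall, fpr = tp / (tp + fp), tp / (tp + fn), fp / (fp + tn)
    print(f"cutoff {cutoff}: TP={tp} FP={fp} TN={tn} FN={fn}, "
          f"precision={precision:.2f} recall={recall:.2f} FPR={fpr:.2f}")

auc = auc_by_ranking(y_true, scores)
print(f"AUC = {auc:.3f}")   # AUC = 0.833
```

Lowering the cutoff from 0.5 to 0.25 lifts recall from 0.75 to 1.0 but also raises the false-positive rate, which is exactly the movement along the ROC curve.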

demo · confusion matrix & ROC

Demo: slide the cutoff and watch the confusion matrix and the point on the ROC curve move together.

Logistic Regression Lab lab

Groups complete a guided lab applying logistic regression.

Lab explanation and delivery instructions (10 min).
Guided lab completion (70 min).

Lab 4 · Turnitin

Lab 4 of 4 (part of the 25%): fit and validate a classifier; continues in Session 29.

Logistic Regression Lab (continuation) lab

Groups continue the guided logistic regression lab.

Guided lab completion (80 min).

Lab 4 · Turnitin

Lab 4 continued: finalize and submit via Turnitin before the next synchronous class.

Final Exam exam

Comprehensive test covering all course content. Format detailed by the professor well in advance.

Weight: 30% of the final grade — the single largest component. A failed ordinary call leads to a comprehensive June/July re-sit (max grade 8.0); fewer than 80% attendance forfeits both calls.

Key concepts

A working glossary of the terms used across the seven modules. Each entry is the short definition you should be able to give from memory by the final exam.

System: A set of interacting elements whose collective behavior we want to understand or predict.
Model: A simplified representation of a system, typically Y = f(X) + ε — structure plus noise.
Simulation: Running a model forward in virtual time to observe behavior without testing the real system.
Stochastic vs. deterministic: Stochastic models contain randomness and can give different outputs on each run; deterministic ones always give the same output for the same inputs.
Pseudo-random number: A deterministically generated value that behaves statistically like a true random draw.
Linear Congruential Generator (LCG): Generator X_{n+1} = (aX_n + c) mod m; classic source of uniform pseudo-randomness.
Period: How many values a generator produces before the sequence repeats; at most m for an LCG.
Uniformity / independence tests: Checks (chi-square, KS, runs, autocorrelation) that a generator's output is flat and pattern-free.
Inverse-transform method: Sampling rule X = F⁻¹(U) that turns a uniform draw into one from any target distribution.
Monte Carlo simulation: Estimating quantities by repeated random sampling; error typically shrinks as 1/√N.
Inferential statistics: Drawing conclusions about a population from a sample, often via a sampling distribution.
Sampling distribution: The distribution of an estimator across many samples; the basis for confidence intervals.
Discrete-Event Simulation (DES): Advancing a system clock event-to-event rather than continuously.
SimPy: Python library for process-based DES using environments, processes and resources.
M/M/1 queue: Single-server queue with Poisson arrivals and exponential service; stable when ρ = λ/μ < 1.
Utilization (ρ): Fraction of time a server is busy, λ/μ; values near 1 cause long queues.
Cross-validation: Estimating out-of-sample error by training and testing on rotating data folds.
Overfitting / underfitting: Fitting noise (too complex) vs. missing structure (too simple); CV finds the balance.
Ordinary Least Squares (OLS): Fitting a linear model by minimizing the sum of squared residuals.
R²: Proportion of variance in the target explained by the model, between 0 and 1.
Residual: Observed minus predicted, eᵢ = yᵢ − ŷᵢ; their pattern reveals model adequacy.
Multicollinearity: Predictors that are strongly correlated, making individual coefficients unstable (high VIF).
Dummy variable / interaction: Encoding a category (shifts intercept) and a product term (changes slope across groups).
Logistic regression: Classification model p = 1/(1 + e^{−(β₀+β₁x)}) for a binary target.
Logit / odds ratio: The log-odds ln(p/(1−p)) are linear in x; e^{β₁} is the odds ratio for a one-unit increase in x.
Confusion matrix: Table of TP, FP, TN, FN summarizing classifier performance at a chosen cutoff.
ROC curve / AUC: Recall vs. false-positive rate across cutoffs; AUC summarizes ranking quality.

Bibliography

Recommended reading, annotated with the sessions each source supports.

Robert, C. P., & Casella, G. (2010). Introducing Monte Carlo Methods with R. New York: Springer. ISBN 9781441915757 (printed). What it covers: a hands-on, code-first treatment of random-variable generation, Monte Carlo integration and inference, and simulation diagnostics. Supports: Module II (random numbers & variables, Sessions 3–5) and Module III (Monte Carlo, Sessions 6–8); its simulation mindset also underpins the cross-validation discussion in Module V.

Beyond the recommended text, the course relies on the official IE policies referenced in the syllabus — the Code of Conduct, Attendance Policy and Ethics Code — and on Python (with SimPy and scikit-learn / statsmodels) plus R as the working tools for the labs.