Simulating and Modeling to Understand Change
A first-year BCSAI course on using computers to substitute for physical experimentation — generating randomness, running Monte Carlo and discrete-event simulations, and building regression and classification models to understand and predict the behavior of complex systems.
Simulation and modeling is a substitute for physical experimentation in which computers explore a phenomenon, letting us understand a system and predict its behavior without testing it in the real world. Simulations can be more realistic than traditional experiments and are usually faster and cheaper to run, which is why the technique appears across mathematics, physics, engineering, psychology and biology — anywhere complex behavior emerges from many smaller interacting elements. The course covers Monte Carlo simulation, discrete-event simulation, model building, and regression & classification models, and discusses how each is applied to real-life problems. Students learn and practice statistics and programming; by the end they can conduct a full simulation study and model real-life scenarios. Every numbered topic below maps to a runnable demo on the interactive demos page.
Prerequisites
- Basic knowledge of mathematics.
- Recommended: Fundamentals of Probability & Statistics and Data Insights & Visualization (previous semester).
- Beginner to moderate Python programming.
Why these matter: probability supplies the distributions and inference logic behind every simulation; visualization makes simulated output legible; Python is the implementation language for LCGs, Monte Carlo, SimPy and the regression labs.
Core modules
- I · Introduction to SMUC
- II · Random Numbers & Random Variables Generation
- III · Monte Carlo Simulation
- IV · Discrete Events Simulation
- V · Model Building
- VI · Regression Models
- VII · Classification Models
Arc of the course: first build the raw ingredient (randomness), then two simulation paradigms (Monte Carlo, discrete-event), then the general theory of models, and finally two supervised-learning workhorses (regression for continuous targets, classification for categorical ones).
Learning objectives
By the end of the course students will know how to conduct a simulation study and model real-life scenarios across these core competencies.
Methodology & assessment
IE's teaching method is collaborative, active and applied. Learning is distributed across a range of activities (150 h total) and graded through continuous evaluation plus two exams.
Learning activities (weight · est. hours)
Total: 100% · 150 hours. These weights describe how a student is expected to budget effort across the term — they are activity time, not grade weight.
Assessment weighting
4 labs delivered via Turnitin · one quiz after each of the 7 modules · 80% attendance required to pass.
What each component asks of you
- Class participation — 10%. Active contribution to in-class activities, discussions and labs. Deliverable: sustained engagement; evaluated on quality of reasoning and ability to connect concepts to real-world contexts, not mere attendance.
- Group work — 4 labs · 25%. Guided, applied labs (Monte Carlo, DES, linear regression, logistic regression) completed in groups formed in Session 1. Deliverable: each lab submitted via Turnitin before the next synchronous class; evaluated on correct application of session concepts, code, and interpretation. Late = grade of zero unless a prior emergency/illness exception is arranged before the due date.
- Quizzes — 15%. One (or more) quiz after each of the 7 modules. Evaluated on recall and understanding of that module's material; spreads assessment continuously across the term.
- Midterm exam — 20% (Session 14). Covers all content to date. Format detailed by the professor well in advance.
- Final exam — 30% (Session 30). Comprehensive, covering all course content. Format detailed well in advance.
Pass & attendance rules. Attendance is mandatory: a student attending fewer than 80% of sessions is graded FAIL for both the ordinary and the extraordinary (June/July) calls and must re-enroll. Each student has four allowed calls across two academic years. The June/July re-sit is a single comprehensive exam (continuous evaluation is not carried over), with a minimum pass of 5 and a maximum attainable grade of 8.0. Grade disputes must be resolved before the final exam, after attending the review session. GenAI may be used for assistance but inappropriate use is academic misconduct; acknowledging AI use does not affect the grade.
Program — 30 sessions across 7 modules
Every live in-person session, with its real title, objective, topic breakdown, core method/formula and annotated readings. Linked tags jump to the matching interactive demo.
Sets the conceptual stage: what a system is, what we mean by a model and a simulation, and the methodology that links them. Establishes the vocabulary the rest of the course assumes and forms the working groups used for all four labs.
- Define system, model and simulation and explain how they relate.
- Distinguish deterministic vs. stochastic and static vs. dynamic models.
- Describe the stages of a simulation study and when simulation beats physical experimentation.
Course Presentation live
Get to know each other, walk through the most important aspects of the course and syllabus, and define the working groups for the rest of the term.
- Knowing each other (30 min) — introductions and expectations.
- Course specifications (50 min) — objectives, methodology, assessment, attendance, and formation of lab groups.
Introduction to Simulation and Modeling live
Introductory theory of simulation and modeling: define what a system is and study practical examples.
- Systems, models and simulation (20 min) — core terminology: state, entities, attributes, events.
- Why simulate? (40 min) — advantages over physical experiments; types of simulation (static/dynamic, deterministic/stochastic, continuous/discrete); stages and methodology of a study.
- What is a model? (40 min) — predictive statistics, types of models, and the modeling methodology.
Demo: visualizes how a simple growth model evolves a system's state over time — a first concrete instance of "running a model forward."
Randomness is the raw material of stochastic simulation. This module builds it from scratch: generate pseudo-random numbers, test that they behave randomly, then transform them into samples from any distribution you need.
- Generate uniform pseudo-random numbers with a Linear Congruential Generator and reason about its period.
- Test a generator for uniformity and independence before trusting it.
- Sample from discrete and continuous distributions via the inverse-transform method in Python.
Generating Random Numbers live
Essential to simulating stochastic behavior: learn the properties of random numbers and how to generate them.
- Random number properties (20 min) — desired uniformity on [0,1) and independence.
- Random number generation (30 min) — pseudo-random sequences, seeds and reproducibility.
- In-class exercises (30 min) — the Linear Congruential Method.
Demo: drive a, c, m, X₀ yourself and watch the sequence repeat — a hands-on feel for why parameter choice controls the period.
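A minimal sketch of the Linear Congruential Method, to make the link between parameters and period concrete. The parameter values are toy choices for illustration only (they are not taken from the course material), and the helper names `lcg` and `period` are hypothetical.

```python
def lcg(a, c, m, seed):
    """Yield the sequence X_{n+1} = (a * X_n + c) mod m, starting after the seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def period(a, c, m, seed):
    """Return the cycle length: steps between two visits to the same state."""
    first_seen = {}
    x = seed
    step = 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - first_seen[x]


# Toy parameters (assumed for illustration): same m and c, different multiplier a.
full_period = period(a=5, c=3, m=16, seed=7)   # a chosen so every state is visited
short_period = period(a=6, c=3, m=16, seed=7)  # a poor multiplier collapses the cycle

# Dividing by m maps the integers to uniforms on [0, 1).
gen = lcg(a=5, c=3, m=16, seed=7)
uniforms = [next(gen) / 16 for _ in range(5)]
print(full_period, short_period, uniforms)
```

Changing only the multiplier is enough to turn a full-period generator into one that gets stuck, which is the behavior the demo lets you explore by hand.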
Tests of Randomness live
Before trusting Python's generators, learn the tests of randomness and apply them to Python's random functions.
- Testing uniformity (20 min) — chi-square / Kolmogorov–Smirnov goodness-of-fit against the uniform.
- Testing independence (30 min) — runs test and autocorrelation to detect serial patterns.
- In-class exercises (30 min) — applying the tests to Python's random functions.
Demo: reuse the LCG and inspect the spread of its outputs as a visual uniformity check.
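The two families of tests can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the course's lab code: it computes a chi-square statistic for uniformity and a lag-1 autocorrelation for independence. The bin count, sample size and seed are assumptions; 16.919 is the 95% chi-square critical value for 9 degrees of freedom.

```python
import random


def chi_square_uniform(samples, bins=10):
    """Chi-square statistic comparing bin counts with a flat expectation on [0, 1)."""
    counts = [0] * bins
    for u in samples:
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(samples) / bins
    return sum((obs - expected) ** 2 / expected for obs in counts)


def lag1_autocorrelation(samples):
    """Correlation between each value and its successor; near 0 if independent."""
    n = len(samples)
    mean = sum(samples) / n
    num = sum((samples[i] - mean) * (samples[i + 1] - mean) for i in range(n - 1))
    den = sum((u - mean) ** 2 for u in samples)
    return num / den


rng = random.Random(42)                      # fixed seed for reproducibility
draws = [rng.random() for _ in range(10_000)]

CRITICAL_95_DF9 = 16.919                     # chi-square 95% quantile, 10 bins - 1 = 9 df
stat = chi_square_uniform(draws)
r1 = lag1_autocorrelation(draws)
print(f"chi-square = {stat:.2f} (reject uniformity if > {CRITICAL_95_DF9})")
print(f"lag-1 autocorrelation = {r1:.4f}")
```

A generator can pass the uniformity check and still fail independence (a strictly alternating sequence is a simple example), which is why both kinds of test are applied.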
Simulating Discrete and Continuous Random Variables live
Simulate variables following different distributions to reproduce many kinds of scenarios in Python.
- Simulating discrete RVs (30 min) — Bernoulli, binomial, Poisson from uniforms.
- Simulating continuous RVs (30 min) — exponential, normal via transforms.
- In-class exercises (20 min).
Demo: map uniform draws through F⁻¹ and see the target distribution's shape emerge.
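As a sketch of the inverse-transform idea, the snippet below turns uniform draws into one continuous and one discrete random variable. The function names and the rate and probability values are illustrative assumptions.

```python
import math
import random


def sample_exponential(rate, u):
    """Inverse CDF of Exp(rate): F^{-1}(u) = -ln(1 - u) / rate."""
    return -math.log(1.0 - u) / rate


def sample_discrete(values, probs, u):
    """Return the first value whose cumulative probability exceeds u."""
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p
        if u < cumulative:
            return value
    return values[-1]  # guard against rounding in the final cumulative sum


rng = random.Random(0)

# Continuous case: exponential with rate 2, so the theoretical mean is 1/2.
exp_draws = [sample_exponential(2.0, rng.random()) for _ in range(50_000)]
mean_exp = sum(exp_draws) / len(exp_draws)

# Discrete case: Bernoulli(0.7) written as a two-value distribution.
bern_draws = [sample_discrete([0, 1], [0.3, 0.7], rng.random()) for _ in range(50_000)]
share_ones = sum(bern_draws) / len(bern_draws)

print(f"exponential sample mean = {mean_exp:.3f}, Bernoulli share of ones = {share_ones:.3f}")
```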
Monte Carlo turns repeated random sampling into numerical answers — for integrals, probabilities and statistical inference. The module moves from estimating a constant (π) to using sampling distributions to reason about unknown quantities, and closes with the first graded lab.
- List and apply the steps of a Monte Carlo study and quantify its error.
- Estimate π and other quantities by random sampling.
- Use Monte Carlo to build sampling distributions and make inferences about unknowns.
Monte Carlo Simulation I live
Monte Carlo is one of the best-known simulation tools: cover its history, walk through the steps of a simulation study, and estimate π.
- Monte Carlo history (10 min) — Ulam, von Neumann and the Manhattan Project.
- Methods vs. simulation (10 min) — the broad numerical method vs. simulating a stochastic system.
- Steps in a study (15 min) — define, sample, evaluate, aggregate, assess error.
- Use case (45 min) — the π experiment.
Demo: add samples and watch the estimate converge toward 3.14159 with the tell-tale 1/√N noise.
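The π experiment can be sketched as follows: sample points in the unit square, count the share that falls inside the quarter circle, and multiply by 4. The standard error comes from the binomial proportion and shrinks like 1/√N. The function name, sample sizes and seed are illustrative assumptions.

```python
import math
import random


def estimate_pi(n, rng):
    """Return (estimate, standard error) from n uniform points in the unit square."""
    inside = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    p_hat = inside / n                                  # estimates pi / 4
    std_err = 4 * math.sqrt(p_hat * (1 - p_hat) / n)    # binomial standard error, scaled by 4
    return 4 * p_hat, std_err


rng = random.Random(1)
for n in (100, 10_000, 100_000):
    pi_hat, se = estimate_pi(n, rng)
    print(f"N = {n:>7}: estimate = {pi_hat:.4f}, std. error = {se:.4f}")
```

Each 100-fold increase in N should cut the standard error by roughly a factor of 10, which is the 1/√N behavior visible in the demo.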
Monte Carlo Simulation II live
One of the most practical applications of Monte Carlo: making inferences about real-world scenarios.
- What is inferential statistics? (20 min) — estimating population quantities from samples.
- Monte Carlo for inference (20 min) — simulating sampling distributions when formulas are hard.
- Use case (20 min) — the taxi (German-tank) problem: estimate a maximum from observed labels.
- Use case (20 min) — the sampling problem experiment.
Demo: repeatedly resample and watch a sampling distribution form, illustrating how Monte Carlo replaces analytic derivations.
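A sketch of the taxi (German-tank) use case, building a sampling distribution by simulation. It assumes labels run from 1 to N and are sampled without replacement, and it uses the standard estimator m + m/k − 1 (m = largest observed label, k = sample size). Whether the course uses this particular estimator is an assumption; the function names are hypothetical.

```python
import random


def tank_estimate(labels):
    """Estimate the population maximum from observed labels: m + m/k - 1."""
    k = len(labels)
    m = max(labels)
    return m + m / k - 1


def sampling_distribution(true_n, k, reps, rng):
    """Simulate the estimator over many samples of size k drawn without replacement."""
    population = range(1, true_n + 1)
    return [tank_estimate(rng.sample(population, k)) for _ in range(reps)]


rng = random.Random(5)
estimates = sampling_distribution(true_n=1000, k=10, reps=5_000, rng=rng)

mean_est = sum(estimates) / len(estimates)
ordered = sorted(estimates)
low, high = ordered[int(0.025 * len(ordered))], ordered[int(0.975 * len(ordered))]
print(f"mean of estimates = {mean_est:.1f}, central 95% range = [{low:.0f}, {high:.0f}]")
```

The spread of the simulated estimates is the sampling distribution: it shows how far a single estimate could plausibly sit from the true maximum, without deriving that distribution analytically.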
Monte Carlo Simulation Lab lab
Groups complete a guided lab applying Monte Carlo simulation.
- Lab explanation and delivery instructions (10 min).
- Guided lab completion (70 min).
Lab 1 of 4 (part of the 25%): apply Monte Carlo to a stochastic scenario; submit via Turnitin before the next synchronous class.
Where Monte Carlo draws repeated random samples from a model, usually without tracking time, Discrete-Event Simulation (DES) advances a system through time from one event to the next: arrivals, service completions, departures. The module introduces the DES framework, implements it with SimPy, and applies it to a hospital queue.
- Explain DES vocabulary: entities, events, the event list and the simulation clock.
- Build a queue model in Python with SimPy and run it.
- Extract and interpret performance metrics (waiting time, utilization, queue length).
Discrete Events Simulation I live
Unlike Monte Carlo, DES models system behavior as a sequence of discrete events over time. Learn the basics and implement DES in Python with SimPy.
- Intro to DES (30 min) — terminology and framework: events, the future-event list, the clock that jumps event-to-event.
- The SimPy package (50 min) — processes, resources and environments.
Demo: tune λ and μ and watch the queue stabilize or blow up as ρ crosses 1.
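To show what the future-event list and the jumping clock look like in code, here is a plain-Python sketch of a single-server queue with exponential arrivals and service. It deliberately uses only the standard library (`heapq`) rather than SimPy, so the mechanics SimPy handles for you stay visible; the function name, parameters and returned metrics are illustrative assumptions.

```python
import heapq
import random
from collections import deque


def simulate_mm1(lam, mu, horizon, seed=0):
    """Event-driven single-server queue; returns utilization, mean wait, max queue."""
    rng = random.Random(seed)
    clock = 0.0
    events = [(rng.expovariate(lam), "arrival")]   # future-event list (min-heap on time)
    waiting = deque()                              # arrival times of queued customers
    server_busy = False
    busy_time = 0.0
    waits = []
    max_queue = 0

    while events:
        time, kind = heapq.heappop(events)
        if time > horizon:
            break
        if server_busy:
            busy_time += time - clock              # state is constant between events
        clock = time                               # the clock jumps to the next event

        if kind == "arrival":
            heapq.heappush(events, (clock + rng.expovariate(lam), "arrival"))
            if server_busy:
                waiting.append(clock)
                max_queue = max(max_queue, len(waiting))
            else:
                server_busy = True
                waits.append(0.0)
                heapq.heappush(events, (clock + rng.expovariate(mu), "departure"))
        else:  # departure: start the next customer if anyone is waiting
            if waiting:
                waits.append(clock - waiting.popleft())
                heapq.heappush(events, (clock + rng.expovariate(mu), "departure"))
            else:
                server_busy = False

    return {
        "utilization": busy_time / clock,
        "mean_wait": sum(waits) / len(waits),
        "max_queue": max_queue,
    }


result = simulate_mm1(lam=0.8, mu=1.0, horizon=100_000, seed=1)
print(result)
```

With λ = 0.8 and μ = 1.0, queueing theory gives ρ = 0.8 and a mean wait in queue of ρ/(μ − λ) = 4, so the simulated values can be checked against those targets. Pushing λ toward μ makes the waits grow sharply.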
Discrete Events Simulation II live
Extract information from a DES using SimPy.
- Simulating a hospital queue in Python (40 min) — patients as entities, doctors as resources.
- Interpreting SimPy results (40 min) — average wait, server utilization, max queue length.
Demo: the same queue model surfaces the metrics SimPy would report for the hospital example.
Discrete Events Simulation Lab lab
Groups complete a guided lab applying discrete events simulation.
- Lab explanation and delivery instructions (10 min).
- Guided lab completion (70 min).
Lab 2 of 4 (part of the 25%): model and analyze a queueing system in SimPy; submit via Turnitin before the next class.
A bridge from simulation to statistical modeling: what distinguishes a model from a bare function, how to read and design one, and how variation, covariation and cross-validation tell us whether a model will generalize.
- Distinguish a function from a model and identify a model's basic elements.
- Quantify variation and covariation between variables.
- Explain why cross-validation guards against over-optimistic in-sample fit.
Model Building I live
Mathematical models help us analyze and predict. Learn the basic elements of a model and how to read and design one.
- Functions vs. models (10 min) — a model adds an error term to a deterministic function.
- Reading models (30 min) — parameters, inputs, outputs and assumptions.
- Model design (40 min) — choosing form and variables for a question.
Model Building II live
Find patterns within and between variables, and learn why cross-validation is so important in stochastic models.
- Variation analysis (20 min) — variance as spread of a single variable.
- Covariation analysis (20 min) — covariance and correlation between variables.
- Cross validation (40 min) — hold-out and k-fold estimates of out-of-sample error.
Demo: contrast training error with cross-validated error to see overfitting appear as model complexity grows.
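A minimal k-fold sketch, using a one-predictor least-squares line so that it needs nothing beyond the standard library. The data are simulated with a known noise variance of 1, so both error estimates can be compared against that target. All names, the seed and the data-generating values are assumptions for illustration.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def mse(model, xs, ys):
    """Mean squared prediction error of an (intercept, slope) model."""
    b0, b1 = model
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_mse(xs, ys, k, rng):
    """Average held-out MSE over k folds; each point is tested exactly once."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / k


rng = random.Random(3)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]   # true noise variance is 1

train_mse = mse(fit_line(xs, ys), xs, ys)
cv_mse = k_fold_mse(xs, ys, k=5, rng=rng)
print(f"training MSE = {train_mse:.3f}, 5-fold CV MSE = {cv_mse:.3f}")
```

For a simple, well-specified model the two numbers stay close; the gap between them widens as model complexity grows, which is what the demo illustrates.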
The course's largest module. Opens with the midterm, then develops linear regression end to end: from a single predictor through multiple predictors, residual diagnostics, categorical and interaction terms, polynomial effects, and finally variable selection and cross-validation — closing with the two-session linear regression lab.
- Fit and interpret simple and multiple linear regressions and their key statistics (slope, R², p-values).
- Diagnose a model through its residuals and check the assumptions needed to generalize.
- Incorporate categorical variables, interactions and polynomial terms appropriately.
- Select variables and apply cross-validation to balance under- and over-fitting.
Midterm Exam exam
Test covering all course content to date (Modules I–V). Format detailed by the professor well in advance.
Simple Linear Regression I live
Define what a regression problem is and explain the most basic aspects of Simple Linear Regression, the most popular technique to solve it.
- Simple LR introduction (20 min) — regression vs. classification; predicting a continuous target.
- Theoretical demonstration (20 min) — deriving the line that minimizes squared error.
Demo: drag points and watch the least-squares line and its residuals update live.
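For reference, the least-squares derivation can be sketched as follows, writing β₀ for the intercept and β₁ for the slope (notation assumed here; the class may use different symbols). Setting both partial derivatives of the squared-error sum to zero gives the closed-form estimates.

```latex
\begin{aligned}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \\[4pt]
\frac{\partial S}{\partial \beta_0} &= -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
  \;\;\Longrightarrow\;\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\[4pt]
\frac{\partial S}{\partial \beta_1} &= -2 \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
  \;\;\Longrightarrow\;\; \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```

The first condition says the fitted line passes through the point (x̄, ȳ); the second says the slope is the sample covariance of x and y divided by the sample variance of x.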
Simple Linear Regression II live
The most important statistics for interpreting the results of a Simple LR.
- Simple LR with simulated data (20 min).
- Theoretical interpretation (20 min) — what each statistic means.
- Practical interpretation (20 min) — reading the output table.
- Implementation in R (20 min).
Demo: see how R² responds as the cloud of points tightens around the line.
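The behavior of R² can be reproduced with simulated data: keep the same underlying line and shrink the noise. This is an illustrative sketch in Python with assumed parameter values and a hypothetical function name; it is not the session's own implementation.

```python
import random


def r_squared(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for a least-squares line with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(300)]

r2_by_sigma = {}
for sigma in (4.0, 1.0, 0.25):                      # noise standard deviations, largest first
    ys = [1.0 + 1.5 * x + rng.gauss(0, sigma) for x in xs]
    r2_by_sigma[sigma] = r_squared(xs, ys)
    print(f"noise sd = {sigma:>4}: R^2 = {r2_by_sigma[sigma]:.3f}")
```

The slope and intercept that generate the data never change; only the scatter around the line does, and R² rises toward 1 as that scatter shrinks.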
Multiple Linear Regression live
Include more than one variable in a linear regression model.
- MLR implementation in Python (40 min).
- Interpreting MLR results (40 min) — and the multicollinearity problem.
Demo: the least-squares demo grounds the same minimization principle extended to several predictors.
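The multicollinearity problem can be made concrete with the variance inflation factor (VIF). In the special case of exactly two predictors, each predictor's VIF reduces to 1/(1 − r²), where r is their correlation, so it can be computed without a regression library. The sketch below relies on that special case; the data and names are illustrative assumptions.

```python
import math
import random


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


rng = random.Random(11)
x1 = [rng.gauss(0, 1) for _ in range(2_000)]
x2_independent = [rng.gauss(0, 1) for _ in range(2_000)]
x2_collinear = [v + 0.1 * rng.gauss(0, 1) for v in x1]   # almost a copy of x1

vif_independent = vif_two_predictors(x1, x2_independent)
vif_collinear = vif_two_predictors(x1, x2_collinear)
print(f"VIF with an unrelated predictor = {vif_independent:.2f}")
print(f"VIF with a near-duplicate predictor = {vif_collinear:.1f}")
```

A VIF near 1 means the predictor carries its own information; a very large VIF means its coefficient is estimated from the small part of its variation that the other predictor does not already explain.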
Residuals and Assumptions live
Analyzing residuals is essential to judging model quality and checking the assumptions needed to generalize results.
- The error component (40 min) — interpreting residuals eᵢ = yᵢ − ŷᵢ.
- Assumptions (40 min) — linearity, independence, homoscedasticity, normal errors.
Categorical Variables and Interaction Effects live
Beyond continuous main effects, include and interpret categorical variables and interactions in LR models.
- Categorical variables (40 min) — dummy variables vs. factors and how to read their coefficients.
- Interaction effects (40 min) — when the effect of one variable depends on another.
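To see how dummy and interaction coefficients are read, the sketch below evaluates a model of the form y = b0 + b1·x + b2·D + b3·(x·D), where D is 1 for group B and 0 for the reference group A. The coefficient values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def predict(x, in_group_b, b0, b1, b2, b3):
    """Prediction from a model with one dummy variable D and its interaction with x."""
    d = 1 if in_group_b else 0
    return b0 + b1 * x + b2 * d + b3 * x * d


def group_line(in_group_b, b0, b1, b2, b3):
    """Return the (intercept, slope) implied for one group."""
    if in_group_b:
        return b0 + b2, b1 + b3    # dummy shifts the intercept, interaction shifts the slope
    return b0, b1                  # reference group: baseline intercept and slope


# Hypothetical coefficients, for illustration only.
coefs = dict(b0=10.0, b1=2.0, b2=5.0, b3=-1.0)

line_a = group_line(False, **coefs)
line_b = group_line(True, **coefs)
print(f"group A: intercept {line_a[0]}, slope {line_a[1]}")
print(f"group B: intercept {line_b[0]}, slope {line_b[1]}")
print(f"prediction for x = 3 in group B: {predict(3, True, **coefs)}")
```

Read this way, b2 is the gap between the groups at x = 0 and b3 is the difference in slopes, so a non-zero b3 means the effect of x depends on the group.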
Polynomial Regression live
When slope and intercept are not enough, use second-, third- and higher-order effects to improve predictions.
- Identifying polynomial effects (20 min) — curvature visible in the residuals.
- Implementing in Python (30 min).
- Interpreting results (30 min).
Demo: raise the polynomial degree and watch the curve flex from underfit to overfit.
Variable Selection and Cross Validation live
Navigate between overfitting and underfitting: select optimal variables and apply cross-validation for the best fit.
- Variable selection (40 min) — ANOVA, best-subset regression, stepwise selection.
- Cross validation in a LR model (40 min).
Demo: the U-shaped cross-validation error curve makes the bias–variance trade-off concrete.
Linear Regression Lab lab
Groups complete a guided lab applying linear regression.
- Lab explanation and delivery instructions (10 min).
- Guided lab completion (70 min).
Lab 3 of 4 (part of the 25%): build and validate a linear regression model; continues in Session 23.
Linear Regression Lab (continuation) lab
Groups continue the guided linear regression lab.
- Guided lab completion (80 min).
Lab 3 continued: finalize and submit via Turnitin before the next synchronous class.
When the target is categorical rather than continuous, linear regression logic breaks down. This module derives logistic regression from linear regression via the logit transformation, implements it in Python, validates classifiers with the confusion matrix and ROC curve, runs the final lab, and ends with the comprehensive final exam.
- Explain why linear regression fails for categorical targets and how the logit fixes it.
- Fit and interpret a logistic regression (coefficients as log-odds, odds ratios).
- Choose a cutoff and evaluate a classifier with the confusion matrix and ROC/AUC.
From Linear Regression to Logistic Regression I live
When the target takes discrete nominal values, linear regression logic no longer applies. This session covers the mathematics behind logistic regression.
- Linear vs. logistic regression (20 min) — predicting a probability, not a level.
- Mathematics behind logistic regression (30 min) — the sigmoid and maximum likelihood.
Demo: shift β₀, β₁ and watch the S-curve slide and steepen.
From Linear Regression to Logistic Regression II live
The logit transformation and how to read the key statistics of a logistic model.
- The logit transformation (40 min) — linearizing the model in log-odds.
- Interpreting key statistics (40 min).
Demo: connects the visible S-curve to the linear log-odds scale behind it.
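The relationship between the S-curve and the log-odds scale can be checked numerically. In this sketch the coefficients β₀ = −3 and β₁ = 0.8 are hypothetical; the point is that the logit undoes the sigmoid and that a one-unit step in x multiplies the odds by e^β₁.

```python
import math


def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Map a probability to log-odds; the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


def odds(p):
    return p / (1.0 - p)


b0, b1 = -3.0, 0.8    # hypothetical coefficients, for illustration only

for x in (2.0, 3.0, 4.0):
    p = sigmoid(b0 + b1 * x)
    print(f"x = {x}: p = {p:.3f}, log-odds = {logit(p):.3f}, odds = {odds(p):.3f}")

# Ratio of odds for a one-unit increase in x, compared with exp(b1).
odds_ratio = odds(sigmoid(b0 + b1 * 4.0)) / odds(sigmoid(b0 + b1 * 3.0))
print(f"odds ratio = {odds_ratio:.4f}, exp(b1) = {math.exp(b1):.4f}")
```

The probabilities move along a curve, but the log-odds column increases by exactly β₁ per unit of x, which is the linear scale behind the S-curve.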
Logistic Regression Implementation in Python live
Implement a logistic regression model in Python and interpret its results.
- Implementation in Python (40 min) — fitting with scikit-learn / statsmodels.
- Results interpretation (40 min) — coefficients, odds ratios, significance.
Demo: mirrors the fitted-probability curve that Python produces from real data.
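To make the fitting step less of a black box, the sketch below estimates a one-predictor logistic regression by plain gradient ascent on the log-likelihood, using only the standard library. This is a swapped-in teaching technique, not what scikit-learn or statsmodels do internally, and the learning rate, iteration count and simulated data are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Gradient ascent on the average log-likelihood; returns (b0, b1)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)   # observed minus predicted probability
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Simulated data with known (assumed) coefficients b0 = -1, b1 = 2.
rng = random.Random(9)
xs = [rng.gauss(0, 1) for _ in range(1_000)]
ys = [1 if rng.random() < sigmoid(-1.0 + 2.0 * x) else 0 for x in xs]

b0_hat, b1_hat = fit_logistic(xs, ys)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}, odds ratio = {math.exp(b1_hat):.3f}")
```

The estimates should land near the values used to generate the data, and exp(b1) is read as the multiplicative change in the odds for a one-unit increase in x. A library fit would add standard errors and p-values on top of these point estimates.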
Logistic Regression Validation live
Classification targets need different validation tools — learn the statistics that validate classification models.
- Confusion matrix and cutoff point (40 min) — TP/FP/TN/FN, accuracy, precision, recall.
- ROC curves (40 min) — trade-off across all cutoffs; AUC as a summary.
Demo: slide the cutoff and watch the confusion matrix and the point on the ROC curve move together.
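The validation tools can be sketched on a tiny hand-made example. The labels and scores below are invented for illustration; the code builds the confusion matrix at one cutoff, then sweeps every cutoff to trace the ROC curve and integrates it with the trapezoid rule.

```python
def confusion(y_true, scores, cutoff):
    """Return (TP, FP, TN, FN) when predicting 1 for scores >= cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_points(y_true, scores):
    """(false-positive rate, recall) at every distinct cutoff, starting from (0, 0)."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion(y_true, scores, cutoff)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Invented labels and predicted probabilities, for illustration only.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

tp, fp, tn, fn = confusion(y_true, scores, cutoff=0.5)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(y_true)
area = auc(roc_points(y_true, scores))
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.3f} AUC={area:.2f}")
```

Raising the cutoff trades recall for fewer false positives; the AUC does not depend on any single cutoff, which is why it is used as a summary of ranking quality.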
Logistic Regression Lab lab
Groups complete a guided lab applying logistic regression.
- Lab explanation and delivery instructions (10 min).
- Guided lab completion (70 min).
Lab 4 of 4 (part of the 25%): fit and validate a classifier; continues in Session 29.
Logistic Regression Lab (continuation) lab
Groups continue the guided logistic regression lab.
- Guided lab completion (80 min).
Lab 4 continued: finalize and submit via Turnitin before the next synchronous class.
Final Exam exam
Comprehensive test covering all course content. Format detailed by the professor well in advance.
Key concepts
A working glossary of the terms used across the seven modules. Each entry is the short definition you should be able to give from memory by the final exam.
- System
- A set of interacting elements whose collective behavior we want to understand or predict.
- Model
- A simplified representation of a system, typically Y = f(X) + ε — structure plus noise.
- Simulation
- Running a model forward in virtual time to observe behavior without testing the real system.
- Stochastic vs. deterministic
- Stochastic models contain randomness and give different outputs per run; deterministic ones always produce the same output for the same inputs.
- Pseudo-random number
- A deterministically generated value that behaves statistically like a true random draw.
- Linear Congruential Generator (LCG)
- Generator Xₙ₊₁ = (aXₙ + c) mod m; classic source of uniform pseudo-randomness.
- Period
- How many values a generator produces before the sequence repeats; at most m for an LCG.
- Uniformity / independence tests
- Checks (chi-square, KS, runs, autocorrelation) that a generator's output is flat and pattern-free.
- Inverse-transform method
- Sampling rule X = F⁻¹(U) that turns a uniform draw into one from any target distribution.
- Monte Carlo simulation
- Estimating quantities by repeated random sampling; error typically shrinks as 1/√N.
- Inferential statistics
- Drawing conclusions about a population from a sample, often via a sampling distribution.
- Sampling distribution
- The distribution of an estimator across many samples; the basis for confidence intervals.
- Discrete-Event Simulation (DES)
- Advancing a system clock event-to-event rather than continuously.
- SimPy
- Python library for process-based DES using environments, processes and resources.
- M/M/1 queue
- Single-server queue with Poisson arrivals and exponential service; stable when ρ = λ/μ < 1.
- Utilization (ρ)
- Fraction of time a server is busy, λ/μ; values near 1 cause long queues.
- Cross-validation
- Estimating out-of-sample error by training and testing on rotating data folds.
- Overfitting / underfitting
- Fitting noise (too complex) vs. missing structure (too simple); CV finds the balance.
- Ordinary Least Squares (OLS)
- Fitting a linear model by minimizing the sum of squared residuals.
- R²
- Proportion of variance in the target explained by the model; between 0 and 1 for a least-squares fit with an intercept, evaluated on the data used to fit it.
- Residual
- Observed minus predicted, eᵢ = yᵢ − ŷᵢ; their pattern reveals model adequacy.
- Multicollinearity
- Predictors that are strongly correlated, making individual coefficients unstable (high VIF).
- Dummy variable / interaction
- Encoding a category (shifts intercept) and a product term (changes slope across groups).
- Logistic regression
- Classification model p = 1/(1 + e^(−(β₀+β₁x))) for a binary target.
- Logit / odds ratio
- Log-odds ln(p/(1−p)) is linear in x; e^β₁ is the odds ratio for a one-unit increase in x.
- Confusion matrix
- Table of TP, FP, TN, FN summarizing classifier performance at a chosen cutoff.
- ROC curve / AUC
- Recall vs. false-positive rate across cutoffs; AUC summarizes ranking quality.
Bibliography
Recommended reading, annotated with the sessions each source supports.
Beyond the recommended text, the course relies on the official IE policies referenced in the syllabus — the Code of Conduct, Attendance Policy and Ethics Code — and on Python (with SimPy and scikit-learn / statsmodels) plus R as the working tools for the labs.