Course overview
FDA-N-CSAI.1.M.A — a basic Mathematics-area course in the Bachelor in Computer Science and Artificial Intelligence (BCSAI). It introduces probability models and statistical methods for analysing data: students learn to make inferences from statistics (functions of observed data), build confidence intervals, test hypotheses with one and two samples, run analysis of variance with one or more factors, and handle categorical data — using Python throughout.
The use of probability models and statistical methods for analysing data is common practice in virtually all scientific disciplines. Data analysis teaches us to make intelligent judgments and informed decisions in the presence of uncertainty and variation: if every component of a type had exactly the same lifetime, or human behaviour always led to the same decision, a single observation would reveal everything and statistics would be unnecessary. Because that is never the case, we need methods that not only analyse the results of experiments once carried out but also suggest how to run experiments efficiently — mitigating the effects of variation to give a better chance of correct conclusions. This course is the foundation for later subjects such as Algorithms and Data Structures, Probability for Computer Science, and AI: Statistical Learning and Prediction.
Concretely, students learn to make inferences from statistics (functions of observed data), proceeding from describing samples, to estimating parameters, to quantifying uncertainty with confidence intervals, to deciding between hypotheses, and finally to comparing several groups with analysis of variance. Python is used throughout as the working tool.
The professor, Simón Isaza, holds a PhD in mathematical research (Complutense University of Madrid, cum laude, on topology and the theory of singularities) and has taught and researched at the National University of Colombia, Complutense and IE University, with prior experience as a financial-risk consultant at KPMG. Office hours are on request by e-mail.
AI policy. The use of generative AI is not permitted in this course unless the instructor states otherwise; GenAI tools would jeopardise the acquisition of the fundamental knowledge and skills the course develops. AI-generated content in any assessment is treated as academic misconduct.
Learning objectives
By the end of the course, students should have acquired the following subject competencies and reinforced a set of generic skills.
Subject competencies
- Analyse and synthesise the main information content of univariate and multivariate data.
- Compute probabilities and understand key concepts related to hypothesis testing.
- Use random variables to model real phenomena.
- Perform inferences in one and two populations.
- Test hypotheses about populations.
- Design experiments and run analysis of variance.
- Deal with categorical data.
Generic skills
- The ability to think analytically.
- The use of statistical software and a programming language, namely Python.
- The ability to think critically.
The objective is to give students the tools to delve into data sets and use that information across disciplines — computer science, engineering, physics, and more.
Teaching methodology
IE University's method is collaborative, active and applied: students build their knowledge through lectures, discussion, in-class exercises and Python field work, group work and individual study. The learning activities total 150 hours.
How sessions work. The 30 sessions alternate theory blocks that introduce each topic with practice blocks — by-hand problem solving, Python labs, and four graded problem sets. Each topic is typically taught over two to three theory sessions and then consolidated in a problem-solving session and a Python session, so concepts are seen, worked by hand, and implemented in code before being assessed.
Evaluation criteria
Ordinary evaluation combines the following criteria. A minimum grade of 3.5 in the final exam is required to pass — below that the course is failed even if the weighted average exceeds 5.0.
Class participation — 10%
Deliverable: active, sustained contributions across sessions. Evaluation: assessed first on quantity (a threshold number of contributions sufficient for a reliable judgment) and then on quality — depth of insight, rigorous use of evidence, consistency of argument and realism, with comments that are concise, well-timed and engaged with the discussion.
Problem sets (×4) — 20%
Deliverable: four problem sets (sessions 11, 18, 25, 28) of theory and practice questions, solved by hand and/or in Python and submitted on BlackBoard as a multiple-answer test in which roughly one third of the options are correct. Evaluation: the sets are intended to reinforce understanding of complex statistical concepts and to prepare students for the midterm, computer and final exams.
Midterm exam — 20%
Deliverable: exam in session 20 (tentative; it may move to session 18 or 19), taken via BlackBoard Ultra and covering Module 1 (Topics 1–4). Evaluation: format similar to the problem sets; closed book; one double-sided, handwritten A4 formula sheet and a simple (non-programmable, non-graphical) calculator are allowed. Students must connect via the IE network; otherwise the grade is 0.
Computer exam (Python) — 20%
Deliverable: individual Python exam in session 29 — solve and discuss questions covering all course topics. Evaluation: open book; assesses both understanding of the material and competence in Python; communication is forbidden and the IE network is required.
Final exam — 30%
Deliverable: comprehensive exam in session 30 covering every session, similar in style to the problem sets with theory and practice questions; draws on the book, slides, problem sets and class notes. Evaluation: closed book; two double-sided A4 handwritten formula sheets and a simple calculator allowed.
Pass rule. A minimum of 3.5/10 in the final exam is required; below it the course is failed even if the weighted average exceeds 5.0.
Attendance. Students who do not meet the 80% attendance rule fail both the ordinary and extraordinary calls for the year and must re-enrol the following year.
Re-sit / re-take. Each course allows four chances across two academic years. Students failing the ordinary call may re-sit in June/July with a single comprehensive exam (continuous evaluation is not carried over), requiring physical presence in Segovia or Madrid; the re-sit grade is capped at 8.0 (notable). The retake (3rd call) caps at 10.0; retakers must check the assigned professor's criteria. Failing more than 18 ECTS in a year after the re-sits may lead to leaving the program. Grade appeals require prior attendance at the exam review session. GenAI use in any assessment is academic misconduct and may mean failing the assignment or the course.
Program — 30 sessions
The full session-by-session program, grouped into the two assessment modules. Theory sessions introduce each topic; practice sessions are problem solving and Python labs. Where a topic has a live demo on the interactive lab, the session links to it.
Module 1 builds the inferential pipeline from the ground up. It starts with how a statistic behaves across random samples (sampling distributions and the Central Limit Theorem), uses that to produce single best-guess values for parameters (point estimation), attaches a measure of precision to those guesses (confidence intervals), and finally turns estimation into a decision procedure (one-sample hypothesis tests). Everything here concerns a single population.
On completing Module 1 you should be able to:
- Describe and simulate the sampling distribution of a statistic and apply the Central Limit Theorem.
- Compute point estimates by the method of moments and by maximum likelihood, and judge their quality.
- Construct and interpret confidence intervals for a mean, proportion, variance and standard deviation.
- Carry out z, t, proportion and variance tests for a single sample and reason about Type I/II error and power.
- Implement one-sample estimation and testing in Python.
Course presentation, then the behaviour of statistics computed from random samples and how a sampling distribution arises.
- Statistics and their distributions
- Random samples
- Deriving a sampling distribution
- Simulation experiments
- Distribution of the sample mean
- Central Limit Theorem
- Distribution of a linear combination
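The simulation experiments listed for this session can be sketched as follows. This is a minimal illustration, not course material: the Exponential(1) population, the seed and the sample sizes are assumptions chosen only because that population is clearly skewed and has mean 1 and standard deviation 1.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

def simulate_sample_means(n, reps=20_000):
    """Return `reps` sample means, each computed from n Exponential(1) draws."""
    samples = rng.exponential(scale=1.0, size=(reps, n))
    return samples.mean(axis=1)

for n in (2, 10, 50):
    means = simulate_sample_means(n)
    # Theory for this population: E[sample mean] = 1, SD(sample mean) = 1/sqrt(n).
    print(f"n={n:3d}  mean={means.mean():.3f}  sd={means.std(ddof=1):.3f}  "
          f"theory sd={1 / np.sqrt(n):.3f}")
```

A histogram of `means` for each n would show the skew fading as n grows, which is the Central Limit Theorem at work.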
Continue with the sample mean, the CLT and linear combinations; consolidate via simulation.
- Distribution of the sample mean
- Central Limit Theorem
- Distribution of a linear combination
Use a sample to produce a single best-guess value (point estimate) for a population parameter.
- General concepts of point estimation
- Methods of point estimation
- Method of moments
- Maximum likelihood estimation
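As an illustration of how the two methods can give different answers, consider a Uniform(0, θ) population. The choice of model, the value θ = 10 and the sample size are assumptions made for this sketch only.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 10.0                      # hypothetical "unknown" parameter
x = rng.uniform(0.0, theta_true, size=200)

# Method of moments: E[X] = theta / 2, so equate the sample mean to
# theta / 2 and solve for theta.
theta_mom = 2.0 * x.mean()

# Maximum likelihood: L(theta) = theta**(-n) for theta >= max(x) and 0
# otherwise. It decreases in theta, so the maximiser is the largest observation.
theta_mle = x.max()

print(f"MoM estimate: {theta_mom:.3f}")
print(f"MLE estimate: {theta_mle:.3f}")
```

The MLE can never exceed the true θ, so it is biased downward, while the method-of-moments estimate is unbiased but more variable.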
Develop the estimation methods and compare them.
- Method of moments
- Maximum likelihood estimation
- Properties of estimators
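One property of estimators, bias, can be checked by simulation. The sketch below compares the two usual variance estimators on normal data; the sample size, σ² = 4 and the number of replications are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, sigma2 = 5, 100_000, 4.0
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))

var_n = samples.var(axis=1, ddof=0)      # divides by n (the normal-model MLE)
var_n1 = samples.var(axis=1, ddof=1)     # divides by n - 1 (sample variance)

# Theory: E[divisor-n estimator] = (n - 1) / n * sigma^2, E[s^2] = sigma^2.
print(f"mean of divisor-n estimator:     {var_n.mean():.3f}  "
      f"(theory {(n - 1) / n * sigma2:.3f})")
print(f"mean of divisor-(n-1) estimator: {var_n1.mean():.3f}  "
      f"(theory {sigma2:.3f})")
```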
Wrap up point estimation; contrast the sampling behaviour of competing estimators.
- Method of moments
- Maximum likelihood estimation
Problem solving across Topics 1 and 2.
- Problem solving
Report a range of plausible values — a confidence interval — to convey the precision of an estimate.
- Basic properties of confidence intervals
- Large-sample CIs for a mean & proportion
- Intervals based on a Normal population
- CIs for variance & standard deviation
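The normal-population intervals from this session can be sketched with SciPy. The function names and the five data values are hypothetical; both intervals assume the data are a random sample from a normal population.

```python
import numpy as np
from scipy import stats

def t_interval_mean(x, conf=0.95):
    """Two-sided t interval for the mean, assuming a normal population."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    half_width = t_crit * x.std(ddof=1) / np.sqrt(n)
    return x.mean() - half_width, x.mean() + half_width

def chi2_interval_variance(x, conf=0.95):
    """Two-sided chi-squared interval for the variance (normal population)."""
    x = np.asarray(x, dtype=float)
    df = x.size - 1
    alpha = 1 - conf
    ss = df * x.var(ddof=1)                      # (n - 1) * s^2
    return ss / stats.chi2.ppf(1 - alpha / 2, df), ss / stats.chi2.ppf(alpha / 2, df)

data = [4.9, 5.1, 5.0, 5.3, 4.7]                 # made-up measurements
print("95% CI for the mean:    ", t_interval_mean(data))
print("95% CI for the variance:", chi2_interval_variance(data))
```

Taking square roots of the variance limits gives the corresponding interval for the standard deviation.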
Continue with confidence intervals for means, proportions and variances.
- Large-sample CIs
- Normal-based intervals
- CIs for variance & SD
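The long-run interpretation of a confidence level can be checked empirically. The sketch below builds many 95% t intervals from simulated normal samples and counts how often they contain the true mean; all parameter values are assumptions picked for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma, n, reps, conf = 50.0, 8.0, 15, 10_000, 0.95

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n)    # estimated standard error
t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)

# An interval "covers" when the true mean lies between its two limits.
covered = (xbar - t_crit * se <= mu) & (mu <= xbar + t_crit * se)
coverage = covered.mean()
print(f"empirical coverage: {coverage:.3f} (nominal {conf})")
```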
Problem solving on confidence intervals.
- Problem solving
Implement one-sample estimation and intervals in Python.
- Python lab
- One-sample inference
Assessed problem set covering Topics 1–3 (BlackBoard multiple-answer test).
- Problem Set 1
Decide between two contradictory claims about a parameter — the core of hypothesis testing.
- Hypotheses & test procedures
- z tests for a population mean
- One-sample t test
- Tests for a proportion
- Tests for a variance
- Further aspects of testing
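A one-sample t test can be computed by hand and cross-checked against SciPy, as in this sketch. The data and the hypothesised mean of 5.2 are made up for illustration.

```python
import numpy as np
from scipy import stats

data = np.array([4.9, 5.1, 5.0, 5.3, 4.7])   # made-up measurements
mu0 = 5.2                                     # hypothesised mean under H0

# By hand: t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 df.
n = data.size
t_manual = (data.mean() - mu0) / (data.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)   # two-sided p-value

# The same test through SciPy.
result = stats.ttest_1samp(data, popmean=mu0)

print(f"by hand: t = {t_manual:.3f}, p = {p_manual:.4f}")
print(f"SciPy:   t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

With these numbers the p-value exceeds 0.05, so H₀ would not be rejected at the 5% level.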
Continue with single-sample tests and the trade-offs between error types and power.
- z and t tests
- Tests for proportion & variance
- Type I/II error & power
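The trade-off between error types and power can be made concrete for the simplest case, an upper-tailed z test with known σ. The function name and the numbers below are hypothetical; only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def power_upper_z_test(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the upper-tailed z test of H0: mu = mu0 when the true mean is mu1.

    Assumes a normal population (or large n) with known sigma.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)       # rejection cutoff for Z
    shift = (mu1 - mu0) * sqrt(n) / sigma         # mean of Z when mu = mu1
    return 1 - STD_NORMAL.cdf(z_alpha - shift)

for n in (10, 25, 50):
    power = power_upper_z_test(mu0=100, mu1=103, sigma=10, n=n)
    print(f"n={n:2d}  power={power:.3f}  beta={1 - power:.3f}")
```

When mu1 equals mu0 the function returns α, and for a fixed alternative the power grows with n and with α.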
Problem solving on one-sample hypothesis tests.
- Problem solving
Run one-sample hypothesis tests in Python.
- Python lab
- One-sample hypothesis testing
Confidence intervals and tests for a difference between two population parameters.
- z tests & CIs for a difference of means
- Two-sample t test & CI
- Analysis of paired data
- Difference of proportions
- Ratio of two variances
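For the difference of two means with independent samples and unpooled variances, a by-hand computation can be compared with SciPy. The two groups are made-up data. SciPy uses the unrounded Welch-Satterthwaite degrees of freedom, whereas some textbook treatments round them down, so a hand calculation from tables may differ slightly.

```python
import numpy as np
from scipy import stats

# Made-up measurements from two independent groups.
a = np.array([12.1, 11.8, 12.6, 12.9, 11.5, 12.3])
b = np.array([11.2, 11.9, 10.8, 11.4, 11.0])

# By hand: t = (mean_a - mean_b) / sqrt(s_a^2 / m + s_b^2 / n), no pooling.
va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
t_manual = (a.mean() - b.mean()) / np.sqrt(va + vb)

# Welch-Satterthwaite approximate degrees of freedom (left unrounded here).
df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
p_manual = 2 * stats.t.sf(abs(t_manual), df)

result = stats.ttest_ind(a, b, equal_var=False)
print(f"by hand: t = {t_manual:.3f}, df = {df:.2f}, p = {p_manual:.4f}")
print(f"SciPy:   t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```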
Continue with two-sample tests, paired data and variance comparisons.
- Two-sample t test & CI
- Analysis of paired data
- Two population variances
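Two of the procedures in this session can be sketched briefly. The paired analysis reduces to a one-sample t test on the within-pair differences, and the comparison of two variances uses the ratio of sample variances. All data are made up, and doubling the smaller tail area is one common convention for a two-sided F p-value, assumed here rather than taken from the syllabus.

```python
import numpy as np
from scipy import stats

# Made-up before/after measurements on the same six units.
before = np.array([140.0, 152.0, 138.0, 147.0, 160.0, 155.0])
after = np.array([135.0, 150.0, 139.0, 141.0, 154.0, 151.0])

# Paired analysis: a one-sample t test on the within-pair differences.
d = before - after
t_diff = stats.ttest_1samp(d, popmean=0.0)
t_paired = stats.ttest_rel(before, after)      # the same test, built in
print(f"paired t = {t_paired.statistic:.3f}, p = {t_paired.pvalue:.4f}")

# Comparing two variances: F = s1^2 / s2^2 with (m - 1, n - 1) df.
# This needs two independent normal samples, so separate made-up groups
# are used instead of the paired data above.
g1 = np.array([12.1, 11.8, 12.6, 12.9, 11.5, 12.3])
g2 = np.array([11.2, 11.9, 10.8, 11.4, 11.0])
f_stat = g1.var(ddof=1) / g2.var(ddof=1)
df1, df2 = g1.size - 1, g2.size - 1
p_two_sided = 2 * min(stats.f.cdf(f_stat, df1, df2), stats.f.sf(f_stat, df1, df2))
print(f"F = {f_stat:.3f}, two-sided p = {p_two_sided:.4f}")
```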
Assessed problem set covering one- and two-sample inference.
- Problem Set 2
Final two-sample session before the midterm.
- Difference of means
- Paired data
- Difference of proportions
Closed-book exam via BlackBoard Ultra. Date tentative — may fall in session 18 or 19 depending on class pace. One double-sided A4 formula sheet and a simple calculator allowed.
- Sampling distributions
- Point estimation
- Confidence intervals
- One-sample testing
Module 2 generalises inference from one population to many. It first compares two populations (differences of means, paired data, differences of proportions, ratios of variances), then compares several groups at once with single-factor ANOVA, and finally studies the simultaneous effect of two or more factors with multifactor ANOVA. The module closes with the comprehensive computer and final exams.
On completing Module 2 you should be able to:
- Build confidence intervals and tests for a difference between two population means and proportions.
- Recognise paired designs and analyse them with the appropriate paired procedure.
- Compare two variances and set up an F-based comparison.
- Decompose total variability and run a single-factor ANOVA, including multiple comparisons.
- Extend ANOVA to two factors, including fixed-, random- and mixed-effects and randomized block designs.
- Carry out all of the above in Python and interpret the output.
Problem solving on two-sample inference.
- Problem solving
Two-sample tests and confidence intervals implemented in Python.
- Python lab
- Two-sample inference
One-way ANOVA: comparing quantitative responses across more than two populations or treatments.
- Single-factor ANOVA
- Multiple comparisons in ANOVA
- More on single-factor ANOVA
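The single-factor F test can be built from its sums of squares and checked against SciPy. The three treatment groups below are made-up data; the sketch covers only the overall F test, not the multiple-comparison step.

```python
import numpy as np
from scipy import stats

# Made-up responses for three treatments (equal sizes for simplicity).
groups = [
    np.array([24.0, 26.0, 23.0, 27.0]),
    np.array([30.0, 29.0, 32.0, 31.0]),
    np.array([25.0, 27.0, 26.0, 24.0]),
]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
k, n_total = len(groups), all_obs.size

# Partition total variability into between- and within-treatment parts.
sstr = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
mstr, mse = sstr / (k - 1), sse / (n_total - k)

f_manual = mstr / mse
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

result = stats.f_oneway(*groups)
print(f"by hand: F = {f_manual:.3f}, p = {p_manual:.5f}")
print(f"SciPy:   F = {result.statistic:.3f}, p = {result.pvalue:.5f}")
```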
Run a one-way ANOVA in Python and interpret the F test.
- Python lab
- Single-factor ANOVA
Assessed problem set covering two-sample inference and single-factor ANOVA.
- Problem Set 3
Extend ANOVA to two or more simultaneous factors.
- Two-factor ANOVA
- The fixed-effects model
- Randomized block experiments
- Random & mixed-effects models
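For a balanced two-factor design with replication, the sums of squares can be computed directly with NumPy. The sketch below assumes the fixed-effects model, where every F ratio uses MSE as its denominator; random- and mixed-effects models use different denominators. The 2 × 3 layout and the data are made up.

```python
import numpy as np
from scipy import stats

# Made-up balanced data: y[i, j, k] is replicate k at level i of factor A
# and level j of factor B (2 x 3 design, 2 replicates per cell).
y = np.array([
    [[20.0, 22.0], [25.0, 27.0], [30.0, 28.0]],
    [[24.0, 26.0], [26.0, 24.0], [35.0, 37.0]],
])
I, J, K = y.shape
grand = y.mean()
mean_a = y.mean(axis=(1, 2))          # factor A level means
mean_b = y.mean(axis=(0, 2))          # factor B level means
mean_cell = y.mean(axis=2)            # cell means

ssa = J * K * ((mean_a - grand) ** 2).sum()
ssb = I * K * ((mean_b - grand) ** 2).sum()
ssab = K * ((mean_cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum()
sse = ((y - mean_cell[:, :, None]) ** 2).sum()
sst = ((y - grand) ** 2).sum()        # equals ssa + ssb + ssab + sse

df_e = I * J * (K - 1)
mse = sse / df_e
for name, ss, df in [("A", ssa, I - 1), ("B", ssb, J - 1), ("AB", ssab, (I - 1) * (J - 1))]:
    f_stat = (ss / df) / mse
    print(f"{name:>2}: SS = {ss:7.3f}, df = {df}, F = {f_stat:6.3f}, "
          f"p = {stats.f.sf(f_stat, df, df_e):.4f}")
```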
Fit and interpret a multifactor ANOVA in Python.
- Python lab
- Two-factor ANOVA
Final assessed problem set covering multifactor ANOVA.
- Problem Set 4
Individual, open-book Python exam: solve and discuss questions covering all topics of the course, evaluating both understanding and Python use.
- Open book · Python
- All topics
Comprehensive closed-book exam covering all content from the first to the last session, similar in style to the problem sets with theory and practice questions. Minimum 3.5 required to pass; two double-sided A4 formula sheets and a simple calculator allowed.
- All topics 1–7
- Closed book
Each topic above is anchored in the primary textbook; the glossary and annotated bibliography that follow collect the recurring terms and the reading map for the whole course.
Key concepts
A quick-reference glossary of the terms that recur throughout the program — useful when building your A4 formula sheet for the exams.
- Population & parameter: The full set of units of interest, and a fixed numerical feature of it (e.g. mean μ, proportion p, variance σ²) that we try to learn about.
- Random sample: Observations X₁,…,Xₙ that are independent and identically distributed (i.i.d.) draws from the population.
- Statistic: Any quantity computed from the sample (e.g. X̄, s²); being a function of random data, it is itself random.
- Sampling distribution: The probability distribution of a statistic across all possible samples; this is the object inference is built on.
- Standard error: The standard deviation of a statistic's sampling distribution; for the mean, SE = σ/√n, estimated by s/√n when σ is unknown.
- Central Limit Theorem: For large n, the standardized sample mean is approximately N(0,1) whatever the shape of the population, provided its variance is finite.
- Point estimator / estimate: A rule θ̂ for guessing a parameter, and the single value it returns on the observed sample.
- Bias & unbiasedness: bias = E[θ̂] − θ; an estimator is unbiased when its expected value equals the parameter.
- Method of moments (MoM): Estimate parameters by equating sample moments to population moments and solving.
- Maximum likelihood (MLE): Estimate by maximizing the likelihood of the observed data; under regularity conditions the estimator is consistent, asymptotically normal and asymptotically efficient.
- Consistency: An estimator is consistent if it converges in probability to the true parameter as n grows.
- Confidence interval: A random interval covering the parameter with a stated long-run probability (the confidence level).
- Confidence level: 1 − α, the proportion of such intervals that contain the parameter over repeated sampling (e.g. 95%).
- Null & alternative hypotheses: H₀ is the default claim; Hₐ is the competing claim the data may support.
- Test statistic: A standardized quantity (z, t, χ², F) measuring how far the data fall from H₀.
- p-value: Probability, under H₀, of a result at least as extreme as the one observed; reject H₀ when p ≤ α.
- Significance level α: The chosen Type I error rate, i.e. the probability of rejecting a true H₀.
- Type I & Type II error: Rejecting a true H₀ (prob. α) vs. failing to reject a false H₀ (prob. β).
- Power: 1 − β, the probability of correctly rejecting a false H₀; rises with effect size, n and α.
- t distribution: Bell-shaped, heavier-tailed than the normal, used when σ is estimated by s; indexed by degrees of freedom.
- Degrees of freedom: The number of independent pieces of information available to estimate variability (e.g. n − 1).
- Paired data: Two measurements on the same unit; analysed through the within-pair differences.
- ANOVA: Analysis of variance; compares group means by partitioning total variability into between- and within-group parts.
- F statistic: Ratio of two variance estimates (MSTr/MSE in ANOVA, s₁²/s₂² for two variances); in ANOVA a large F is evidence that the group means differ.
- Interaction: In multifactor ANOVA, when the effect of one factor depends on the level of another.
- Fixed vs. random effects: Factor levels deliberately chosen (fixed) vs. sampled from a population of levels (random); a mix is a mixed model.
- Multiple comparisons: Post-hoc procedures (e.g. Tukey's HSD) that locate which group means differ while controlling family-wise error.
Annotated bibliography
The course is anchored on Devore's textbook (referenced as DEV in the syllabus), supported by the slides, problem sets and class notes. Chapter mapping below indicates where each source feeds the program.
- Jay L. Devore, Probability and Statistics for Engineering and the Sciences (DEV). The primary textbook for every topic: theory, worked examples and exercises that mirror the problem sets and exams. (All sessions; primary reference.)
- DEV, Chapter 5, §5.3–5.5: Statistics and their distributions; the sampling distribution of the mean and the CLT; linear combinations. The explicitly suggested reading for Topic 1. (Topic 1; sessions 1–2, 6.)
- DEV, Chapter 6: Point estimation. General concepts, the method of moments and maximum likelihood, and estimator properties. (Topic 2; sessions 3–6.)
- DEV, Chapter 7: Statistical intervals for a single sample. Confidence intervals for a mean, proportion, variance and standard deviation. (Topic 3; sessions 7–10.)
- DEV, Chapter 8: Tests of hypotheses (single sample). z and t tests, proportion and variance tests, errors, power and p-values. (Topic 4; sessions 12–15.)
- DEV, Chapter 9: Inferences based on two samples. Differences of means and proportions, paired data, and comparison of two variances. (Topic 5; sessions 16–22.)
- DEV, Chapter 10: Single-factor ANOVA. One-way ANOVA, the F test and multiple-comparison procedures. (Topic 6; sessions 23–25.)
- DEV, Chapter 11: Multifactor ANOVA. Two-factor ANOVA, fixed/random/mixed effects and randomized block designs. (Topic 7; sessions 26–28.)
- Course slides, problem sets and class notes. Companion materials; the final exam may draw on all of them, so the syllabus advises studying them alongside the book. (All sessions; exam preparation.)
- Python ecosystem: NumPy, Pandas, SciPy stats, statsmodels. The working tools for every Python lab and the open-book computer exam. (Sessions 10, 15, 22, 24, 27, 29.)