1. Modeling change — a dynamic system
A model is a rule for how a quantity changes over time. Population $P$ grows logistically: $\frac{dP}{dt}=rP\left(1-\frac{P}{K}\right)$. Small populations grow nearly exponentially; growth slows as $P$ approaches the carrying capacity $K$. We integrate the rule forward with small time steps — the essence of simulation.
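The forward stepping described above can be sketched with the forward Euler method. This is a minimal illustration, not necessarily the integrator the demo uses; the function names, parameter values, and step size `dt` are assumptions chosen for the example.

```python
import math

def simulate_logistic(p0, r, k, dt, steps):
    """Integrate dP/dt = r*P*(1 - P/K) with forward Euler steps of size dt."""
    p = p0
    trajectory = [p]
    for _ in range(steps):
        p += dt * r * p * (1.0 - p / k)  # P(t + dt) ~ P(t) + dt * dP/dt
        trajectory.append(p)
    return trajectory

def exact_logistic(p0, r, k, t):
    """Closed-form logistic solution, used only to check the simulation."""
    return k / (1.0 + (k - p0) / p0 * math.exp(-r * t))

# Hypothetical parameters: start at 10, growth rate 0.5, carrying capacity 1000.
traj = simulate_logistic(p0=10.0, r=0.5, k=1000.0, dt=0.01, steps=3000)
print(traj[-1], exact_logistic(10.0, 0.5, 1000.0, t=30.0))
```

A smaller `dt` brings the simulated curve closer to the closed-form solution at the cost of more steps.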
2. Random numbers — the linear congruential generator
Computers make "random" numbers deterministically. The LCG iterates $x_{n+1}=(a\,x_n+c)\bmod m$ and returns $u_n=x_n/m\in[0,1)$. Plotted as consecutive pairs $(u_n,u_{n+1})$, good constants fill the unit square evenly; bad ones collapse onto a few lattice lines. Because the state takes only $m$ possible values, the sequence must repeat after at most $m$ steps; the length of that cycle is its period.
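A short sketch of the recurrence and of how the period can be measured. The helper names `lcg` and `period` are illustrative. The small moduli are toy values chosen so the cycle can be counted by hand, and the constants in the last example are the widely quoted "Numerical Recipes" parameters, used here only as an example of a reasonable choice.

```python
def lcg(seed, a, c, m):
    """Yield u_n = x_n / m for the recurrence x_{n+1} = (a*x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

def period(seed, a, c, m):
    """Length of the cycle the state eventually falls into."""
    seen = {}          # state -> step at which it first appeared
    x, n = seed, 0
    while x not in seen:
        seen[x] = n
        x = (a * x + c) % m
        n += 1
    return n - seen[x]

print(period(seed=1, a=5, c=3, m=16))  # full period: visits all 16 states
print(period(seed=1, a=7, c=0, m=16))  # poor constants: cycle of length 2

# Consecutive pairs (u_n, u_{n+1}) are what one would scatter-plot in the unit square.
gen = lcg(seed=42, a=1664525, c=1013904223, m=2**32)
us = [next(gen) for _ in range(1000)]
pairs = list(zip(us, us[1:]))
```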
3. Generating random variables — inverse transform
To sample from a distribution with CDF $F$ using a uniform draw $U\sim\mathrm{Unif}(0,1)$, invert the CDF: $X=F^{-1}(U)$. For the exponential with rate $\lambda$, $F(x)=1-e^{-\lambda x}$, so $X=-\frac{1}{\lambda}\ln(1-U)$. We draw $U$, push it through $F^{-1}$, and the histogram of samples converges to the target density.
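As a minimal sketch of the exponential case, the snippet below draws uniforms with Python's `random` module and applies $F^{-1}$. The rate `lam = 2.0`, the seed, and the sample count are arbitrary choices for illustration. The sample mean should land near the theoretical mean $1/\lambda$.

```python
import math
import random

def sample_exponential(lam, rng):
    """Inverse-transform sample: X = -ln(1 - U) / lam with U ~ Unif[0, 1)."""
    u = rng.random()
    return -math.log(1.0 - u) / lam  # 1 - u lies in (0, 1], so the log is defined

rng = random.Random(0)   # fixed seed so the run is repeatable
lam = 2.0
samples = [sample_exponential(lam, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean, 1.0 / lam)   # sample mean vs. theoretical mean 1/lam
```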
4. Monte Carlo simulation — estimating $\pi$
Throw random darts uniformly into the unit square. The fraction landing inside the quarter circle $x^2+y^2\le 1$ approaches $\frac{\pi}{4}$, so $\hat\pi=4\cdot\frac{\text{hits}}{\text{throws}}$. With $n$ throws the error shrinks like $1/\sqrt{n}$, the signature convergence rate of Monte Carlo.
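The dart-throwing estimator can be sketched in a few lines. The function name `estimate_pi` and the seed are assumptions for the example. Increasing the number of throws a hundredfold should cut the typical error by roughly a factor of ten, though any single run fluctuates.

```python
import math
import random

def estimate_pi(throws, rng):
    """Estimate pi as 4 * (fraction of uniform points inside the quarter circle)."""
    hits = 0
    for _ in range(throws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / throws

rng = random.Random(1)
for n in (100, 10_000, 1_000_000):
    est = estimate_pi(n, rng)
    print(n, est, abs(est - math.pi))
```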
5. Monte Carlo inference — the sampling distribution
Repeatedly draw a sample of size $n$ from a population with standard deviation $\sigma$, compute its mean, and plot the means. The Central Limit Theorem says this sampling distribution is approximately normal with standard deviation $\sigma/\sqrt{n}$: wider for small $n$, tighter for large $n$. This is how Monte Carlo lets us reason about estimators.
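A small experiment along these lines, assuming a skewed population (exponential with rate 1, so $\sigma=1$) to make the point that the population itself need not be normal. The helper name, sample sizes, and number of repeats are illustrative.

```python
import random
import statistics

def sampling_distribution(draw, n, repeats, rng):
    """Return `repeats` sample means, each computed from `n` independent draws."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(repeats)]

def draw(rng):
    """One observation from the assumed population: exponential, mean 1, sigma 1."""
    return rng.expovariate(1.0)

rng = random.Random(42)
for n in (5, 25, 100):
    means = sampling_distribution(draw, n, 5000, rng)
    # Observed spread of the means next to the CLT prediction sigma / sqrt(n).
    print(n, statistics.stdev(means), 1.0 / n ** 0.5)
```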
6. Discrete-event simulation — an M/M/1 queue
Customers arrive at rate $\lambda$ and a single server works at rate $\mu$. The simulation jumps between discrete events (arrival, departure). The queue is stable only when the traffic intensity $\rho=\lambda/\mu$ is below 1; as $\rho$ nears 1 the queue explodes. For $\rho<1$, theory predicts a mean number of customers waiting (excluding the one in service) of $L_q=\frac{\rho^2}{1-\rho}$.
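One way to sketch the event loop is to keep the times of the next arrival and next departure, jump to whichever comes first, and accumulate the time-weighted number of customers waiting. The function name, the `horizon` parameter, and the rates below are assumptions for illustration. The estimate is random, so it should only be expected to land near the theoretical $L_q$.

```python
import random

def simulate_mm1(lam, mu, horizon, rng):
    """Time-averaged number of customers waiting (not in service) up to `horizon`."""
    t = 0.0
    in_system = 0
    next_arrival = rng.expovariate(lam)
    next_departure = float("inf")        # no one in service yet
    area = 0.0                           # integral of the waiting count over time
    while True:
        t_next = min(next_arrival, next_departure, horizon)
        area += max(in_system - 1, 0) * (t_next - t)
        t = t_next
        if t >= horizon:
            break
        if next_arrival <= next_departure:           # arrival event
            in_system += 1
            next_arrival = t + rng.expovariate(lam)
            if in_system == 1:                       # server was idle: start service
                next_departure = t + rng.expovariate(mu)
        else:                                        # departure event
            in_system -= 1
            next_departure = (t + rng.expovariate(mu)) if in_system > 0 else float("inf")
    return area / horizon

lam, mu = 0.5, 1.0
rho = lam / mu
lq_sim = simulate_mm1(lam, mu, horizon=500_000.0, rng=random.Random(0))
print(lq_sim, rho ** 2 / (1 - rho))      # simulated vs. theoretical L_q
```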
7. Simple linear regression — least squares
Fit $\hat y=\beta_0+\beta_1 x$ by minimising the sum of squared residuals $\sum_i (y_i-\hat y_i)^2$. Adjust the generating slope and noise; the fitted line and $R^2$ update. Click the canvas to add your own points. $R^2$ is the share of the variance in $y$ that the line explains.
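For a single predictor the least-squares solution has a closed form: $\beta_1=S_{xy}/S_{xx}$ and $\beta_0=\bar y-\beta_1\bar x$. The sketch below computes it together with $R^2=1-\mathrm{SS}_{\text{res}}/\mathrm{SS}_{\text{tot}}$. The function name and the data points are made up for the example.

```python
def fit_line(xs, ys):
    """Return (beta0, beta1, r2) for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    ss_res = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return beta0, beta1, 1.0 - ss_res / ss_tot

# Hypothetical noisy points scattered around y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
beta0, beta1, r2 = fit_line(xs, ys)
print(beta0, beta1, r2)
```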
8. Polynomial regression — fitting curvature
A straight line cannot capture a curved trend. Raising the degree $d$ in $\hat y=\sum_{k=0}^{d}\beta_k x^k$ lets the fit bend. Watch training error fall as the degree rises; a very high degree, however, starts chasing noise, foreshadowing overfitting.
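A minimal sketch of polynomial least squares, solving the normal equations $X^\top X\beta=X^\top y$ by Gaussian elimination. This keeps the example dependency-free; it is adequate for low degrees on inputs scaled to $[-1,1]$, while a library routine based on QR or SVD would be the safer choice in practice. The quadratic generating curve, noise level, and helper names are assumptions.

```python
import random

def polyfit(xs, ys, degree):
    """Least-squares coefficients beta_0..beta_d from the normal equations."""
    m = degree + 1
    # Augmented matrix [X^T X | X^T y], built from power sums of x.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    for col in range(m):                      # elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for k in range(col, m + 1):
                a[row][k] -= f * a[col][k]
    beta = [0.0] * m
    for i in reversed(range(m)):              # back substitution
        s = sum(a[i][j] * beta[j] for j in range(i + 1, m))
        beta[i] = (a[i][m] - s) / a[i][i]
    return beta

def mse(beta, xs, ys):
    """Mean squared error of the polynomial with coefficients beta."""
    return sum((y - sum(b * x ** k for k, b in enumerate(beta))) ** 2
               for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
xs = [-1 + 2 * i / 29 for i in range(30)]
ys = [1 - 2 * x + 3 * x * x + rng.gauss(0, 0.2) for x in xs]  # curved trend + noise
errors = [mse(polyfit(xs, ys, d), xs, ys) for d in range(1, 6)]
for d, e in zip(range(1, 6), errors):
    print(d, e)   # training error never increases with the degree
```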
9. Overfitting & cross-validation
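Splitting data into training and test sets reveals the bias–variance trade-off. As model complexity rises, training error keeps falling, but test error turns back up once the model memorises noise. The sweet spot is the bottom of the test curve. Cross-validation repeats the split over several folds and averages the test errors, giving a steadier estimate of that curve.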
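The train/test comparison can be sketched with a single hold-out split and polynomial degree as the complexity knob. The target curve $\sin(\pi x)$, the noise level, the even/odd split, and all helper names are assumptions made for this illustration. The training error is guaranteed not to rise with degree; the test error typically bottoms out at a moderate degree, though where exactly depends on the random draw.

```python
import math
import random

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients via the normal equations."""
    m = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    for col in range(m):                      # Gaussian elimination, partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for k in range(col, m + 1):
                a[row][k] -= f * a[col][k]
    beta = [0.0] * m
    for i in reversed(range(m)):              # back substitution
        s = sum(a[i][j] * beta[j] for j in range(i + 1, m))
        beta[i] = (a[i][m] - s) / a[i][i]
    return beta

def mse(beta, xs, ys):
    """Mean squared prediction error of the fitted polynomial."""
    return sum((y - sum(b * x ** k for k, b in enumerate(beta))) ** 2
               for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(3)
xs = [-1 + 2 * i / 39 for i in range(40)]
ys = [math.sin(math.pi * x) + rng.gauss(0, 0.2) for x in xs]
train_x, train_y = xs[0::2], ys[0::2]         # even-indexed points for fitting
test_x, test_y = xs[1::2], ys[1::2]           # odd-indexed points held out

degrees = list(range(1, 9))
train_err, test_err = [], []
for d in degrees:
    beta = polyfit(train_x, train_y, d)       # fit on training data only
    train_err.append(mse(beta, train_x, train_y))
    test_err.append(mse(beta, test_x, test_y))
    print(d, train_err[-1], test_err[-1])

best_degree = min(degrees, key=lambda d: test_err[d - 1])
print("lowest test error at degree", best_degree)
```

Cross-validation would wrap the same fit-and-score step in a loop over several different splits and average the resulting test errors.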
10. Logistic regression — the logit curve
For a yes/no outcome we model the probability with the sigmoid $p(x)=\dfrac{1}{1+e^{-(\beta_0+\beta_1 x)}}$. The slope $\beta_1$ controls how sharply the curve switches; the points are 0/1 labels. A cutoff turns probabilities into predictions.
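A short sketch of the sigmoid and the cutoff rule. The coefficient values and function names are hypothetical. With a cutoff of 0.5 the prediction flips where $\beta_0+\beta_1 x=0$, that is at $x=-\beta_0/\beta_1$.

```python
import math

def sigmoid_prob(x, beta0, beta1):
    """p(x) = 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def predict(x, beta0, beta1, cutoff=0.5):
    """Turn the modelled probability into a 0/1 prediction."""
    return 1 if sigmoid_prob(x, beta0, beta1) >= cutoff else 0

beta0, beta1 = -4.0, 2.0          # hypothetical coefficients; boundary at x = 2
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(x, sigmoid_prob(x, beta0, beta1), predict(x, beta0, beta1))
```

A larger `beta1` makes the probabilities on either side of the boundary move toward 0 and 1 more quickly, which is the sharper switch described above.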
11. Confusion matrix & ROC curve
Sliding the classification cutoff trades false positives against false negatives. The confusion matrix counts TP/FP/FN/TN; the ROC curve plots the true-positive rate against the false-positive rate across all cutoffs. The area under it (AUC) summarises the classifier.
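To make the bookkeeping concrete, the sketch below counts the confusion matrix at one cutoff, sweeps the cutoff over every distinct score to trace the ROC curve, and integrates it with the trapezoid rule. The scores and labels are made-up illustration data, and the function names are assumptions. The code expects at least one positive and one negative label.

```python
def confusion(scores, labels, cutoff):
    """Return (TP, FP, FN, TN) when predicting 1 for score >= cutoff."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def roc_points(scores, labels):
    """(FPR, TPR) pairs as the cutoff slides from strictest to most lenient."""
    points = [(0.0, 0.0)]                       # cutoff above every score
    for cutoff in sorted(set(scores), reverse=True):
        tp, fp, fn, tn = confusion(scores, labels, cutoff)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]   # hypothetical model outputs
labels = [1, 1, 0, 1, 0, 1, 0, 0]                   # true 0/1 outcomes
print(confusion(scores, labels, cutoff=0.5))        # (3, 1, 1, 3)
print(auc(roc_points(scores, labels)))              # 0.8125
```

The AUC here equals the fraction of positive/negative pairs in which the positive example receives the higher score (13 of 16).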