data-analysis-lab · visual statistical inference

1. Sampling distribution of the mean & the CLT

Draw repeated samples of size $n$ from a chosen population and pile up each sample mean $\bar X$. The histogram of means approximates the sampling distribution of $\bar X$. Its standard deviation is the standard error $\mathrm{SE}=\sigma/\sqrt{n}$, so it narrows as $n$ grows, and its shape approaches the Normal for any population with finite variance, whatever the population's shape (the CLT).
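
The claim about the standard error can be checked with a short simulation. This is a minimal sketch, not the lab's own code: the exponential population (rate 1, so $\mu=\sigma=1$), the seed and the number of means are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 10            # sample size, matching the value shown in the controls
num_means = 5000  # how many sample means to pile up (illustrative choice)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# Each entry is the mean of one fresh sample of size n.
means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_means)
]

theory_se = sigma / math.sqrt(n)
observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)

print(f"theory SE     = {theory_se:.4f}")
print(f"observed mean = {observed_mean:.4f}")
print(f"observed SD   = {observed_sd:.4f}")
```

The observed SD of the means should sit close to $\sigma/\sqrt n$ even though the exponential population is strongly skewed.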

Controls: population; sample size $n$ (initially 10).

Readouts: population $\mu,\sigma$; theory SE $=\sigma/\sqrt n$; means drawn; observed mean of $\bar X$; observed SD of $\bar X$.

2. Maximum likelihood estimation

For a fixed sample, slide the parameter and watch the log-likelihood $\ell(\theta)=\sum_i \log f(x_i;\theta)$. The maximiser is the MLE $\hat\theta$. For the exponential, $\hat\lambda=1/\bar x$; for the Normal mean, $\hat\mu=\bar x$.
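
Sliding the parameter amounts to evaluating $\ell$ on a grid. The sketch below does that for the exponential model and compares the grid maximiser with the analytic $1/\bar x$. The true rate, seed and grid spacing are hypothetical choices, not values taken from the lab.

```python
import math
import random

random.seed(1)

true_rate = 2.0  # hypothetical rate used only to generate a sample
n = 40           # sample size, matching the value shown in the controls
sample = [random.expovariate(true_rate) for _ in range(n)]
x_bar = sum(sample) / n


def log_likelihood(lam):
    """Exponential log-likelihood: sum over i of log(lam * exp(-lam * x_i))."""
    return n * math.log(lam) - lam * sum(sample)


# "Slide" the parameter over a grid from 0.01 to 10.00 in steps of 0.01.
grid = [0.01 * k for k in range(1, 1001)]
grid_mle = max(grid, key=log_likelihood)
analytic_mle = 1.0 / x_bar

print(f"grid maximiser = {grid_mle:.2f}")
print(f"analytic 1/x̄   = {analytic_mle:.4f}")
```

Because the exponential log-likelihood is concave in $\lambda$, the grid maximiser lands within one grid step of the analytic value.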

Controls: model; parameter guess (initially 1.00); sample size $n$ (initially 40).

Readouts: analytic MLE $\hat\theta$; $\ell$ at your guess.

3. Method of moments vs. MLE

Two estimators on the same sample. The method of moments equates sample moments to population moments; the MLE maximises the likelihood. Repeatedly resample to compare their sampling spread — for the Gamma shape they differ, for the exponential rate they coincide ($1/\bar x$).
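
A rough sketch of the Gamma-shape comparison follows. The true shape and scale, the seed and the replication count are assumptions. The MLE is found by a ternary search on the profile log-likelihood (scale replaced by $\bar x/k$), which is one simple way to maximise it and not necessarily what the lab does.

```python
import math
import random
import statistics

random.seed(2)

TRUE_SHAPE, TRUE_SCALE = 2.0, 1.0  # hypothetical true Gamma parameters
n, reps = 30, 2000                 # n matches the controls; reps is illustrative


def mom_shape(xs):
    """Method of moments: mean = k*theta, variance = k*theta^2, so k = mean^2 / variance."""
    m = statistics.fmean(xs)
    return m * m / statistics.pvariance(xs, m)


def mle_shape(xs):
    """Maximise the per-observation profile log-likelihood in k (scale set to mean/k)."""
    m = statistics.fmean(xs)
    mean_log = statistics.fmean(math.log(x) for x in xs)

    def profile(k):
        return (k - 1) * mean_log - k - k * math.log(m / k) - math.lgamma(k)

    lo, hi = 1e-3, 1e3
    for _ in range(100):  # ternary search; the profile is concave in k
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if profile(a) < profile(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2


mom_sq = mle_sq = 0.0
for _ in range(reps):
    xs = [random.gammavariate(TRUE_SHAPE, TRUE_SCALE) for _ in range(n)]
    mom_sq += (mom_shape(xs) - TRUE_SHAPE) ** 2
    mle_sq += (mle_shape(xs) - TRUE_SHAPE) ** 2

rmse_mom = math.sqrt(mom_sq / reps)
rmse_mle = math.sqrt(mle_sq / reps)
print(f"MoM RMSE = {rmse_mom:.3f}")
print(f"MLE RMSE = {rmse_mle:.3f}")
```

Under these assumptions the MLE is expected to show the smaller RMSE, reflecting its higher efficiency for the Gamma shape.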

Controls: true model; sample size $n$ (initially 30).

Readouts: true parameter; MoM estimate; MLE estimate; MoM RMSE (rep.); MLE RMSE (rep.).

4. Confidence-interval coverage

Each horizontal bar is a 95% CI $\bar x \pm t_{0.025,n-1}\,s/\sqrt n$ from one fresh sample. Bars that miss the true mean $\mu$ are drawn in red. Over many samples the covered fraction approaches the stated confidence level — the meaning of “95% confident”.
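
The long-run coverage can be estimated by simulation. This sketch assumes a standard Normal population and hard-codes the table value $t_{0.025,14}\approx 2.145$, since the Python standard library has no t quantile function. The seed and number of intervals are illustrative.

```python
import math
import random
import statistics

random.seed(3)

n = 15                # sample size, matching the value shown in the controls
T_CRIT = 2.145        # t_{0.025, 14} from a standard t table (hard-coded assumption)
mu, sigma = 0.0, 1.0  # assumed Normal population
num_intervals = 4000  # illustrative number of fresh samples

covered = 0
for _ in range(num_intervals):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.fmean(xs)
    half_width = T_CRIT * statistics.stdev(xs) / math.sqrt(n)
    # The interval "covers" when the true mean lies inside it.
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

coverage = covered / num_intervals
print(f"observed coverage = {coverage:.3f}")
```

The observed fraction should settle near 0.95; any single interval either contains $\mu$ or it does not.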

Controls: confidence level (initially 95%); sample size $n$ (initially 15).

Readouts: intervals drawn; covered; observed coverage.

5. CI width vs. sample size & confidence

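The half-width of a mean CI is $t_{\alpha/2,\,n-1}\,s/\sqrt n$. For fixed $s$ it shrinks roughly like $1/\sqrt n$: quadrupling $n$ halves the $1/\sqrt n$ factor, and the width falls by slightly more than half because $t_{\alpha/2,\,n-1}$ also decreases with $n$. It grows as you demand more confidence. The curve plots half-width against $n$ for the chosen level.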
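
A small worked calculation for the values shown in the controls ($s=1$, $n=20$, 95%). The two critical values are approximate entries from a standard t table and are hard-coded here as an assumption, because the standard library does not provide t quantiles.

```python
import math

s = 1.0  # sample SD, matching the value shown in the controls

# Approximate two-sided 95% critical values t_{0.025, df}, keyed by df = n - 1.
# Taken from a standard t table; treat the last digit as approximate.
T_CRIT_95 = {19: 2.093, 79: 1.99}


def half_width(n):
    """Half-width t * s / sqrt(n) for the sample sizes covered by the table above."""
    return T_CRIT_95[n - 1] * s / math.sqrt(n)


hw_20 = half_width(20)
hw_80 = half_width(80)  # quadrupled sample size
ratio = hw_80 / hw_20

print(f"half-width at n=20: {hw_20:.3f}")
print(f"half-width at n=80: {hw_80:.3f}")
print(f"ratio             : {ratio:.3f}")
```

The ratio comes out a little below 0.5: the $1/\sqrt n$ factor contributes exactly one half, and the smaller critical value at 79 degrees of freedom contributes the rest.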

Controls: confidence level (initially 95%); sample SD $s$ (initially 1.0); highlight $n$ (initially 20).

Readouts: $t_{\alpha/2,n-1}$; half-width at $n$; $n$ to halve it.

6. One-sample z / t test

Test $H_0:\mu=\mu_0$. The statistic is $z=\dfrac{\bar x-\mu_0}{\sigma/\sqrt n}$ (known $\sigma$) or $t=\dfrac{\bar x-\mu_0}{s/\sqrt n}$ (estimated). The shaded rejection region holds the most extreme $\alpha$ of the null distribution; reject when the statistic lands inside it.
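
For the known-$\sigma$ case the whole test fits in a few lines. The sketch below uses the values shown in the controls and assumes a two-sided alternative. The t version would need a t quantile, which the standard library does not provide.

```python
import math
from statistics import NormalDist

# Values shown in the controls; the two-sided alternative is an assumption.
diff = 0.6    # x_bar - mu_0
sigma = 1.0   # known population SD
n = 25
alpha = 0.05

std_normal = NormalDist()  # standard Normal null distribution

z = diff / (sigma / math.sqrt(n))
critical = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject = abs(z) > critical

print(f"z = {z:.3f}, critical = {critical:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

Here $z=0.6/(1/\sqrt{25})=3$, which lies beyond the two-sided critical value of about 1.96, so $H_0$ is rejected at $\alpha=0.05$.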

Controls: test type; alternative; $\bar x - \mu_0$ (initially 0.6); $s$ (or $\sigma$) (initially 1.0); sample size $n$ (initially 25); $\alpha$ (initially 0.05).

Readouts: statistic; critical value; p-value; decision.

7. The p-value as a tail area

Given an observed test statistic on the standard Normal null, the p-value is the probability of a result at least as extreme. Drag the statistic and the alternative; the shaded area is the p-value. Small tail area ⟹ the data are surprising under $H_0$.
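
The tail area for each alternative can be written as a small function of the standard Normal CDF. The function name and the labels `"greater"`, `"less"` and `"two-sided"` are hypothetical, chosen here for illustration.

```python
from statistics import NormalDist


def p_value(z, alternative="two-sided"):
    """Tail area of the standard Normal at least as extreme as z."""
    phi = NormalDist().cdf
    if alternative == "greater":    # H1: mu > mu_0, upper tail
        return 1 - phi(z)
    if alternative == "less":       # H1: mu < mu_0, lower tail
        return phi(z)
    if alternative == "two-sided":  # both tails beyond |z|
        return 2 * (1 - phi(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


z_obs = 1.96  # value shown in the controls
for alt in ("two-sided", "greater", "less"):
    p = p_value(z_obs, alt)
    print(f"{alt:>9}: p = {p:.4f}  (< 0.05: {p < 0.05}, < 0.01: {p < 0.01})")
```

At $z=1.96$ the two-sided area is just under 0.05, which is why 1.96 serves as the familiar two-sided 5% cutoff.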

Controls: alternative; observed $z$ (initially 1.96).

Readouts: tail area (p); vs α = 0.05; vs α = 0.01.

8. Type I / II error & statistical power

Two distributions of $\bar X$: under $H_0$ (centre $\mu_0$) and under $H_1$ (centre $\mu_0+\delta$). The critical line splits them. In the one-sided (upper-tail) picture, the area right of it under $H_0$ is $\alpha$ (Type I), the area left of it under $H_1$ is $\beta$ (Type II), and the remaining area under $H_1$ is the power $1-\beta$. Increasing the effect size, $n$ or $\alpha$ raises the power.
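
For a one-sided upper-tail z test with known $\sigma$, the power has a closed form: $1-\Phi(z_{1-\alpha}-(\delta/\sigma)\sqrt n)$. The sketch below evaluates it at the values shown in the controls. The one-sided setup and the function name `power_for` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()


def power_for(effect, n, alpha):
    """Power of a one-sided upper-tail z test; effect is delta / sigma."""
    z_crit = std_normal.inv_cdf(1 - alpha)  # critical line in z units under H0
    shift = effect * math.sqrt(n)           # centre of the H1 curve in z units
    beta = std_normal.cdf(z_crit - shift)   # area left of the line under H1
    return 1 - beta


effect, n, alpha = 0.5, 25, 0.05  # values shown in the controls
power = power_for(effect, n, alpha)
beta = 1 - power

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, power = {power:.3f}")
print(f"power at n = 50: {power_for(effect, 50, alpha):.3f}")
```

With these inputs the power is roughly 0.80, and doubling $n$ pushes it higher because the two curves separate further in standard-error units.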

Controls: effect size $\delta/\sigma$ (initially 0.5); sample size $n$ (initially 25); $\alpha$ (initially 0.05); two-sided.

Readouts: $\alpha$ (Type I); $\beta$ (Type II); power $1-\beta$.

9. Two-sample t-test (pooled vs. Welch)

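Compare two group means. The pooled t assumes equal variances; Welch's t does not and adjusts the degrees of freedom (Satterthwaite). With equal group sizes the two t statistics coincide and only the degrees of freedom differ; when both the sample variances and the sizes differ, the statistics themselves diverge. Welch is the safer default. Edit each group's mean, SD and size.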
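
Both statistics can be computed directly from the summary values in the controls. The sketch below uses hypothetical function names and omits the p-value, because the standard library has no t distribution. The second call, with group B enlarged to 60, is an added illustration and not a setting from the lab.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Pooled-variance t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch t statistic with Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m2 - m1) / se, df


# Values shown in the controls: A = (50, 10, 20), B = (56, 14, 20).
t_p, df_p = pooled_t(50, 10, 20, 56, 14, 20)
t_w, df_w = welch_t(50, 10, 20, 56, 14, 20)
print(f"equal n   : pooled t = {t_p:.4f} (df {df_p}), Welch t = {t_w:.4f} (df {df_w:.2f})")

# Illustrative unequal sizes: group B enlarged to n = 60.
t_p2, df_p2 = pooled_t(50, 10, 20, 56, 14, 60)
t_w2, df_w2 = welch_t(50, 10, 20, 56, 14, 60)
print(f"unequal n : pooled t = {t_p2:.4f} (df {df_p2}), Welch t = {t_w2:.4f} (df {df_w2:.2f})")
```

With equal group sizes the two standard errors are algebraically identical, so only the degrees of freedom change (38 versus about 34.4). Once the sizes also differ, the statistics separate.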

Controls: group A mean (initially 50), SD (initially 10), $n$ (initially 20); group B mean (initially 56), SD (initially 14), $n$ (initially 20).

Readouts: pooled t (df); Welch t (df); Welch p (two-sided).

10. Paired vs. unpaired analysis

The same before/after data, two ways. Pairing analyses the within-subject differences $d_i$, removing between-subject variation; the unpaired test ignores the pairing. When subjects vary a lot but the change is consistent, pairing shrinks the standard error and lifts the t statistic.
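
The effect of pairing on the standard error shows up in a small simulation. The data-generating model here is an assumption: each subject has a baseline drawn around a hypothetical centre of 50, and the "after" value adds the shift plus independent noise. The seed is arbitrary and p-values are omitted for lack of a t distribution in the standard library.

```python
import math
import random
import statistics

random.seed(4)

# Values shown in the controls.
shift, between_sd, noise_sd, n = 3.0, 12.0, 4.0, 15

# Assumed model: subject baselines vary widely; the change is consistent.
before = [random.gauss(50.0, between_sd) for _ in range(n)]  # 50 is hypothetical
after = [b + shift + random.gauss(0.0, noise_sd) for b in before]
diffs = [a - b for a, b in zip(after, before)]

mean_diff = statistics.fmean(diffs)  # equals mean(after) - mean(before)

# Paired: standard error of the mean within-subject difference.
se_paired = statistics.stdev(diffs) / math.sqrt(n)
# Unpaired: treats the two columns as independent samples.
se_unpaired = math.sqrt(statistics.variance(before) / n + statistics.variance(after) / n)

t_paired = mean_diff / se_paired
t_unpaired = mean_diff / se_unpaired

print(f"paired   : SE = {se_paired:.3f}, t = {t_paired:.3f}")
print(f"unpaired : SE = {se_unpaired:.3f}, t = {t_unpaired:.3f}")
```

Both tests share the same numerator, so the whole difference comes from the denominator: the paired standard error only sees the within-pair noise, while the unpaired one also absorbs the between-subject spread.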

Controls: true shift (after − before) (initially 3.0); between-subject spread (initially 12); within-pair noise (initially 4); pairs $n$ (initially 15).

Readouts: paired t; paired p; unpaired t; unpaired p.

11. One-way ANOVA — F = between / within

Several groups. ANOVA splits the total sum of squares into a between-group part and a within-group part; dividing each by its degrees of freedom ($k-1$ and $k(n-1)$ for $k$ groups of size $n$) gives the mean squares MSB and MSE. The ratio $F=\mathrm{MSB}/\mathrm{MSE}$ is large when group means are spread far relative to the noise. Move the group means apart or raise the noise and watch $F$ and its p-value react.
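
The decomposition behind $F$ can be computed by hand on simulated groups. In this sketch the group centres are spaced evenly across the "mean spread" around a hypothetical centre of 50. That reading of the slider, the seed and the Normal noise are all assumptions. The p-value is omitted because the standard library has no F distribution.

```python
import random
import statistics

random.seed(5)

# Values shown in the controls.
k, n, spread, noise = 3, 12, 6.0, 8.0

# Assumed layout: centres evenly spaced over `spread`, around a hypothetical 50.
centres = [50.0 + spread * (j / (k - 1) - 0.5) for j in range(k)]
groups = [[random.gauss(c, noise) for _ in range(n)] for c in centres]

grand_mean = statistics.fmean(x for g in groups for x in g)

# Between-group sum of squares: group means around the grand mean.
ssb = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: observations around their own group mean.
ssw = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
# Total sum of squares, for checking SST = SSB + SSW.
sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

df_between, df_within = k - 1, k * (n - 1)
msb, mse = ssb / df_between, ssw / df_within
f_stat = msb / mse

print(f"df = ({df_between}, {df_within})")
print(f"MSB = {msb:.2f}, MSE = {mse:.2f}, F = {f_stat:.3f}")
print(f"SST - (SSB + SSW) = {sst - (ssb + ssw):.2e}")
```

The final line confirms that the two parts add up to the total variation, which is the identity the F ratio is built on.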

Controls: groups $k$ (initially 3); per-group size $n$ (initially 12); mean spread (initially 6); within noise (initially 8).

Readouts: df (between, within); MSB / MSE; F statistic; p-value.

12. Chi-square test of independence

A contingency table of observed counts $O_{ij}$. Under independence the expected count is $E_{ij}=\dfrac{R_i\,C_j}{N}$, and $\chi^2=\sum\dfrac{(O-E)^2}{E}$ with $(r-1)(c-1)$ degrees of freedom. Edit the cells; the expected table, contributions and p-value update live.
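
A worked example on a hypothetical 2×2 table (the counts are invented for illustration). The expected counts and $\chi^2$ follow the formulas above for any table size. The p-value line is valid only for one degree of freedom, where $\chi^2$ is the square of a standard Normal, which lets the standard library's Normal CDF stand in for a chi-square CDF.

```python
import math
from statistics import NormalDist

# Hypothetical observed counts (rows x columns).
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: E_ij = R_i * C_j / N.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: P(chi2_1 > x) = 2 * (1 - Phi(sqrt(x))).
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

print(f"expected = {expected}")
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```

Every expected count is 25 here, each cell contributes $(5)^2/25 = 1$, and so $\chi^2 = 4$ with one degree of freedom, giving a p-value of about 0.046.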

Readouts: χ² statistic; df; p-value; decision at α = 0.05.