1. Population, sample & sampling error
A population of dots with a true proportion $p$ of "successes". Draw a random sample and the estimate $\hat p$ jiggles around the truth. As $n$ grows, the typical sampling error $\hat p - p$ shrinks like $1/\sqrt{n}$. Biased sampling instead settles on a systematically wrong value, and a larger $n$ does not fix it.
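The $1/\sqrt{n}$ behaviour and the bias effect can be reproduced with a short simulation. This is a minimal sketch, not the demo's code. It assumes a population of 0/1 values, measures the root-mean-square sampling error over repeated samples, and uses a hypothetical biased scheme (`weight_success`) in which successes are twice as likely to be picked.

```python
import math
import random

def sample_proportion(population, n, rng):
    """p-hat from a simple random sample of size n (without replacement)."""
    return sum(rng.sample(population, n)) / n

def biased_sample_proportion(population, n, rng, weight_success=2.0):
    """Hypothetical biased scheme: successes are weight_success times as likely to be drawn."""
    weights = [weight_success if x else 1.0 for x in population]
    return sum(rng.choices(population, weights=weights, k=n)) / n

def rms_error(estimator, population, p, n, reps, rng):
    """Root-mean-square of p-hat - p over `reps` repeated samples."""
    errs = [estimator(population, n, rng) - p for _ in range(reps)]
    return math.sqrt(sum(e * e for e in errs) / reps)

rng = random.Random(0)
N, p = 100_000, 0.3
population = [1] * int(N * p) + [0] * (N - int(N * p))
for n in (25, 100, 400, 1600):
    err = rms_error(sample_proportion, population, p, n, 500, rng)
    theory = math.sqrt(p * (1 - p) / n)  # ignores the finite-population correction
    print(n, round(err, 4), round(theory, 4))
```

Each fourfold increase in $n$ should roughly halve the error. The biased estimator, by contrast, converges to about $0.6/1.3 \approx 0.46$ rather than $0.3$, however large $n$ gets.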
2. Sample size vs margin of error
For a proportion at confidence level $C$, the margin of error is $$E = z\sqrt{\dfrac{\hat p(1-\hat p)}{n}},$$ where $z$ is the critical value for $C$ ($z \approx 1.96$ for $C = 95\%$). The curve shows how $E$ falls as $n$ grows. Because $E \propto 1/\sqrt{n}$, quartering the error needs roughly $16\times$ the sample. A $95\%$ confidence interval $\hat p \pm E$ is drawn for the current $n$.
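The formula translates directly into code. This is a sketch that uses the standard library's `statistics.NormalDist` to find the two-sided critical value $z$ for a confidence level $C$. The helper names are illustrative.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z for confidence level C (0.95 -> about 1.96)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def margin_of_error(p_hat, n, confidence=0.95):
    """E = z * sqrt(p_hat * (1 - p_hat) / n)."""
    return z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval(p_hat, n, confidence=0.95):
    """The interval p_hat +/- E."""
    e = margin_of_error(p_hat, n, confidence)
    return p_hat - e, p_hat + e

# Quartering E: n = 100 -> n = 1600 (16x the sample) at the same p-hat
print(margin_of_error(0.5, 100), margin_of_error(0.5, 1600))
print(confidence_interval(0.5, 1000))
```

At $\hat p = 0.5$ and $n = 1000$, this gives the familiar margin of about $\pm 3.1$ percentage points.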
3. Correlation & scatterplots
Drag the slider to set the target correlation and resample a cloud of points. The sample Pearson $r$ lands near, but not exactly on, that target. The least-squares line and $r$ update live. A strong linear fit is not proof of causation, as the next two demos show.
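A minimal sketch of the demo's two ingredients follows. The first generates a cloud with a chosen population correlation $\rho$ using the standard construction $y = \rho x + \sqrt{1-\rho^2}\,\varepsilon$ with independent standard normals $x$ and $\varepsilon$. The second computes the sample Pearson $r$ and the least-squares line from centred sums. The function names are hypothetical, and the demo's own generator may differ.

```python
import math
import random

def correlated_cloud(rho, n, rng):
    """Points whose population correlation is rho: y = rho*x + sqrt(1 - rho^2)*noise."""
    pts = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0, 1)
        pts.append((x, y))
    return pts

def pearson_and_fit(pts):
    """Return (r, slope, intercept) for the least-squares line y = slope*x + intercept."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope, my - slope * mx

r, slope, intercept = pearson_and_fit(correlated_cloud(0.6, 200, random.Random(42)))
print(round(r, 3), round(slope, 3), round(intercept, 3))
```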
4. Confounding & randomization
A confounder $Z$ affects both treatment assignment and the outcome, producing a spurious treatment effect. Turn on randomization to assign groups by coin flip. This breaks the $Z\rightarrow$ treatment arrow, so the naive difference in group means collapses toward the true effect.
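The confounding mechanism can be sketched with a hypothetical model, which is not taken from the demo. $Z$ is a fair coin. Units with $Z=1$ are treated with probability 0.8 and units with $Z=0$ with probability 0.2 unless randomization is on. The outcome is $Y = \text{effect}\cdot T + 3Z + \text{noise}$. The parameter name `z_strength` is illustrative.

```python
import random

def naive_difference(n, true_effect, randomize, rng, z_strength=3.0):
    """Mean outcome of treated minus control under a hypothetical model:
    Z ~ Bernoulli(0.5); Y = true_effect*T + z_strength*Z + N(0, 1) noise."""
    treated, control = [], []
    for _ in range(n):
        z = rng.random() < 0.5
        if randomize:
            t = rng.random() < 0.5                   # coin flip: T independent of Z
        else:
            t = rng.random() < (0.8 if z else 0.2)   # Z pushes units into treatment
        y = true_effect * t + z_strength * z + rng.gauss(0, 1)
        (treated if t else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)

rng = random.Random(0)
print(naive_difference(20_000, 1.0, randomize=False, rng=rng))  # confounded
print(naive_difference(20_000, 1.0, randomize=True, rng=rng))   # randomized
```

Under this model, $P(Z=1 \mid T=1) = 0.8$ and $P(Z=1 \mid T=0) = 0.2$, so the confounded difference is about $1 + 3(0.8 - 0.2) = 2.8$. With randomization, it falls back to about the true effect of $1$.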
5. Simpson's paradox
Two subgroups each show a positive trend, yet pooling them can reverse the direction. Slide the group separation to see the pooled regression line turn against the within-group lines, a vivid warning about lurking third variables.
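A small deterministic sketch shows the reversal. The data are hypothetical: two groups of five points, each with within-group slope $+1$. Group B is shifted right by `separation` and group A is shifted up by the same amount. The slope is the ordinary least-squares slope $S_{xy}/S_{xx}$.

```python
def slope(points):
    """Ordinary least-squares slope S_xy / S_xx."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx

def simpson_groups(separation, size=5):
    """Hypothetical data: both groups have within-group slope +1;
    group A is shifted up and group B shifted right by `separation`."""
    group_a = [(i, i + separation) for i in range(size)]
    group_b = [(i + separation, i) for i in range(size)]
    return group_a, group_b

for s in (0, 2, 3, 10):
    a, b = simpson_groups(s)
    print(s, slope(a), slope(b), round(slope(a + b), 3))
```

For these five-point groups, the pooled slope works out to $(8 - s^2)/(8 + s^2)$. It is $+1$ at $s = 0$ and turns negative once $s > \sqrt{8} \approx 2.83$, while both within-group slopes stay at $+1$.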
6. Two-group experiment & effect size
A control group and a treatment group with adjustable means and spread. The standardized effect size is Cohen's $d=\dfrac{\bar x_T-\bar x_C}{s}$, where $s$ is the pooled standard deviation, and a two-sample $t$-test gives the $t$ statistic and an approximate $p$-value.
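Here is a sketch of the computation, assuming the pooled-variance (Student) two-sample test. The $p$-value is approximated with the standard normal in place of the $t$ distribution. The function name is illustrative.

```python
import math
from statistics import NormalDist, mean, variance

def effect_size_and_t(treatment, control):
    """Return (Cohen's d, t statistic, approximate two-sided p-value)."""
    n_t, n_c = len(treatment), len(control)
    diff = mean(treatment) - mean(control)
    # Pooled standard deviation s (sample variances use n - 1)
    s = math.sqrt(((n_t - 1) * variance(treatment) + (n_c - 1) * variance(control))
                  / (n_t + n_c - 2))
    d = diff / s
    t = diff / (s * math.sqrt(1 / n_t + 1 / n_c))
    # Normal approximation to the t distribution: only reasonable for large groups
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return d, t, p_approx

print(effect_size_and_t([5, 6, 7], [3, 4, 5]))
```

The normal approximation understates $p$ for small groups. An exact value uses the $t$ distribution with $n_T + n_C - 2$ degrees of freedom (for example via `scipy.stats.ttest_ind`).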
7. Type I & Type II error
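Two sampling distributions of the test statistic: one under $H_0$ (no effect) and one under $H_1$ (true effect). The decision threshold set by the significance level $\alpha$ splits the picture into three areas: $\alpha$ (false positive, the tail of $H_0$ beyond the threshold), $\beta$ (false negative, the part of $H_1$ short of the threshold) and power $1-\beta$.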
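For a concrete version, assume a one-sided test whose statistic is $N(0,1)$ under $H_0$ and $N(\delta,1)$ under $H_1$. The parameter `shift` stands for $\delta$ and is a hypothetical name. The threshold is $z_{1-\alpha}$, and $\beta$ is the $H_1$ probability below it.

```python
from statistics import NormalDist

def error_rates(alpha, shift):
    """One-sided test: statistic ~ N(0, 1) under H0 and N(shift, 1) under H1.
    H0 is rejected when the statistic exceeds the threshold."""
    threshold = NormalDist().inv_cdf(1 - alpha)              # P(stat > threshold | H0) = alpha
    beta = NormalDist(mu=shift, sigma=1).cdf(threshold)      # P(stat <= threshold | H1)
    return threshold, beta, 1 - beta                         # power = 1 - beta

for a in (0.05, 0.01):
    print(a, error_rates(a, shift=2.5))
```

Lowering $\alpha$ moves the threshold right, which raises $\beta$ and lowers power for the same effect.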
8. p-hacking & multiple comparisons
When every null hypothesis is true, each test still produces a false positive ($p<\alpha$) with probability $\alpha$. Run $k$ independent tests and the chance of at least one "significant" result is $1-(1-\alpha)^k$. At $\alpha = 0.05$ and $k = 20$ that is already about $64\%$, so testing many things almost guarantees a fluke.
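The closed form can be checked against a simulation. The sketch relies on the fact that, under a true null hypothesis, a continuous test's $p$-value is uniform on $[0,1]$, so each test is modelled as a single uniform draw compared with $\alpha$.

```python
import random

def familywise_error(alpha, k):
    """P(at least one p < alpha) among k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k

def simulate_familywise(alpha, k, reps, rng):
    """Monte Carlo estimate: each null p-value is drawn as Uniform(0, 1)."""
    hits = sum(any(rng.random() < alpha for _ in range(k)) for _ in range(reps))
    return hits / reps

print(familywise_error(0.05, 20))
print(simulate_familywise(0.05, 20, 20_000, random.Random(0)))
```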
9. Survey question bias (Likert)
The same question, asked neutrally or with a leading / loaded framing, shifts the response distribution. Watch the 5-point Likert bars and the mean response slide as you change the wording bias and acquiescence (yea-saying) tendency.
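The next sketch is a hypothetical response model, not the demo's actual one, that shows how both knobs move the mean. Wording bias exponentially tilts the neutral distribution toward agreement ($b > 0$) or disagreement ($b < 0$). Acquiescence is modelled as a share of respondents who answer "Agree" (score 4) regardless of content. All names and the baseline probabilities are illustrative.

```python
import math

def likert_distribution(base, wording_bias=0.0, acquiescence=0.0):
    """Hypothetical model. base: neutral-wording probabilities for scores 1..5.
    wording_bias tilts toward agreement (>0) or disagreement (<0);
    acquiescence is the share answering 'Agree' (score 4) regardless of content."""
    tilted = [p * math.exp(wording_bias * (k - 3)) for k, p in enumerate(base, start=1)]
    total = sum(tilted)
    probs = [(1 - acquiescence) * t / total for t in tilted]
    probs[3] += acquiescence  # index 3 is score 4, "Agree"
    return probs

def mean_score(probs):
    """Mean response on the 1..5 scale."""
    return sum(k * p for k, p in enumerate(probs, start=1))

base = [0.1, 0.2, 0.4, 0.2, 0.1]  # illustrative neutral-wording distribution
for bias, acq in ((0.0, 0.0), (0.5, 0.0), (0.0, 0.2)):
    print(bias, acq, round(mean_score(likert_distribution(base, bias, acq)), 3))
```

Under this model, acquiescence pulls the mean toward 4 linearly: the mean is $(1-a)\,\bar m + 4a$, where $\bar m$ is the mean without yea-saying.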
10. Reliability vs validity
The classic dartboard analogy. Reliability is consistency (tight cluster); validity is hitting the true target (bullseye). A measure can be reliable but biased, valid but noisy, or — the goal — both.
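The dartboard picture can be quantified with two numbers. The distance from the cluster's centroid to the bullseye measures validity, and the RMS spread of the throws around their own centroid measures reliability. This is a sketch with hypothetical throw parameters (`bias`, `spread`) and Gaussian scatter.

```python
import math
import random

def throws(n, bias, spread, rng):
    """Hypothetical throws aimed at (bias, 0) with Gaussian scatter of SD `spread`."""
    return [(bias + rng.gauss(0, spread), rng.gauss(0, spread)) for _ in range(n)]

def validity_and_reliability(points, target=(0.0, 0.0)):
    """Return (bias, spread): small bias = valid, small spread = reliable."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    bias = math.dist((cx, cy), target)
    spread = math.sqrt(sum(math.dist(p, (cx, cy)) ** 2 for p in points) / n)
    return bias, spread

rng = random.Random(0)
print("reliable but biased:", validity_and_reliability(throws(500, 3.0, 0.2, rng)))
print("valid but noisy:    ", validity_and_reliability(throws(500, 0.0, 2.0, rng)))
print("both:               ", validity_and_reliability(throws(500, 0.0, 0.2, rng)))
```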