research-methods-lab · observe, experiment & survey, made visual

1. Population, sample & sampling error

A population of dots with a true proportion $p$ of "successes". Draw a random sample and the estimate $\hat p$ jiggles around the truth. As $n$ grows, the typical size of the sampling error $\hat p - p$, measured by the standard error $\sqrt{p(1-p)/n}$, shrinks like $1/\sqrt{n}$. Biased sampling instead drifts to a systematically wrong value, and a larger $n$ does not fix it.

Controls: sample size $n$ (default 40), true proportion $p$ (default 0.50), biased sampling toggle (corner-favoring).

Readouts: estimate $\hat p$, error $\hat p - p$, standard error.
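To make the $1/\sqrt{n}$ claim concrete, here is a minimal sketch that simulates unbiased sampling only. The function names, the seed, and the choice of 2000 repetitions are illustrative assumptions, not part of the demo itself.

```python
import math
import random

def sample_proportion(p, n, rng):
    """Draw n independent Bernoulli(p) units and return the sample proportion."""
    return sum(rng.random() < p for _ in range(n)) / n

def standard_error(p, n):
    """Theoretical standard error of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)

rng = random.Random(0)  # fixed seed (an assumption) so the run is repeatable
p = 0.50
for n in (40, 160, 640):  # each step multiplies n by 4
    estimates = [sample_proportion(p, n, rng) for _ in range(2000)]
    # Root-mean-square sampling error across the repeated samples.
    rms_error = math.sqrt(sum((e - p) ** 2 for e in estimates) / len(estimates))
    print(n, round(rms_error, 4), round(standard_error(p, n), 4))
```

Each fourfold increase in $n$ should roughly halve both columns, which is the $1/\sqrt{n}$ behavior described above.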

2. Sample size vs margin of error

For a proportion at confidence level $C$, the margin of error is $$E = z^\ast\sqrt{\dfrac{\hat p(1-\hat p)}{n}},$$ where $z^\ast$ is the standard normal critical value for $C$ ($z^\ast \approx 1.96$ at $C = 95\%$). The curve shows how $E$ falls as $n$ grows: because $E \propto 1/\sqrt{n}$, quartering the error needs $16\times$ the sample. The confidence interval $\hat p \pm E$ is drawn for the current $n$.

Controls: sample size $n$ (default 600), sample proportion $\hat p$ (default 0.50), confidence $C$.

Readouts: $z^\ast$, margin $E$, interval.
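The margin-of-error formula can be checked numerically. The sketch below assumes a two-sided interval and a default of $C = 0.95$; `margin_of_error` is an illustrative name.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    # Two-sided critical value: leave (1 - C) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

e = margin_of_error(0.50, 600)
print("margin:", round(e, 4))
print("interval:", (round(0.50 - e, 4), round(0.50 + e, 4)))
# Sixteen times the sample gives one quarter of the margin.
print("ratio:", margin_of_error(0.50, 600 * 16) / e)
```

With $n = 600$ and $\hat p = 0.50$ the margin comes out near 0.04, that is, about four percentage points either side.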

3. Correlation & scatterplots

Drag the slider to set the target correlation $r$ and resample a cloud of points. The least-squares line and Pearson $r$ update live. A strong linear association is not proof of causation; the next two demos show why.

Controls: target correlation $r$ (default 0.70), points $n$ (default 60).

Readouts: measured $r$, $r^2$ (variance shared), slope.
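One way to produce such a cloud, sketched under the assumption that both variables are standard normal, is to mix $x$ with independent noise so the population correlation equals the target. The function names and seed are illustrative; the demo may generate its points differently.

```python
import math
import random

def correlated_cloud(r, n, rng):
    """Sample n points whose population correlation is r (assumed bivariate normal)."""
    points = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        points.append((x, y))
    return points

def pearson_and_slope(points):
    """Return (Pearson r, least-squares slope of y on x)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / math.sqrt(sxx * syy), sxy / sxx

rng = random.Random(1)  # fixed seed (an assumption) for repeatability
r, slope = pearson_and_slope(correlated_cloud(0.70, 60, rng))
print("measured r:", round(r, 3), "r^2:", round(r * r, 3), "slope:", round(slope, 3))
```

With only 60 points the measured $r$ will wander around the target, which is why it is reported separately from the slider value.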

4. Confounding & randomization

A confounder $Z$ affects both the treatment assignment and the outcome, faking a treatment effect. Turn on randomization to assign groups by coin flip — this breaks the $Z\rightarrow$ treatment arrow, so the naive difference collapses toward the true effect.

Controls: true treatment effect (default 0.0), confounder strength (default 0.7), randomize assignment toggle.

Readouts: naive difference, true effect (for comparison).
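The effect of randomization can be sketched with a small simulation. The structural model here is an assumption made for illustration: $Z$ is standard normal, the outcome is `true_effect * T + strength * Z + noise`, and without randomization a unit is treated when `strength * Z + noise > 0`.

```python
import random

def naive_difference(true_effect, strength, randomize, n, rng):
    """Mean outcome of treated minus mean outcome of control."""
    treated, control = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # the confounder
        if randomize:
            t = rng.random() < 0.5  # coin flip: Z no longer influences assignment
        else:
            t = strength * z + rng.gauss(0, 1) > 0  # Z pushes units into treatment
        y = true_effect * t + strength * z + rng.gauss(0, 1)
        (treated if t else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)

rng = random.Random(3)  # fixed seed (an assumption) for repeatability
print("confounded:", round(naive_difference(0.0, 0.7, False, 20000, rng), 3))
print("randomized:", round(naive_difference(0.0, 0.7, True, 20000, rng), 3))
```

With a true effect of 0.0, the confounded difference stays clearly positive while the randomized one sits near zero, up to sampling noise.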

5. Simpson's paradox

Two subgroups each show a positive trend, yet pooling them can reverse the direction. Slide the group separation to see the aggregate line flip against the within-group lines — a vivid third-variable warning.

Controls: group separation (default 60), show groups separately toggle.

Readouts: group A slope, group B slope, pooled slope.
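A tiny deterministic dataset is enough to reproduce the reversal. In this sketch the layout is an assumption: group A is shifted up by `separation` and group B is shifted right by the same amount, in arbitrary units that need not match the slider's scale.

```python
def ols_slope(points):
    """Least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

def two_groups(separation, within_slope=0.5, size=5):
    """Two groups with the same positive within-group slope."""
    group_a = [(i, within_slope * i + separation) for i in range(size)]  # high y, low x
    group_b = [(i + separation, within_slope * i) for i in range(size)]  # low y, high x
    return group_a, group_b

group_a, group_b = two_groups(separation=6)
print("group A slope:", ols_slope(group_a))
print("group B slope:", ols_slope(group_b))
print("pooled slope:", ols_slope(group_a + group_b))
```

Both within-group slopes are positive, yet the pooled slope is negative because the group that sits further right also sits lower. Setting `separation=0` removes the reversal.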

6. Two-group experiment & effect size

A control and a treatment group with adjustable means and spread. The standardized effect size is Cohen's $d=\dfrac{\bar x_T-\bar x_C}{s}$, where $s$ is the pooled standard deviation of the two groups, and a two-sample $t$-test gives the $t$ statistic and an approximate $p$-value.

Controls: mean difference (default 0.6), spread as standard deviation (default 1.0), per-group $n$ (default 30).

Readouts: Cohen's $d$, $t$, $p$-value.
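The three readouts can be computed as in the following sketch. It assumes the pooled-variance (equal-variance) form of the two-sample test and approximates the two-sided $p$-value with the standard normal distribution rather than the exact $t$ distribution; the demo's own approximation may differ.

```python
import math
from statistics import NormalDist, mean, variance

def two_group_summary(control, treatment):
    """Return (Cohen's d, pooled t statistic, normal-approximation p-value)."""
    n_c, n_t = len(control), len(treatment)
    pooled_var = ((n_c - 1) * variance(control)
                  + (n_t - 1) * variance(treatment)) / (n_c + n_t - 2)
    s = math.sqrt(pooled_var)  # pooled standard deviation
    diff = mean(treatment) - mean(control)
    d = diff / s
    t = diff / (s * math.sqrt(1 / n_c + 1 / n_t))
    # Assumption: normal approximation, slightly too small a p-value for small samples.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return d, t, p_approx

d, t, p_approx = two_group_summary([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(d, 3), round(t, 3), round(p_approx, 3))
```

With equal group sizes $n$ the pooled statistic reduces to $t = d\sqrt{n/2}$, so the same $d$ becomes more "significant" as $n$ grows.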

7. Type I & Type II error

Two sampling distributions: $H_0$ (no effect) and $H_1$ (true effect). The decision threshold at significance $\alpha$ splits the picture into $\alpha$ (false positive), $\beta$ (false negative) and power $1-\beta$.

Controls: effect size as separation (default 2.5), significance $\alpha$.

Readouts: $\alpha$ (type I), $\beta$ (type II), power $1-\beta$.
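The three areas follow directly from the normal CDF. This sketch assumes a one-sided test, unit-variance sampling distributions with $H_0$ centered at 0 and $H_1$ centered at the effect size, and $\alpha = 0.05$ as an example value.

```python
from statistics import NormalDist

def error_rates(effect_size, alpha):
    """Return (decision threshold, beta, power) for a one-sided z-test."""
    std_normal = NormalDist()
    # Reject H0 when the statistic exceeds this cut, leaving area alpha under H0.
    threshold = std_normal.inv_cdf(1 - alpha)
    # Beta is the area of the H1 curve that falls below the threshold.
    beta = std_normal.cdf(threshold - effect_size)
    return threshold, beta, 1 - beta

threshold, beta, power = error_rates(effect_size=2.5, alpha=0.05)
print(round(threshold, 3), round(beta, 3), round(power, 3))
```

Lowering $\alpha$ moves the threshold to the right, which raises $\beta$ and lowers power for the same separation.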

8. p-hacking & multiple comparisons

When nothing is real (every null hypothesis is true), each test still returns $p<\alpha$, a false positive, with probability $\alpha$. Run $k$ independent tests and the chance of at least one "significant" result is $1-(1-\alpha)^k$, so testing many things almost guarantees a fluke.

Controls: number of tests $k$ (default 20), significance $\alpha$.

Readouts: $P(\ge 1 \text{ false positive})$, number of "significant" results in this run.
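A short sketch of the formula alongside a simulated run. It relies on the fact that a $p$-value is uniform on $[0, 1]$ when the null hypothesis is true; $\alpha = 0.05$, the seed, and the function names are illustrative assumptions.

```python
import random

def familywise_error(alpha, k):
    """Probability of at least one false positive among k independent null tests."""
    return 1 - (1 - alpha) ** k

def count_false_positives(alpha, k, rng):
    """Simulate k null tests: each p-value is uniform, so P(p < alpha) = alpha."""
    return sum(rng.random() < alpha for _ in range(k))

rng = random.Random(4)  # fixed seed (an assumption) for repeatability
print("formula:", round(familywise_error(0.05, 20), 4))
print("this run:", count_false_positives(0.05, 20, rng), "of 20 'significant'")

# Fraction of 5000 simulated runs with at least one false positive.
runs = 5000
hits = sum(count_false_positives(0.05, 20, rng) >= 1 for _ in range(runs))
print("simulated:", hits / runs)
```

For $k = 20$ and $\alpha = 0.05$ the formula gives roughly 0.64, so a spurious finding is more likely than not.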

9. Survey question bias (Likert)

The same question, asked neutrally or with a leading / loaded framing, shifts the response distribution. Watch the 5-point Likert bars and the mean response slide as you change the wording bias and acquiescence (yea-saying) tendency.

Controls: wording, acquiescence bias (default 0.20), respondents (default 200).

Readouts: mean response, % agree (4–5).
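One simple way to model these shifts is a latent-attitude sketch. Everything in it is an assumption made for illustration: attitudes are standard normal, fixed cutpoints map them to the five categories, and both the wording bias and the acquiescence tendency act as additive shifts toward agreement. The demo's actual response model is not specified.

```python
import random

CUTPOINTS = (-1.5, -0.5, 0.5, 1.5)  # assumed thresholds between the 5 categories

def likert_response(latent):
    """Map a latent attitude to a 1-5 response by counting cutpoints passed."""
    return 1 + sum(latent > c for c in CUTPOINTS)

def simulate_survey(wording_shift, acquiescence, respondents, rng):
    """Return (mean response, percent answering 4 or 5)."""
    responses = []
    for _ in range(respondents):
        latent = rng.gauss(0, 1) + wording_shift + acquiescence
        responses.append(likert_response(latent))
    mean_response = sum(responses) / len(responses)
    pct_agree = 100 * sum(r >= 4 for r in responses) / len(responses)
    return mean_response, pct_agree

rng = random.Random(5)  # fixed seed (an assumption) for repeatability
print("neutral:", simulate_survey(0.0, 0.0, 200, rng))
print("leading:", simulate_survey(0.5, 0.20, 200, rng))
```

Under this model the same underlying attitudes produce a higher mean and a larger share of agreement once the question is framed to invite a "yes".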

10. Reliability vs validity

The classic dartboard analogy. Reliability is consistency (tight cluster); validity is hitting the true target (bullseye). A measure can be reliable but biased, valid but noisy, or — the goal — both.

Controls: reliability (1 − noise, default 0.70), validity (1 − bias, default 0.70), shots (default 25).

Readouts: spread (scatter), bias (off-center), verdict.
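The dartboard can be sketched by taking the control labels literally: noise is $1 - \text{reliability}$ and bias is $1 - \text{validity}$. The rest is an assumption for illustration: the bias acts as a horizontal offset from the bullseye at the origin, and the noise is the standard deviation of Gaussian scatter in each coordinate.

```python
import math
import random

def dartboard(reliability, validity, shots, rng):
    """Return (spread around the cluster center, distance of that center from the bullseye)."""
    noise, bias = 1 - reliability, 1 - validity
    points = [(bias + rng.gauss(0, noise), rng.gauss(0, noise)) for _ in range(shots)]
    center_x = sum(x for x, _ in points) / shots
    center_y = sum(y for _, y in points) / shots
    # Reliability: root-mean-square distance of shots from their own center.
    spread = math.sqrt(
        sum((x - center_x) ** 2 + (y - center_y) ** 2 for x, y in points) / shots
    )
    # Validity: how far the center of the cluster lands from the target.
    off_center = math.hypot(center_x, center_y)
    return spread, off_center

rng = random.Random(6)  # fixed seed (an assumption) for repeatability
print("reliable but biased:", dartboard(0.95, 0.40, 25, rng))
print("valid but noisy:", dartboard(0.40, 1.00, 25, rng))
```

The two readouts are computed independently, which is the point of the analogy: a tight cluster says nothing about whether it is centered on the target.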