Worked example: screen-time & focus — a mixed-methods study, start to finish
This page is a single, fully worked research project that threads together every method in the course: framing a testable question, operationalizing fuzzy constructs, drawing a representative sample, writing a clean questionnaire, designing a quasi-experiment, and analyzing a mock dataset with real formulas — then auditing it for bias, confounding and ethics. Treat it as a template for the group field study (Sessions 26–29) and a model of what "doing it properly" looks like.
The scenario. A study team at a university wants to know whether heavy smartphone use is associated with worse sustained attention in first-year students, and whether a simple "notifications-off" intervention measurably improves focus. They run a correlational survey phase and a small quasi-experiment, then combine the qualitative and quantitative findings.
Sessions exercised
Each phase below maps to specific course sessions. The tags throughout the page link back to the relevant session and to the matching interactive demo.
1 · Research question, constructs & hypotheses
A good study starts from one sharp question and a handful of falsifiable predictions — not a vague topic. Compare the textbook progression from a "topic" to a testable hypothesis.
From topic to question
- Topic: "phones and attention" — too broad to test.
- Question Q1 (correlational): Among first-year students, is higher daily smartphone screen-time associated with lower sustained-attention performance?
- Question Q2 (causal): Does silencing non-essential notifications for one week improve sustained attention relative to a no-change control?
Constructs → operational definitions
| Construct | Operationalization |
|---|---|
| Screen-time | 7-day mean of OS-reported daily minutes (Screen Time / Digital Wellbeing screenshot) |
| Sustained attention | d2-style cancellation task score: correct − errors over 5 min |
| Self-reported focus | 3 Likert items (Q4–Q6), 5-pt, Q5 reverse-scored, mean of items |
| Notification load | # of push-enabled apps (count) |
Hypotheses
2 · Design: sampling, instrument & experiment
2.1 Sampling plan
The target population is first-year undergraduates at one university (≈ 1,200 students). A census is infeasible, so we sample. We use stratified random sampling to guarantee the sample mirrors the population on a variable likely tied to the outcome (degree area).
- Frame: the registrar's first-year enrolment list (closest available list to the true population).
- Strata: degree area (STEM / business / humanities), since baseline study habits may differ.
- Within strata: simple random sampling, allocation proportional to stratum size.
- Target size: $n=180$ (see margin-of-error calc below); over-recruit to $n=210$ for ~15% attrition.
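The margin-of-error calculation itself is not reproduced on this page. A minimal sketch of one plausible version follows, assuming a 95% confidence level, a worst-case proportion of 0.5 and a finite-population correction for N = 1,200. The helper names and the stratum sizes are hypothetical placeholders, not registrar figures.

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """Half-width of a 95% CI for a proportion, with finite-population correction."""
    se = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((N - n) / (N - 1))
    return z * se * fpc

def recruits_needed(n_target, attrition):
    """Smallest recruitment count that leaves n_target after the given dropout rate."""
    return math.ceil(n_target / (1 - attrition))

def proportional_allocation(strata_sizes, n):
    """Largest-remainder allocation of n across strata, proportional to stratum size."""
    total = sum(strata_sizes.values())
    quotas = {k: n * v / total for k, v in strata_sizes.items()}
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

moe = margin_of_error(n=180, N=1200)
needed = recruits_needed(180, 0.15)
strata = {"STEM": 540, "business": 360, "humanities": 300}  # hypothetical sizes
alloc = proportional_allocation(strata, 180)

print(f"margin of error = +/-{moe:.3f}")
print(f"recruit {needed} to keep 180 after 15% attrition")
print(alloc)
```

Under these assumptions the margin of error is roughly ±0.07. A strict 15% dropout would call for 212 recruits, so the stated 210 covers attrition up to about 14%.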
2.2 The questionnaire
A short instrument with mixed item types. Note the deliberate use of neutral wording, a reverse-scored item to catch straight-lining, and a behavioral anchor (the OS screenshot) so we are not relying on self-report alone.
| # | Item | Type / scale |
|---|---|---|
| Q1 | Degree area | Categorical (3 options) — stratum check |
| Q2 | Paste your 7-day average daily screen-time (minutes) | Numeric, open — behavioral anchor |
| Q3 | Number of apps with notifications enabled | Numeric, open |
| Q4 | "I can stay focused on one task for 30+ minutes." | 5-pt Likert (1 strongly disagree – 5 strongly agree) |
| Q5 | "I check my phone without consciously deciding to." (reverse-scored) | 5-pt Likert (R) |
| Q6 | "My phone rarely interrupts my studying." | 5-pt Likert |
| Q7 | Average nightly sleep (hours) — covariate | Numeric, open |
| Q8 | "What does 'being focused' feel like for you?" | Open text — qualitative |
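As a rough sketch of how the Likert items might be scored, assume Q4–Q6 form the focus scale, Q5 is reverse-scored as 6 − x on the 1–5 scale, and a respondent who gives the same raw answer to all three items is flagged for possible straight-lining. The responses below are made up for illustration.

```python
import numpy as np

# Hypothetical raw responses: rows = respondents, columns = Q4, Q5, Q6 (1-5)
raw = np.array([
    [4, 2, 4],
    [5, 5, 5],   # identical raw answers: possible straight-lining
    [2, 4, 1],
])

scored = raw.copy()
scored[:, 1] = 6 - scored[:, 1]          # reverse-score Q5 on a 1-5 scale

focus = scored.mean(axis=1)              # scale score = mean of keyed items
straight_lined = (raw == raw[:, [0]]).all(axis=1)

print(focus)
print(straight_lined)
```

The flag is computed on the raw answers, before reverse-scoring, because a straight-liner picks the same option regardless of item direction.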
2.3 The (quasi-)experimental design
Research question Q2 needs a manipulation. Ideally we randomly assign participants to condition, making it a true experiment; if assignment must respect existing tutorial groups, it becomes a quasi-experiment with intact groups (weaker causal claim). We use a pretest–posttest two-group design.
| Group | Pretest | Manipulation (1 week) | Posttest |
|---|---|---|---|
| Treatment ($n=45$) | O₁ attention task | Silence non-essential notifications | O₂ attention task |
| Control ($n=45$) | O₁ attention task | No change to phone | O₂ attention task |
- Independent variable: notification condition (off vs. unchanged) — manipulated.
- Dependent variable: change in attention score $\Delta = O_2-O_1$.
- Control of confounds: randomization (true experiment) breaks the link between condition and lurking variables; single-blind scoring; identical task instructions; same time-of-day testing.
- Why pretest–posttest: using $\Delta$ removes stable individual differences in baseline attention.
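For the true-experiment case, random assignment and the change score can be sketched as follows. The participant IDs, the seed and the three example scores are hypothetical; a fixed seed is used only so that the allocation can be audited later.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)    # fixed seed: reproducible allocation

participant_ids = np.arange(1, 91)        # 90 hypothetical participant IDs
shuffled = rng.permutation(participant_ids)
treatment_ids = np.sort(shuffled[:45])    # first half -> notifications off
control_ids = np.sort(shuffled[45:])      # second half -> no change

def change_scores(pre, post):
    """Delta = O2 - O1 for each participant."""
    return np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)

# Tiny made-up illustration of the dependent variable
delta_example = change_scores(pre=[140, 128, 155], post=[152, 131, 150])
print(treatment_ids[:5], control_ids[:5])
print(delta_example)
```

Shuffling the full list and splitting it in half guarantees equal group sizes, which independent coin flips per participant would not.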
3 · Data collection plan & mock analysis
3.1 Descriptive statistics (correlational wave, n = 180)
Always describe before you infer: report the mean, standard deviation and shape of each variable.
| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| Screen-time $X$ (min/day) | 312 | 98 | 95 | 588 |
| Attention score $Y$ | 142 | 27 | 71 | 203 |
| Self-report focus (1–5) | 3.1 | 0.8 | 1.2 | 4.8 |
| Sleep (h) | 6.8 | 1.1 | 4.0 | 9.0 |
3.2 Correlation: screen-time vs. attention
```python
# 3.2: correlation + significance (Python / SciPy)
# screen_time, attention: equal-length 1-D arrays from the correlational wave (n = 180)
import numpy as np
from scipy import stats

r, p_two = stats.pearsonr(screen_time, attention)
p_one = p_two / 2  # one-sided: predicted negative (halving is valid only when r < 0)
print(f"r = {r:.2f}, r^2 = {r**2:.2f}, one-sided p = {p_one:.4f}")
# r = -0.34, r^2 = 0.12, one-sided p = 0.0000
```
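The significance of a Pearson correlation can also be checked from the summary values alone, using $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n-2$ degrees of freedom. The sketch below does that for the reported $r=-0.34$, $n=180$, and adds an approximate 95% confidence interval via the Fisher z-transformation. The interval is an addition for illustration and is not reported elsewhere on this page.

```python
import math
from scipy import stats

r, n = -0.34, 180                         # values reported on this page
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r**2)
p_one = stats.t.cdf(t_stat, df)           # lower tail: the prediction is r < 0

# Approximate 95% CI via the Fisher z-transformation
z = math.atanh(r)
half_width = 1.96 / math.sqrt(n - 3)
ci = (math.tanh(z - half_width), math.tanh(z + half_width))

print(f"t({df}) = {t_stat:.2f}, one-sided p = {p_one:.2g}")
print(f"95% CI for r: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

An interval that excludes zero tells the same story as the p-value while also showing how imprecisely the correlation is estimated.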
3.3 Two-group test: did the intervention work?
We compare the attention change $\Delta=O_2-O_1$ between treatment and control with an independent-samples (Welch) $t$-test, and report a standardized effect size.
| Group | n | Mean Δ | SD |
|---|---|---|---|
| Treatment (notifications off) | 45 | +11.2 | 14.0 |
| Control (no change) | 45 | +2.4 | 13.1 |
| Difference | — | 8.8 | — |
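```python
# 3.3: Welch t-test + Cohen's d
# delta_T, delta_C: NumPy arrays of change scores (O2 - O1) for treatment and control
import numpy as np
from scipy import stats

t, p_two = stats.ttest_ind(delta_T, delta_C, equal_var=False)
sp = np.sqrt((delta_T.var(ddof=1) + delta_C.var(ddof=1)) / 2)  # pooled SD (equal n)
d = (delta_T.mean() - delta_C.mean()) / sp
print(f"t = {t:.2f}, one-sided p = {p_two/2:.4f}, d = {d:.2f}")
# t = 3.08, one-sided p = 0.0014, d = 0.65
```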
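Because the raw change scores are mock data, it can help to confirm that the reported statistics follow from the group summaries alone. A minimal sketch, using SciPy's `ttest_ind_from_stats` with the means, SDs and group sizes from the table in this section:

```python
import math
from scipy import stats

m_t, sd_t, n_t = 11.2, 14.0, 45           # treatment: mean delta, SD, n
m_c, sd_c, n_c = 2.4, 13.1, 45            # control:   mean delta, SD, n

res = stats.ttest_ind_from_stats(m_t, sd_t, n_t, m_c, sd_c, n_c, equal_var=False)
t_stat, p_two = res.statistic, res.pvalue

# Welch-Satterthwaite degrees of freedom
v_t, v_c = sd_t**2 / n_t, sd_c**2 / n_c
df = (v_t + v_c) ** 2 / (v_t**2 / (n_t - 1) + v_c**2 / (n_c - 1))

# Cohen's d with the average of the two variances (group sizes are equal)
d = (m_t - m_c) / math.sqrt((sd_t**2 + sd_c**2) / 2)

print(f"t({df:.1f}) = {t_stat:.2f}, one-sided p = {p_two / 2:.4f}, d = {d:.2f}")
```

The summary-based result agrees with the reported $t=3.08$ and $d=0.65$ and adds the Welch degrees of freedom (about 88), which a write-up would normally include.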
3.4 Results summary
| Analysis | Test | Statistic | p | Effect size | Decision |
|---|---|---|---|---|---|
| Q1 screen-time ↔ attention | Pearson $r$ | r = −0.34 | < .001 | r² = .12 | reject H₀ |
| Q1 controlling for sleep | partial $r$ | −0.27 | < .001 | r² = .07 | reject H₀ |
| Q2 intervention effect | Welch $t$ | t = 3.08 | .0014 | d = 0.65 | reject H₀ |
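The partial correlation in the second row removes the linear effect of sleep from both screen-time and attention before correlating them. The page does not report the two correlations with sleep, so the sketch below runs on simulated stand-in data (not the study's dataset) and simply shows that the closed-form formula and the residual-based definition agree. All coefficients in the simulation are arbitrary assumptions.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def residuals(a, z):
    """Residuals of a simple linear regression of a on z."""
    slope, intercept = np.polyfit(z, a, 1)
    return a - (slope * z + intercept)

# Simulated stand-in data; the coefficients are arbitrary
rng = np.random.default_rng(0)
sleep = rng.normal(6.8, 1.1, 180)
screen = 312 - 25 * (sleep - 6.8) + rng.normal(0, 95, 180)
attention = 142 + 6 * (sleep - 6.8) - 0.08 * (screen - 312) + rng.normal(0, 25, 180)

r_partial = partial_corr(screen, attention, sleep)
r_resid = np.corrcoef(residuals(screen, sleep), residuals(attention, sleep))[0, 1]
print(f"formula: {r_partial:.4f}, residual method: {r_resid:.4f}")
```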
4 · Threats to validity, bias & ethics
A finding is only as trustworthy as the design that produced it. Here we audit our own study against the standard threats — the kind of critique the final exam asks you to perform.
Internal validity (causal claim)
- Confounding: sleep, stress, course load. Mitigated by randomization (Q2) and a sleep covariate (Q1).
- Maturation / testing: the attention task itself may improve with practice — the control group accounts for this (their +2.4 baseline drift).
- Demand characteristics: treatment participants may try harder knowing they're "the phone group." Single-blind scoring guards against scorer expectancy but not participant effort; giving the control group a plausible task of its own would reduce this.
External validity (generalization)
- One university, first-years only — results may not transfer to other ages or cultures.
- One-week manipulation says little about durable, months-long effects.
Measurement & bias
- Self-report bias: screen-time recall is unreliable — hence the OS screenshot anchor.
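- Acquiescence / straight-lining: the reverse-scored Q5 flags respondents who agree with everything; it does not by itself correct for social desirability.
- Selection bias: reduced by random sampling from the enrolment frame rather than volunteers, though non-response can still skew the achieved sample.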
- Multiple comparisons / p-hacking: we pre-registered exactly three tests; running $k$ independent tests, each at level $\alpha$, raises the family-wise false-positive rate to $1-(1-\alpha)^k$, so dozens of subgroup analyses would make a spurious hit more likely than not.
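To make the family-wise error formula concrete, the sketch below evaluates $1-(1-\alpha)^k$ for the three pre-registered tests and for a hypothetical 20 subgroup analyses, assuming independent tests at $\alpha = .05$. The Bonferroni helper is a common correction shown for comparison; the page does not state that the study applied it.

```python
def familywise_error(alpha, k):
    """Probability of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / k

for k in (1, 3, 20):
    fwer = familywise_error(0.05, k)
    print(f"k = {k:2d}: family-wise rate = {fwer:.3f}, "
          f"Bonferroni per-test alpha = {bonferroni_alpha(0.05, k):.4f}")
```

With three tests the family-wise rate is about 0.14; with twenty it is about 0.64.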
Statistical conclusion validity
- Power: $n=180$ gives >80% power to detect $r=-0.25$; the experiment ($n=90$, 45 per group) has comparable power only for larger effects, $d\ge 0.6$.
- We report effect sizes ($r^2$, $d$), not just $p$ — significance ≠ importance.
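The power figures can be approximated with normal-theory formulas. The sketch below assumes two-sided tests at $\alpha = .05$, uses the Fisher z approximation for the correlation and a normal approximation for the two-group comparison. Both slightly overstate the power of the exact t-based tests, so treat the output as a rough check rather than a formal power analysis.

```python
import math
from scipy.stats import norm

def power_correlation(r, n, alpha=0.05):
    """Approximate two-sided power to detect a population correlation r."""
    z_crit = norm.ppf(1 - alpha / 2)
    shift = abs(math.atanh(r)) * math.sqrt(n - 3)
    return norm.cdf(shift - z_crit)

def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate two-sided power for a standardized mean difference d."""
    z_crit = norm.ppf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n_per_group / 2)
    return norm.cdf(shift - z_crit)

power_r = power_correlation(-0.25, 180)
power_d = power_two_groups(0.6, 45)
print(f"correlation: power = {power_r:.2f}")
print(f"experiment:  power = {power_d:.2f}")
```

Under these assumptions the survey has about 0.92 power and the experiment about 0.81.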
5 · Mapping to learning outcomes
How each part of this project demonstrates a course learning objective.
6 · Extensions & variations
- Add a factor: turn Q2 into a $2\times2$ factorial — notifications (off/on) × study environment (silent/social) — and look for an interaction (Sessions 12–14).
- Go within-subjects: a crossover design where each participant does both conditions in random order increases power and controls for individual differences.
- Deepen the qualitative arm: run focus groups (Sessions 24–25) on Q8 responses and code themes, then triangulate against the quantitative results.
- Interrupted time series: track attention daily across a phone-free week for a single cohort (quasi-experimental, Session 20) to see the trajectory, not just pre/post.
- Robustness: replace Pearson $r$ with Spearman $\rho$ if attention scores are skewed; bootstrap the confidence interval for $d$.
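The robustness checks in the last bullet could look roughly like this: a percentile bootstrap interval for Cohen's $d$ and a Spearman rank correlation. The arrays are simulated stand-ins drawn to resemble the summary tables, not the study data, and the number of resamples and the seeds are arbitrary choices.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Standardized mean difference using the average of the two sample variances."""
    s = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / s

def bootstrap_d_ci(a, b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for Cohen's d, resampling within each group."""
    rng = np.random.default_rng(seed)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        boots[i] = cohens_d(rng.choice(a, a.size, replace=True),
                            rng.choice(b, b.size, replace=True))
    tail = (1 - level) / 2 * 100
    return np.percentile(boots, [tail, 100 - tail])

# Simulated stand-in data (NOT the study's dataset)
rng = np.random.default_rng(1)
delta_T = rng.normal(11.2, 14.0, 45)
delta_C = rng.normal(2.4, 13.1, 45)
d_hat = cohens_d(delta_T, delta_C)
ci_low, ci_high = bootstrap_d_ci(delta_T, delta_C)

screen = rng.normal(312, 98, 180)
attention = 142 - 0.09 * (screen - 312) + rng.normal(0, 25, 180)
rho, p_rho = stats.spearmanr(screen, attention)

print(f"d = {d_hat:.2f}, 95% bootstrap CI [{ci_low:.2f}, {ci_high:.2f}]")
print(f"Spearman rho = {rho:.2f}")
```

Resampling is done separately within each group so that every bootstrap sample keeps the original 45/45 split.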
7 · References & further reading
- Privitera, G. J. Research Methods for the Behavioral Sciences. SAGE. — Ch. 5 (Sampling), Ch. 8 (Correlational Designs), Ch. 10 (Between-Subjects), Ch. 11 (Quasi-Experimental), Ch. 13 (Descriptive Statistics).
- American Psychological Association. Publication Manual of the APA (7th ed.). — report structure & statistics reporting.
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). — effect-size conventions ($d$, $r$).
- Field, A. Discovering Statistics. SAGE. — Pearson $r$, $t$-tests, partial correlation, Cronbach's $\alpha$.
- Course syllabus — Learning to Observe, Experiment and Survey (PDF).
All numbers on this page are illustrative mock data created to demonstrate the analysis workflow — not results from a real study.