research-methods-lab worked example project · a full study, end to end

Worked example: screen-time & focus — a mixed-methods study, start to finish

This page is a single, fully worked research project that threads together every method in the course: framing a testable question, operationalizing fuzzy constructs, drawing a representative sample, writing a clean questionnaire, designing a quasi-experiment, and analyzing a mock dataset with real formulas — then auditing it for bias, confounding and ethics. Treat it as a template for the group field study (Sessions 26–29) and a model of what "doing it properly" looks like.

The scenario. A study team at a university wants to know whether heavy smartphone use is associated with worse sustained attention in first-year students, and whether a simple "notifications-off" intervention measurably improves focus. They run a correlational survey phase and a small quasi-experiment, then combine the qualitative and quantitative findings.

Goal: Link daily screen-time to sustained attention & test an intervention
Type: Mixed-methods (survey + quasi-experiment)
Target sample: n ≈ 180 first-year students
Difficulty: intermediate
Est. effort: ~12–15 h (design + fieldwork + write-up)
Deliverable: APA-style report + conference-style talk

Sessions exercised

Each phase below maps to specific course sessions.

1 · Research question, constructs & hypotheses

A good study starts from one sharp question and a handful of falsifiable predictions, not a vague topic. The steps below follow the textbook progression from a broad topic to testable questions and hypotheses.

From topic to question

  • Topic: "phones and attention" — too broad to test.
  • Question Q1 (correlational): Among first-year students, is higher daily smartphone screen-time associated with lower sustained-attention performance?
  • Question Q2 (causal): Does silencing non-essential notifications for one week improve sustained attention relative to a no-change control?

Constructs → operational definitions

| Construct | Operationalization |
|---|---|
| Screen-time | 7-day mean of OS-reported daily minutes (Screen Time / Digital Wellbeing screenshot) |
| Sustained attention | d2-style cancellation task score: correct − errors over 5 min |
| Self-reported focus | 3-item scale (Q4–Q6), 5-pt Likert, mean of items |
| Notification load | Number of push-enabled apps (count) |

Hypotheses

Correlational (Q1). Let $X$ = mean daily screen-time (min) and $Y$ = attention-task score. $$H_0:\ \rho_{XY}=0 \qquad H_1:\ \rho_{XY}<0$$ A one-sided alternative: more screen-time is predicted to go with lower attention.
Experimental (Q2). Let $\mu_T,\mu_C$ be mean post-intervention attention for treatment (notifications off) and control. $$H_0:\ \mu_T-\mu_C = 0 \qquad H_1:\ \mu_T-\mu_C > 0$$
Directionality trap. Even a strong negative $\rho_{XY}$ in Q1 cannot establish that phones cause inattention — reverse causation (inattentive people reach for phones) and third variables (sleep, stress) are live. That is precisely why Q2 adds an experiment.

2 · Design: sampling, instrument & experiment

2.1 Sampling plan

The target population is first-year undergraduates at one university (≈ 1,200 students). A census is infeasible, so we sample. We use stratified random sampling to guarantee the sample mirrors the population on a variable likely tied to the outcome (degree area).

How big must the sample be? For a proportion estimate at 95% confidence with margin of error $E$, the worst-case ($\hat p=0.5$) size is $$n=\frac{z^2\,\hat p(1-\hat p)}{E^2}=\frac{1.96^2\cdot 0.25}{0.05^2}\approx 384.$$ With a finite population of $N=1200$ we apply the finite-population correction: $$n_{\text{adj}}=\frac{n}{1+(n-1)/N}=\frac{384}{1+383/1200}\approx 291.$$ Our main outcome is continuous, though, so we instead size for the correlation and the experiment (see the power check in Section 4) and land on $n\approx 180$. That is comfortably above the roughly 98 participants needed to detect $r=-0.25$ with 80% power (one-sided $\alpha=.05$, Fisher-$z$ approximation).
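The arithmetic above can be reproduced with the Python standard library. This is a minimal sketch. The correlation case uses the Fisher-$z$ approximation with one-sided $\alpha=.05$ and 80% power. That choice is our assumption about how the minimum for $r=-0.25$ should be computed, not necessarily the course's own procedure.

```python
import math
from statistics import NormalDist

def n_proportion(E, p_hat=0.5, conf=0.95):
    """Sample size to estimate a proportion within +/- E (worst case at p_hat = 0.5)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)       # about 1.96 for 95%
    return z**2 * p_hat * (1 - p_hat) / E**2

def fpc(n, N):
    """Finite-population correction for a population of size N."""
    return n / (1 + (n - 1) / N)

def n_for_correlation(r, alpha=0.05, power=0.80, one_sided=True):
    """Approximate n to detect a population correlation r (Fisher z method)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha) if one_sided else nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    return ((z_alpha + z_beta) / math.atanh(abs(r))) ** 2 + 3

n0 = round(n_proportion(E=0.05))                        # 384
n_adj = round(fpc(n0, N=1200))                          # 291
n_r = math.ceil(n_for_correlation(-0.25))               # 98 (one-sided alpha = .05, 80% power)
print(n0, n_adj, n_r)
```

Switching to a two-sided test (`one_sided=False`) raises the requirement to about 124, which is still well under 180.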
Why not just post a link? A volunteer/convenience web sample over-represents the highly-engaged and self-selects on the very trait we study (phone habits) — classic selection bias. Stratified random sampling from the enrolment frame is the defensible choice.
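Proportional stratified sampling can be sketched as follows. Everything about the frame is hypothetical: the three degree-area labels, the 480/360/360 split of the 1,200 students, the ID format and the seed. The allocation is rounded down, and any leftover places go to the strata with the largest remainders. That is one common convention, assumed here.

```python
import random
from collections import Counter

def stratified_sample(frame, stratum_of, n, seed=2024):
    """Proportional stratified random sample of size n from a sampling frame."""
    rng = random.Random(seed)                       # fixed seed -> reproducible draw
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    N = len(frame)
    # Proportional allocation n_h = n * N_h / N: round down, then give any
    # leftover slots to the strata with the largest remainders.
    exact = {h: n * len(units) / N for h, units in strata.items()}
    alloc = {h: int(x) for h, x in exact.items()}
    leftover = n - sum(alloc.values())
    for h in sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)[:leftover]:
        alloc[h] += 1
    sample = []
    for h, units in strata.items():
        sample.extend(rng.sample(units, alloc[h]))  # simple random sample within each stratum
    return sample

# Hypothetical enrolment frame: 1,200 first-years in three degree areas.
sizes = {"Sciences": 480, "Humanities": 360, "Social sciences": 360}
areas = [area for area, k in sizes.items() for _ in range(k)]
frame = [(f"S{i:04d}", area) for i, area in enumerate(areas)]

sample = stratified_sample(frame, stratum_of=lambda unit: unit[1], n=180)
print(Counter(area for _, area in sample))          # 72 / 54 / 54, mirroring 40% / 30% / 30%
```

Because the seed is fixed, a reviewer can re-draw exactly the same sample from the same frame.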

2.2 The questionnaire

A short instrument with mixed item types. Note the deliberate use of neutral wording, a reverse-scored item to catch straight-lining, and a behavioral anchor (the OS screenshot) so we are not relying on self-report alone.

| # | Item | Type / scale |
|---|---|---|
| Q1 | Degree area | Categorical (3 options); stratum check |
| Q2 | Paste your 7-day average daily screen-time (minutes) | Numeric, open; behavioral anchor |
| Q3 | Number of apps with notifications enabled | Numeric, open |
| Q4 | "I can stay focused on one task for 30+ minutes." | 5-pt Likert (1 = strongly disagree, 5 = strongly agree) |
| Q5 | "I check my phone without consciously deciding to." (reverse-scored) | 5-pt Likert (R) |
| Q6 | "My phone rarely interrupts my studying." | 5-pt Likert |
| Q7 | Average nightly sleep (hours) | Numeric, open; covariate |
| Q8 | "What does 'being focused' feel like for you?" | Open text; qualitative |
Scale reliability. Items Q4–Q6 form the self-reported-focus scale. After reverse-scoring Q5, we report internal consistency with Cronbach's $\alpha$; we treat $\alpha\ge 0.70$ as acceptable. Reliability (consistency) is necessary but not sufficient for validity (measuring the right thing).
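Cronbach's $\alpha$ compares the summed item variances with the variance of the total score: $\alpha=\frac{k}{k-1}\left(1-\frac{\sum_i s_i^2}{s_X^2}\right)$ for $k$ items. Below is a minimal NumPy sketch. The six response rows are invented for illustration and are deliberately very consistent, so the toy $\alpha$ is unusually high.

```python
import numpy as np

def reverse_score(x, low=1, high=5):
    """Reverse a Likert item: on a 1-5 scale 1<->5, 2<->4 and 3 stays 3."""
    return (low + high) - np.asarray(x)

def cronbach_alpha(items):
    """items: (n_respondents, k_items) array, already reverse-scored where needed."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)           # s_i^2 for each item
    total_var = items.sum(axis=1).var(ddof=1)       # s_X^2 of the summed scale score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Invented responses (illustration only); columns are Q4, Q5 (raw), Q6.
raw = np.array([[4, 2, 4], [2, 4, 1], [5, 1, 5], [3, 3, 3], [4, 2, 3], [1, 5, 2]])
scored = raw.copy()
scored[:, 1] = reverse_score(raw[:, 1])             # reverse-score Q5 first
alpha = cronbach_alpha(scored)
focus = scored.mean(axis=1)                         # each respondent's self-reported focus
print(f"alpha = {alpha:.2f}")                       # alpha = 0.97
```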

2.3 The (quasi-)experimental design

Q2 needs a manipulation. Ideally we randomly assign participants to condition, making it a true experiment; if assignment must respect existing tutorial groups, it becomes a quasi-experiment with intact groups (weaker causal claim). We use a pretest–posttest two-group design.

| Group | Pretest | Manipulation (1 week) | Posttest |
|---|---|---|---|
| Treatment ($n=45$) | O₁ attention task | Silence non-essential notifications | O₂ attention task |
| Control ($n=45$) | O₁ attention task | No change to phone | O₂ attention task |
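When individual random assignment is possible (the true-experiment case), a seeded shuffle-and-split is one simple way to form two arms of 45. This is a sketch under assumptions: the `P001`–`P090` IDs and the seed are hypothetical.

```python
import random

def randomize(ids, seed=42):
    """Simple random assignment to two equal arms: shuffle, then split in half."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)                # seeded, so the allocation can be re-run
    half = len(ids) // 2
    return {"treatment": sorted(ids[:half]), "control": sorted(ids[half:])}

subset = [f"P{i:03d}" for i in range(1, 91)]        # hypothetical anonymous IDs
arms = randomize(subset)
print(len(arms["treatment"]), len(arms["control"]))  # 45 45
```

Recording the seed alongside the pre-registration makes the allocation auditable.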

3 · Data collection plan & mock analysis

  1. Pilot. Run the questionnaire with ~10 students; check item clarity, timing, and that the OS screenshot instruction works on both iOS and Android.
  2. Recruit & consent. Email the stratified random sample; obtain informed consent; assign anonymous IDs (no names stored with data).
  3. Survey wave. Collect items Q1–Q8 plus the pretest attention task (O₁) for all participants; this supplies the data for the correlational analysis.
  4. Assign & intervene. Randomize the experimental subset ($n=90$) to treatment or control (45 each); run the 1-week manipulation.
  5. Posttest. Re-administer the attention task (O₂).
  6. Clean & analyze. Reverse-score Q5, screen for straight-lining and impossible values, then run the analyses below.
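The cleaning step can be sketched with NumPy. Several details are assumptions for illustration: the column layout, the five example rows, and the flagging rules (screen-time above 1,440 min/day, sleep above 24 h, identical answers on Q4–Q6 even though Q5 is reverse-worded). A real study would pre-register its own exclusion rules.

```python
import numpy as np

# Hypothetical raw rows: screen_min (item Q2), sleep_h (Q7), then raw Likert Q4, Q5, Q6.
raw = np.array([
    [310, 7.0, 4, 2, 4],
    [1500, 6.5, 3, 3, 3],   # more than 1,440 min in a day: impossible
    [250, 7.5, 5, 5, 5],    # also "agrees" with reverse-worded Q5: straight-lining
    [420, 25.0, 2, 4, 2],   # more than 24 h of sleep: impossible
    [180, 8.0, 4, 1, 5],
], dtype=float)

screen, sleep, likert = raw[:, 0], raw[:, 1], raw[:, 2:]

impossible = (screen < 0) | (screen > 1440) | (sleep < 0) | (sleep > 24)
# Identical answers on all three raw items, including the reverse-worded Q5,
# suggest the respondent answered without reading the items.
straight = np.all(likert == likert[:, [0]], axis=1)

flagged = impossible | straight
print("flagged rows:", np.flatnonzero(flagged))   # flagged rows: [1 2 3]

clean = raw[~flagged].copy()                     # here flagged rows are simply dropped
clean[:, 3] = 6 - clean[:, 3]                    # reverse-score Q5 on the 1-5 scale
```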

3.1 Descriptive statistics (correlational wave, n = 180)

Always describe before you infer. Means, standard deviations and the shape of each variable.

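| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| Screen-time $X$ (min/day) | 312 | 98 | 95 | 588 |
| Attention score $Y$ | 142 | 27 | 71 | 203 |
| Self-report focus (1–5) | 3.1 | 0.8 | 1.2 | 4.8 |
| Sleep (h) | 6.8 | 1.1 | 4.0 | 9.0 |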
Sample mean and (unbiased) standard deviation: $$\bar x=\frac{1}{n}\sum_{i=1}^{n}x_i,\qquad s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2}.$$
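Here is a quick check that the formulas match NumPy, using five invented sleep values. The detail worth noticing is `ddof=1`: `np.std` divides by $n$ unless told otherwise.

```python
import math
import numpy as np

def mean_sd(xs):
    """Sample mean and unbiased (n - 1) standard deviation, straight from the formulas."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return xbar, s

toy = [4.0, 6.5, 7.0, 8.0, 9.0]                     # invented sleep hours, not study data
xbar, s = mean_sd(toy)
print(f"mean = {xbar:.2f}, SD = {s:.2f}")           # mean = 6.90, SD = 1.88
print(f"np.std default (ddof=0): {np.std(toy):.2f}")  # divides by n, so it is smaller
```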

3.2 Correlation: screen-time vs. attention

Pearson's correlation coefficient: $$r=\frac{\sum (x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum (x_i-\bar x)^2}\,\sqrt{\sum (y_i-\bar y)^2}}.$$ Mock result: $r=-0.34$, so $r^2\approx 0.12$: screen-time accounts for about 12% of the variance in attention scores. We test it against $H_0:\rho=0$ with $$t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}=\frac{-0.34\sqrt{178}}{\sqrt{1-0.1156}}\approx -4.82,$$ on $df=n-2=178$, giving $p<0.001$ (one-sided). We reject $H_0$: a moderate negative association.
# 3.2: correlation + significance (Python / SciPy >= 1.9 for `alternative=`)
import numpy as np
from scipy import stats

# screen_time, attention: 1-D NumPy arrays, one value per participant (n = 180)
r, p_one = stats.pearsonr(screen_time, attention, alternative="less")  # H1: rho < 0
print(f"r = {r:.2f},  r^2 = {r**2:.2f},  one-sided p = {p_one:.4f}")
# r = -0.34,  r^2 = 0.12,  one-sided p = 0.0000  (i.e. p < .001)
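The formulas above can also be checked by hand. This is a minimal sketch. `pearson_r` implements the deviation-score formula and is compared with SciPy on five invented points. `r_to_t` converts the reported mock result ($r=-0.34$, $n=180$) into its $t$ statistic, which comes out at about $-4.82$. Both function names are ours.

```python
import math
import numpy as np
from scipy import stats

def pearson_r(x, y):
    """Pearson r computed directly from the deviation-score formula."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / math.sqrt((dx**2).sum() * (dy**2).sum())

def r_to_t(r, n):
    """t statistic for H0: rho = 0, on df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# Toy data (illustrative only): the hand formula should match SciPy.
x = [120, 200, 310, 400, 520]
y = [170, 160, 140, 150, 110]
r_scipy, _ = stats.pearsonr(x, y)
assert math.isclose(pearson_r(x, y), r_scipy)

# The reported mock result: r = -0.34 with n = 180.
t = r_to_t(-0.34, 180)
p_one = stats.t.cdf(t, df=178)                  # lower tail, because H1 predicts rho < 0
print(f"t = {t:.2f}, one-sided p = {p_one:.1e}")
```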
Correlation ≠ causation. The $r=-0.34$ is consistent with phones harming attention, attention problems driving phone use, or a confounder (poor sleep) driving both. We added sleep as a covariate (Q7); a partial correlation controlling for sleep drops to $r_{XY\cdot Z}=-0.27$ — attenuated but still present.
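The partial correlation can be computed from the three pairwise correlations, $$r_{XY\cdot Z}=\frac{r_{XY}-r_{XZ}\,r_{YZ}}{\sqrt{(1-r_{XZ}^2)(1-r_{YZ}^2)}},$$ or, equivalently, by correlating the residuals of $X$ and $Y$ after regressing each on $Z$. The sketch below checks that the two routes agree on synthetic data in which sleep drives both variables. The data-generating numbers are invented and are not the study's.

```python
import numpy as np

def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def residualize(v, z):
    """Residuals of v after a least-squares straight line on z."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

# Synthetic data (not the study's): sleep lowers screen-time and raises attention.
rng = np.random.default_rng(0)
sleep = rng.normal(6.8, 1.1, 180)
screen = 600 - 40 * sleep + rng.normal(0, 80, 180)
attention = 60 + 12 * sleep - 0.05 * screen + rng.normal(0, 20, 180)

r_zero = np.corrcoef(screen, attention)[0, 1]
r_formula = partial_r(screen, attention, sleep)
r_resid = np.corrcoef(residualize(screen, sleep), residualize(attention, sleep))[0, 1]
print(f"zero-order r = {r_zero:.2f}, partial r = {r_formula:.2f} (residual route {r_resid:.2f})")
```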

3.3 Two-group test: did the intervention work?

We compare the attention change $\Delta=O_2-O_1$ between treatment and control with an independent-samples (Welch) $t$-test, and report a standardized effect size.

| Group | n | Mean Δ | SD of Δ |
|---|---|---|---|
| Treatment (notifications off) | 45 | +11.2 | 14.0 |
| Control (no change) | 45 | +2.4 | 13.1 |
| Difference (T − C) | | 8.8 | |
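Welch's two-sample $t$ and pooled-SD Cohen's $d$: $$t=\frac{\bar\Delta_T-\bar\Delta_C}{\sqrt{\dfrac{s_T^2}{n_T}+\dfrac{s_C^2}{n_C}}} =\frac{11.2-2.4}{\sqrt{\frac{14.0^2}{45}+\frac{13.1^2}{45}}}\approx 3.08,$$ $$d=\frac{\bar\Delta_T-\bar\Delta_C}{s_{\text{pooled}}}=\frac{8.8}{13.6}\approx 0.65,\qquad s_{\text{pooled}}=\sqrt{\frac{(n_T-1)s_T^2+(n_C-1)s_C^2}{n_T+n_C-2}}=\sqrt{\frac{14.0^2+13.1^2}{2}}\approx 13.6.$$ The Welch–Satterthwaite degrees of freedom are $$df=\frac{\left(s_T^2/n_T+s_C^2/n_C\right)^2}{\dfrac{(s_T^2/n_T)^2}{n_T-1}+\dfrac{(s_C^2/n_C)^2}{n_C-1}}\approx 87.6,$$ giving $p\approx 0.0014$ (one-sided). We reject $H_0$: a moderate, statistically significant improvement (Cohen's $d\approx 0.65$).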
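# 3.3: Welch t-test + Cohen's d (`alternative=` needs SciPy >= 1.6)
# delta_T, delta_C: NumPy arrays of O2 - O1, one value per participant
t, p_one = stats.ttest_ind(delta_T, delta_C, equal_var=False, alternative="greater")  # H1: mu_T > mu_C
n_T, n_C = len(delta_T), len(delta_C)
# pooled SD; reduces to sqrt((s_T^2 + s_C^2) / 2) when n_T == n_C
sp = np.sqrt(((n_T - 1) * delta_T.var(ddof=1) + (n_C - 1) * delta_C.var(ddof=1)) / (n_T + n_C - 2))
d  = (delta_T.mean() - delta_C.mean()) / sp
print(f"t = {t:.2f},  one-sided p = {p_one:.4f},  d = {d:.2f}")
# t = 3.08,  one-sided p = 0.0014,  d = 0.65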
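The section reports only summary statistics, so the Welch test can be recomputed from the table alone. This is a minimal sketch. It uses the Welch–Satterthwaite degrees of freedom and the general pooled-SD formula for $d$. The function name `welch_from_summary` is ours.

```python
import math
from scipy import stats

def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch t, Welch-Satterthwaite df, one-sided p (H1: mu1 > mu2) and pooled-SD d."""
    a, b = s1**2 / n1, s2**2 / n2                   # squared standard errors
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    p_one = stats.t.sf(t, df)                       # upper tail: H1 predicts a gain
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return t, df, p_one, (m1 - m2) / sp

t, df, p, d = welch_from_summary(11.2, 14.0, 45, 2.4, 13.1, 45)
print(f"t = {t:.2f}, df = {df:.1f}, one-sided p = {p:.4f}, d = {d:.2f}")
# t = 3.08, df = 87.6, one-sided p = 0.0014, d = 0.65
```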

3.4 Results summary

| Analysis | Test | Statistic | p | Effect size | Decision |
|---|---|---|---|---|---|
| Q1 screen-time ↔ attention | Pearson $r$ | r = −0.34 | < .001 | r² = .12 | reject H₀ |
| Q1 controlling for sleep | partial $r$ | r = −0.27 | < .001 | r² = .07 | reject H₀ |
| Q2 intervention effect | Welch $t$ | t = 3.08 | .0014 | d = 0.65 | reject H₀ |
Plain-language conclusion. Heavier screen-time is moderately associated with lower sustained attention (and remains so after adjusting for sleep), and a one-week notifications-off intervention produced a moderate, significant gain in attention. The combined design lets us speak more confidently about cause than the survey alone could.

4 · Threats to validity, bias & ethics

A finding is only as trustworthy as the design that produced it. Here we audit our own study against the standard threats — the kind of critique the final exam asks you to perform.

Internal validity (causal claim)

  • Confounding: sleep, stress, course load. Mitigated by randomization (Q2) and a sleep covariate (Q1).
  • Maturation / testing: the attention task itself may improve with practice — the control group accounts for this (their +2.4 baseline drift).
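  • Demand characteristics: treatment participants may try harder knowing they're "the phone group." Keeping participants unaware of the hypotheses reduces this, and scoring the task blind to condition prevents scorer bias; a plausible control activity (rather than "no change") would reduce it further.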

External validity (generalization)

  • One university, first-years only — results may not transfer to other ages or cultures.
  • One-week manipulation says little about durable, months-long effects.

Measurement & bias

  • Self-report bias: screen-time recall is unreliable — hence the OS screenshot anchor.
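  • Acquiescence: the reverse-scored Q5 helps detect straight-lining (agreeing with every item regardless of wording). Social desirability is not caught this way; for screen-time, the OS screenshot anchor limits it.
  • Selection bias: reduced, though not eliminated, by random sampling from the enrolment frame rather than recruiting volunteers; differential non-response across strata can still bias the sample.
  • Multiple comparisons / p-hacking: we pre-registered exactly three tests. Running dozens of subgroup analyses would inflate the family-wise false-positive rate, which for $k$ independent tests each at level $\alpha$ is $1-(1-\alpha)^k$ (about 0.64 for $k=20$ at $\alpha=.05$).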
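The inflation formula and a simple correction are easy to tabulate. This sketch assumes independent tests. Bonferroni (testing each of $k$ hypotheses at $\alpha/k$) is one standard correction, used here as an illustration rather than as the study's stated plan. The two "< .001" results are entered at their upper bound.

```python
ALPHA = 0.05

def familywise_error(alpha, k):
    """P(at least one false positive) across k independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** k

for k in (1, 3, 20):
    print(k, round(familywise_error(ALPHA, k), 3))   # 1 0.05 / 3 0.143 / 20 0.642

# Bonferroni check for the three pre-registered tests: each must clear ALPHA / 3.
# The two "< .001" results are entered at their upper bound of .001.
p_upper = {"Q1 r": 0.001, "Q1 partial r": 0.001, "Q2 Welch t": 0.0014}
threshold = ALPHA / 3
print(f"threshold = {threshold:.4f}", all(p < threshold for p in p_upper.values()))
```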

Statistical conclusion validity

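  • Power: $n=180$ gives well over 80% power (about 96%, Fisher-$z$ approximation, one-sided $\alpha=.05$) to detect $r=-0.25$; the experiment ($n=90$, 45 per group) has about 80% power to detect $d=0.6$ even at two-sided $\alpha=.05$.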
  • We report effect sizes ($r^2$, $d$), not just $p$ — significance ≠ importance.
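Both power claims can be checked approximately. This sketch rests on two stated assumptions. Correlation power uses the Fisher-$z$ normal approximation (one-sided $\alpha=.05$). The experiment's power uses the noncentral $t$ for a pooled-variance two-sample test (two-sided $\alpha=.05$); this should closely approximate the Welch test here, because the two SDs are similar.

```python
import math
from statistics import NormalDist
from scipy import stats

def power_correlation(r, n, alpha=0.05):
    """Approximate one-sided power to detect rho = r (Fisher z, normal approximation)."""
    nd = NormalDist()
    return nd.cdf(math.atanh(abs(r)) * math.sqrt(n - 3) - nd.inv_cdf(1 - alpha))

def power_two_sample(d, n_per_group, alpha=0.05):
    """Two-sided power of a pooled-variance two-sample t-test via the noncentral t."""
    df = 2 * n_per_group - 2
    nc = d * math.sqrt(n_per_group / 2)             # noncentrality for equal group sizes
    crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(crit, df, nc) + stats.nct.cdf(-crit, df, nc)

print(f"r = -0.25, n = 180: power = {power_correlation(-0.25, 180):.2f}")   # 0.96
print(f"d = 0.60, 45 per group: power = {power_two_sample(0.60, 45):.2f}")
```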
Ethics (IE / APA principles, Session 22). Informed consent and the right to withdraw; anonymous IDs with screen-time data stored separately from identifiers; minimal risk (no deception); a debrief explaining the hypotheses; and a data-retention plan. The notifications-off manipulation is low-risk and reversible. Approval from the research ethics board precedes any data collection.

5 · Mapping to learning outcomes

How each part of this project demonstrates a course learning objective.

LO1 · Think critically about research
Framed a testable question, separated correlation (Q1) from causation (Q2), and audited our own confounds.
LO2 · Evaluate quality (reliability, validity, triangulation)
Reported Cronbach's $\alpha$, distinguished reliability from validity, triangulated self-report with a behavioral task and a qualitative item.
LO3 · Communicate research clearly
Structured the work as an APA-style report with a results table, effect sizes, and a plain-language conclusion ready for a conference-style talk.
Methods breadth
Exercised sampling, survey design, correlation, true/quasi-experiment, and descriptive + inferential statistics in one coherent study.

6 · Extensions & variations

7 · References & further reading

  1. Privitera, G. J. Research Methods for the Behavioral Sciences. SAGE. — Ch. 5 (Sampling), Ch. 8 (Correlational Designs), Ch. 10 (Between-Subjects), Ch. 11 (Quasi-Experimental), Ch. 13 (Descriptive Statistics).
  2. American Psychological Association. Publication Manual of the APA (7th ed.). — report structure & statistics reporting.
  3. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). — effect-size conventions ($d$, $r$).
  4. Field, A. Discovering Statistics. SAGE. — Pearson $r$, $t$-tests, partial correlation, Cronbach's $\alpha$.
  5. Course syllabus — Learning to Observe, Experiment and Survey (PDF).

All numbers on this page are illustrative mock data created to demonstrate the analysis workflow — not results from a real study.