Using the Biker experiment context, compute required sample sizes and describe error control under practical constraints. Show formulas and numeric answers where possible.
Assumptions:
- Two-armed test (control vs Biker exposure).
- Two-sided alpha unless stated otherwise.
- Primary mean metric: Baseline mean delivery time = 42 min, SD = 15 min. Target relative improvement = −5%. Alpha = 0.05 (two-sided), power = 0.80. Compute per-arm sample size for a two-sample t-test on means.
- Guardrail proportion metric: Baseline cancellation rate = 6%. You require non-inferiority with margin +0.5 percentage points (i.e., Biker cancel rate ≤ 6.5%). One-sided alpha = 0.05, power = 0.80. Compute per-arm sample size for a non-inferiority test on proportions.
- Multiple metrics: You have 1 primary, 2 guardrails (cancellations, ETA accuracy), and 1 secondary (orders per courier-hour). Propose and justify an error-control approach (e.g., Bonferroni, Holm/Hochberg, gatekeeping/HMP). State the effective alpha for each family and how you’d report adjusted CIs.
- Cluster randomization: You randomize at the zone-day level with average m = 300 orders per cluster and ICC = 0.03 for the primary metric. Compute the design effect DE and the adjusted per-arm sample size. How many zone-days per arm are needed to reach that sample size?
- Sequential monitoring: You plan 4 equally spaced looks with O’Brien–Fleming spending. Explain qualitatively how early critical values differ from the final look and how this affects runtime/MDE. Provide the final-look alpha spending approximation and how you’d implement boundary checks in practice.
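
The numeric parts of the items above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive solution: it uses the normal approximation in place of the exact t-distribution (which typically adds about one observation per arm), it assumes the true cancellation-rate difference is zero when sizing the non-inferiority test, and it uses the Lan-DeMets O'Brien-Fleming-type spending function as an approximation to the O'Brien-Fleming boundaries. All function names are hypothetical and chosen for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_arm_means(sd, delta, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided two-sample test on means (normal approximation)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2)


def n_per_arm_noninferiority(p_control, p_treat, margin, alpha=0.05, power=0.80):
    """Per-arm n for a one-sided non-inferiority test on two proportions.

    Assumption: p_treat is the *true* treatment rate used for planning;
    setting it equal to p_control means "no true difference".
    """
    z_a = Z.inv_cdf(1 - alpha)
    z_b = Z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    distance = margin - (p_treat - p_control)  # room between truth and margin
    return math.ceil((z_a + z_b) ** 2 * variance / distance ** 2)


def design_effect(m, icc):
    """Variance inflation for clusters of average size m: DE = 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc


def obf_cumulative_alpha(t, alpha=0.05):
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1),
    using the Lan-DeMets O'Brien-Fleming-type spending function."""
    return 2 * (1 - Z.cdf(Z.inv_cdf(1 - alpha / 2) / math.sqrt(t)))


# Primary metric: delta = 5% of 42 min = 2.1 min, SD = 15 min.
n_primary = n_per_arm_means(sd=15, delta=0.05 * 42)

# Guardrail: baseline 6%, margin +0.5 pp, assumed true difference of zero.
n_guardrail = n_per_arm_noninferiority(0.06, 0.06, margin=0.005)

# Cluster randomization at the zone-day level (primary metric only).
de = design_effect(m=300, icc=0.03)
n_primary_clustered = math.ceil(n_primary * de)
zone_days_per_arm = math.ceil(n_primary_clustered / 300)

# Cumulative alpha spent at 4 equally spaced looks.
spend = [obf_cumulative_alpha(k / 4) for k in range(1, 5)]

print("primary n per arm:", n_primary)
print("guardrail n per arm:", n_guardrail)
print("design effect:", round(de, 2))
print("clustered primary n per arm:", n_primary_clustered)
print("zone-days per arm:", zone_days_per_arm)
print("cumulative alpha by look:", [round(a, 5) for a in spend])
```

Under these assumptions the primary metric needs about 801 orders per arm before clustering. With DE = 1 + 299 × 0.03 = 9.97 that becomes 7,986 orders, or 27 zone-days per arm. The guardrail requirement is far larger (roughly 27.9 thousand orders per arm before any clustering adjustment) because the 0.5 pp margin is small relative to the binomial variance. The spending values show why early O'Brien-Fleming looks are conservative: almost no alpha is spent at the first look, and the cumulative spend reaches the full 0.05 only at the final look.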