Compute sample sizes and error control
Company: SIG (Susquehanna)
Role: Data Scientist
Category: Statistics & Math
Difficulty: Medium
Interview Round: Technical Screen
Using the Biker experiment context, compute required sample sizes and describe error control under practical constraints. Show formulas and numeric answers where possible.
Assumptions:
- Two-armed test (control vs Biker exposure).
- Two-sided alpha unless stated otherwise.
1) Primary mean metric: Baseline mean delivery time = 42 min, SD = 15 min. Target relative change = −5% (shorter is better: 42 → 39.9 min, an absolute reduction of 2.1 min). Alpha = 0.05 (two-sided), power = 0.80. Compute the per-arm sample size for a two-sample t-test on means.
2) Guardrail proportion metric: Baseline cancellation rate = 6%. You require non-inferiority with a margin of +0.5 percentage points (i.e., the Biker cancellation rate may exceed control by at most 0.5 pp, which is 6.5% at the 6% baseline). One-sided alpha = 0.05, power = 0.80. Compute the per-arm sample size for a non-inferiority test on proportions.
3) Multiple metrics: You have 1 primary, 2 guardrails (cancellations, ETA accuracy), and 1 secondary (orders per courier-hour). Propose and justify an error-control approach (e.g., Bonferroni, Holm/Hochberg, gatekeeping/HMP). State the effective alpha for each family and how you’d report adjusted CIs.
4) Cluster randomization: You randomize at the zone-day level with average m = 300 orders per cluster and ICC = 0.03 for the primary metric. Compute the design effect DE and the adjusted per-arm sample size. How many zone-days per arm are needed to reach that sample size?
5) Sequential monitoring: You plan 4 equally spaced looks with O’Brien–Fleming spending. Explain qualitatively how early critical values differ from the final look and how this affects runtime/MDE. Provide the final-look alpha spending approximation and how you’d implement boundary checks in practice.
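A minimal sketch for part 1, assuming the standard normal-approximation formula n = 2(z_{1-α/2} + z_{1-β})² σ² / Δ² with equal allocation and a common SD (the helper name n_per_arm_means is hypothetical). With Δ = 0.05 × 42 = 2.1 min and σ = 15 min: n = 2 × (1.960 + 0.842)² × 225 / 4.41 ≈ 800.9, so 801 per arm.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_means(mu0, sd, rel_change, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided, two-sample comparison of means.

    Normal approximation to the t-test; assumes equal allocation and a common SD.
    """
    z = NormalDist().inv_cdf
    delta = abs(mu0 * rel_change)            # absolute effect: 0.05 * 42 = 2.1 min
    z_sum = z(1 - alpha / 2) + z(power)      # 1.960 + 0.842
    return ceil(2 * (z_sum * sd / delta) ** 2)


n_primary = n_per_arm_means(mu0=42, sd=15, rel_change=-0.05)
print(n_primary)  # 801
```

Using t quantiles instead of z quantiles adds only about one subject per arm at this size, so the normal approximation is adequate here.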
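For part 2, a sketch of the usual normal-approximation sample size for a one-sided non-inferiority test on a difference in proportions, n = (z_{1-α} + z_{1-β})² [p0(1 − p0) + p1(1 − p1)] / (δ − (p1 − p0))², assuming the true Biker cancellation rate equals the 6% baseline (p1 = p0). With δ = 0.005: n ≈ 6.183 × 0.1128 / 0.000025 ≈ 27,896 per arm. The helper name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_noninferiority(p0, margin, alpha=0.05, power=0.80, p1=None):
    """Per-arm n for a one-sided non-inferiority test on p1 - p0 (higher = worse).

    H0: p1 - p0 >= margin versus H1: p1 - p0 < margin.
    Assumes the true treatment rate equals the baseline unless p1 is given.
    """
    p1 = p0 if p1 is None else p1
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha) + z(power)              # one-sided: 1.645 + 0.842
    variance = p0 * (1 - p0) + p1 * (1 - p1)     # 2 * 0.06 * 0.94 = 0.1128
    gap = margin - (p1 - p0)                     # distance from true difference to the NI bound
    return ceil(z_sum ** 2 * variance / gap ** 2)


n_guardrail = n_per_arm_noninferiority(p0=0.06, margin=0.005)
print(n_guardrail)  # 27896
```

At roughly 28k orders per arm before any clustering adjustment, this guardrail needs about 35 times the primary metric's sample, so under these assumptions the guardrail, not the primary, sets the run length.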
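For part 3, one conservative scheme (an assumption, not the only valid answer): test the primary alone at α = 0.05; treat the two guardrails as one family controlled with Holm at one-sided α = 0.05; and test the secondary only if the primary is significant (a fixed-sequence gate), so it can also use α = 0.05 without inflating the familywise error. The sketch implements Holm's step-down adjusted p-values; the function name holm_adjust and the p-values are hypothetical. Holm has no simple matching simultaneous interval, so a common reporting choice is Bonferroni-style intervals at level 1 − α/k (here one-sided 97.5% bounds for the two guardrails).

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values (controls FWER under any dependence)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # multiply the k-th smallest p by (m - k + 1), cap at 1, keep monotone
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical one-sided NI p-values: [cancellations, ETA accuracy]
guardrail_p = [0.030, 0.012]
adj = holm_adjust(guardrail_p)
passed = [p <= 0.05 for p in adj]
print(adj, passed)  # [0.03, 0.024] [True, True]
```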
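For part 4, DE = 1 + (m − 1) × ICC = 1 + 299 × 0.03 = 9.97. Taking 801 per arm (the normal-approximation answer to part 1) as the individually randomized requirement, the cluster-adjusted n is 801 × 9.97 ≈ 7,986 orders per arm, or ceil(7,986 / 300) = 27 zone-days per arm. A sketch assuming equal cluster sizes; the helper name is hypothetical.

```python
from math import ceil


def cluster_adjusted_n(n_individual, m, icc):
    """Inflate an individually randomized n by the design effect for equal-size clusters."""
    design_effect = 1 + (m - 1) * icc            # 1 + 299 * 0.03 = 9.97
    n_adjusted = ceil(n_individual * design_effect)
    clusters = ceil(n_adjusted / m)              # zone-days needed per arm
    return design_effect, n_adjusted, clusters


de, n_adj, zone_days = cluster_adjusted_n(n_individual=801, m=300, icc=0.03)
print(round(de, 2), n_adj, zone_days)  # 9.97 7986 27
```

Unequal cluster sizes raise the design effect further, and with only 27 clusters per arm the cluster-level degrees of freedom are limited, so a small buffer of extra zone-days is prudent.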
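For part 5, a sketch of the Lan-DeMets O’Brien–Fleming-type spending function, α(t) = 2 − 2Φ(z_{1-α/2} / √t), evaluated at information fractions 0.25, 0.5, 0.75 and 1. It spends almost nothing early (about 0.0001 and 0.0056 cumulative at the first two looks) and leaves about 0.026 of the 0.05 for the final look. For reference, the classic O’Brien–Fleming critical values for four equally spaced two-sided looks at α = 0.05 are about 4.05, 2.86, 2.34 and 2.02, so the final nominal level (about 0.043) is close to the unadjusted 0.05 and the maximum sample size, and hence the MDE, grows only a few percent. Converting spent alpha into exact boundaries needs recursive numerical integration across looks, which group-sequential software handles; the function name below is hypothetical.

```python
from statistics import NormalDist


def obf_spending(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending: cumulative two-sided alpha spent by information fraction t."""
    nd = NormalDist()
    return 2 * (1 - nd.cdf(nd.inv_cdf(1 - alpha / 2) / t ** 0.5))


looks = [0.25, 0.50, 0.75, 1.00]                 # 4 equally spaced looks
cumulative = [obf_spending(t) for t in looks]
incremental = [c - prev for c, prev in zip(cumulative, [0.0] + cumulative[:-1])]
for t, c, inc in zip(looks, cumulative, incremental):
    print(f"t={t:.2f}  cumulative={c:.5f}  spent at this look={inc:.5f}")
```

In practice, at each look compute the observed information fraction (for example, zone-days analyzed divided by the planned total), evaluate the spending function at that fraction so unequal spacing is handled automatically, derive the boundary, and stop early only if the cluster-robust z statistic crosses it.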
Quick Answer: This question evaluates experimental design and applied inference for a Data Scientist role: sample sizes for means and proportions, non-inferiority testing, multiple-comparison error control, cluster-randomization design effects, and sequential monitoring boundaries. It measures whether a candidate can apply these formulas and error-control principles under practical constraints, balancing power against Type I error while accounting for clustering and interim looks, with the emphasis on practical application grounded in a conceptual understanding of inferential methods.