Compute sample size and plan experiment
Company: Disney
Role: Data Scientist
Category: Statistics & Math
Difficulty: hard
Interview Round: HR Screen
A product team wants to A/B test a paywall copy change targeting new signups to improve the next-day subscription start rate.
Given:
- Baseline next-day subscription start rate among new signups: 18%.
- Minimum detectable effect: +7% relative (i.e., a lift from 18% to 19.26%, or +1.26 percentage points absolute).
- Two-sided α=0.05, power=0.80, equal allocation.
- Optionally, apply CUPED using pre-experiment engagement as the covariate (R^2=0.25).
- If randomizing by household, the average household size is m=1.8 (signups per household) and the intra-class correlation is ICC=0.06.
- Plan up to 4 interim looks with O'Brien–Fleming alpha spending.
- There are 12 secondary metrics; control FDR at 10% using Benjamini–Hochberg.
- Noncompliance: 10% of users assigned to treatment will not actually see the new copy, and 3% of control users may be exposed to it due to caching.
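The sizing arithmetic behind questions 1–3 can be sketched directly from these inputs. This is a minimal Python sketch under stated assumptions, not a definitive calculator: it uses the unpooled normal-approximation formula for two proportions (the pooled-variance version differs only marginally here), assumes CUPED scales the outcome variance by (1 − R^2), and assumes the design effect multiplies the user-level sample size. Combining CUPED and clustering multiplicatively further assumes the same R^2 holds when households are the randomization unit. All variable and function names are illustrative.

```python
import math
from statistics import NormalDist

# Inputs from the "Given" list above.
P_CONTROL = 0.18       # baseline next-day subscription start rate
RELATIVE_MDE = 0.07    # +7% relative lift
ALPHA = 0.05           # two-sided significance level
POWER = 0.80
R2_CUPED = 0.25        # variance explained by pre-experiment engagement
HOUSEHOLD_SIZE = 1.8   # m, average signups per household
ICC = 0.06             # intra-household correlation


def per_arm_n(p1: float, p2: float, alpha: float, power: float) -> float:
    """Unpooled normal approximation for comparing two proportions:
    n = (z_{1-alpha/2} + z_{1-beta})^2 * [p1(1-p1) + p2(1-p2)] / (p2 - p1)^2
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # ~1.960
    z_beta = std_normal.inv_cdf(power)            # ~0.842
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2


p_treat = P_CONTROL * (1 + RELATIVE_MDE)              # 0.1926
n_base = per_arm_n(P_CONTROL, p_treat, ALPHA, POWER)  # Q1
n_cuped = n_base * (1 - R2_CUPED)                     # Q2: variance shrinks by (1 - R^2)
design_effect = 1 + (HOUSEHOLD_SIZE - 1) * ICC        # Q3: DE = 1 + (m - 1) * ICC
n_cluster = n_base * design_effect
# Assumption: the same R^2 holds once households are the randomization unit.
n_cluster_cuped = n_cuped * design_effect

print(f"Q1 no CUPED, no clustering: {math.ceil(n_base)} users/arm")
print(f"Q2 with CUPED: {math.ceil(n_cuped)} users/arm")
for label, n in [("Q3 clustered", n_cluster), ("Q3 clustered + CUPED", n_cluster_cuped)]:
    households = math.ceil(n / HOUSEHOLD_SIZE)  # households to randomize per arm
    print(f"{label}: {math.ceil(n)} users/arm (~{households} households/arm)")
```

Run as-is, this gives 14,986 users per arm for Q1; 11,239 with CUPED (a 25% reduction, so each user is worth about 1/0.75 ≈ 1.33 users); 15,705 under household clustering with DE=1.048 (about 8,725 households per arm); and 11,779 with both (about 6,544 households per arm).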
Questions:
1) Compute the per-arm sample size ignoring variance reduction and clustering. Show formulas and approximations used.
2) With CUPED (R^2=0.25), what is the effective sample size reduction? Recompute the required per-arm sample.
3) Adjust for household clustering via the design effect DE=1+(m−1)×ICC. Recompute the per-arm sample size under clustering (with and without CUPED).
4) Describe how O'Brien–Fleming boundaries alter Type I error allocation and the practical implications for timeline/power.
5) State how you would control FDR at 10% across 12 secondary metrics and interpret discoveries.
6) Compute the ITT vs CACE effect given the noncompliance rates (assume monotonicity). How would you report both responsibly to product stakeholders?
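Questions 4–6 can be sketched the same way. The Python below is a minimal illustration, not a production analysis, and rests on several assumptions. It treats the 4 looks as equally spaced at information fractions 0.25, 0.5, 0.75, and 1.0 and uses the Lan–DeMets O'Brien–Fleming-type spending function α(t) = 2[1 − Φ(z_{1−α/2}/√t)]. The 12 p-values are invented purely to show the Benjamini–Hochberg step-up rule. The ITT-to-CACE conversion is the standard Wald (instrumental-variable) ratio, which relies on monotonicity (no defiers) and the exclusion restriction. Only the first-look boundary follows in closed form; exact boundaries at later looks need recursive numerical integration, which group-sequential packages such as R's gsDesign or rpact provide.

```python
from statistics import NormalDist

N = NormalDist()

# --- Q4: Lan-DeMets O'Brien-Fleming-type alpha spending ---
def obf_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1)."""
    z = N.inv_cdf(1 - alpha / 2)
    return 2 * (1 - N.cdf(z / t ** 0.5))

looks = [0.25, 0.50, 0.75, 1.00]          # assumed equally spaced looks
cumulative = [obf_spent(t) for t in looks]
increments = [c - p for c, p in zip(cumulative, [0.0] + cumulative[:-1])]
for t, c, inc in zip(looks, cumulative, increments):
    print(f"t={t:.2f}: cumulative alpha {c:.5f}, spent at this look {inc:.5f}")
# First look only: nominal |z| boundary equals z_{0.975} / sqrt(0.25).
print(f"first-look |z| boundary: {N.inv_cdf(1 - cumulative[0] / 2):.2f}")

# --- Q5: Benjamini-Hochberg step-up at q = 0.10 ---
def benjamini_hochberg(pvals: list[float], q: float = 0.10) -> list[int]:
    """Return indices (original order) of hypotheses rejected by BH."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank  # step-up: keep the LARGEST rank that passes
    return sorted(order[:k_max])

# Hypothetical p-values for the 12 secondary metrics (illustration only).
example_p = [0.041, 0.62, 0.001, 0.20, 0.035, 0.95,
             0.012, 0.48, 0.090, 0.004, 0.77, 0.35]
discoveries = benjamini_hochberg(example_p)
print("BH discoveries (metric indices):", discoveries)

# --- Q6: ITT vs CACE via the Wald ratio (assumes monotonicity) ---
EXPOSED_IF_TREATED = 0.90   # 10% of treatment never see the new copy
EXPOSED_IF_CONTROL = 0.03   # 3% of control see it via caching
compliance_gap = EXPOSED_IF_TREATED - EXPOSED_IF_CONTROL  # complier share

def cace_from_itt(itt: float) -> float:
    """Effect among compliers implied by an intention-to-treat effect."""
    return itt / compliance_gap

def itt_from_cace(cace: float) -> float:
    """Diluted ITT effect implied by an effect among compliers."""
    return cace * compliance_gap

mde_abs = 0.0126  # hypothetical: plug in the 1.26 pp MDE for illustration
print(f"complier share: {compliance_gap:.2f}")
print(f"CACE if ITT = 1.26 pp: {100 * cace_from_itt(mde_abs):.2f} pp")
print(f"ITT if CACE = 1.26 pp: {100 * itt_from_cace(mde_abs):.2f} pp")
# Rough approximation that ignores small changes in outcome variance.
print(f"sample-size inflation to power the diluted ITT: {1 / compliance_gap ** 2:.2f}x")
```

With these inputs the spending function releases only about 0.0001 of alpha at the first look and about 0.026 at the final analysis, so early stopping requires overwhelming evidence while the final test stays close to the fixed-design threshold. The made-up p-values yield five discoveries, including p = 0.035, which fails its own rank threshold (4/12 × 0.10 ≈ 0.033) but is still rejected because a higher rank passes. For Q6, the complier share is 0.87: an observed ITT lift of 1.26 pp implies a CACE of about 1.45 pp, while a true 1.26 pp effect among compliers would appear as only about 1.10 pp in the ITT, and powering the ITT for that diluted effect needs roughly 1.32× the users.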
Quick Answer: This question evaluates a candidate's proficiency in experimental design and applied statistics: sample size and power calculations, variance reduction (CUPED), clustering and design-effect adjustments, interim analysis with O'Brien–Fleming alpha spending, multiple-testing control (Benjamini–Hochberg), and causal estimands such as ITT versus CACE. It is commonly asked because interviewers need assurance that a practitioner can translate a business goal into a rigorous experiment plan that balances Type I and Type II error, multiplicity, noncompliance, and operational constraints. It falls under the Statistics & Math category and emphasizes practical application grounded in conceptual understanding.