Experiment Design and Causal Inference: Multi-part Problem
Context: You are designing a high-traffic web A/B test on a binary conversion metric. Answer each part with formulas, numeric results, and clearly stated assumptions.
A) Sample size
- Baseline conversion p0 = 0.045
- Target MDE = +7% relative, so p1 = 0.045 × 1.07 = 0.04815
- Two-sided alpha = 0.05, power = 0.90
- Compute the per-variant sample size for a standard two-proportion z-test using the pooled-variance planning assumption. Show the z-scores used and the variance terms.
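A minimal sketch of the part (A) arithmetic, using only the Python standard library. The function name `two_prop_n` and the `fully_pooled` switch are illustrative, not from the problem. By default the pooled variance 2p̄(1 − p̄), with p̄ = (p0 + p1)/2, multiplies the z_{1−α/2} term and the unpooled p0(1 − p0) + p1(1 − p1) multiplies the z_{1−β} term. `fully_pooled=True` uses the pooled term for both. Both variants assume equal allocation, and for these inputs both come out at roughly 94,000 users per arm.

```python
import math
from statistics import NormalDist


def two_prop_n(p0: float, p1: float, alpha: float = 0.05, power: float = 0.90,
               fully_pooled: bool = False) -> int:
    """Per-arm n for a two-sided two-proportion z-test, assuming equal allocation."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # about 1.2816 for power = 0.90
    p_bar = (p0 + p1) / 2
    var_h0 = 2 * p_bar * (1 - p_bar)            # pooled variance term (under H0)
    var_h1 = p0 * (1 - p0) + p1 * (1 - p1)      # unpooled variance term (under H1)
    delta = p1 - p0                             # absolute lift to detect
    if fully_pooled:
        # Simplified planning formula: pooled variance for both z terms
        n = (z_a + z_b) ** 2 * var_h0 / delta ** 2
    else:
        # Pooled SD for the alpha term, unpooled SD for the power term
        n = (z_a * math.sqrt(var_h0) + z_b * math.sqrt(var_h1)) ** 2 / delta ** 2
    return math.ceil(n)


p0 = 0.045
p1 = p0 * 1.07
print("per-arm n (pooled H0 / unpooled H1):", two_prop_n(p0, p1))
print("per-arm n (fully pooled):", two_prop_n(p0, p1, fully_pooled=True))
```

The two variants differ by only a handful of users here because p0 and p1 are close, so the pooled and unpooled variance terms are nearly equal.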
B) Duration
- Daily visitors = 1.2M
- Traffic split = 60/40 (A/B)
- Eligibility = 80% of daily visitors
- Using the sample size from (A), compute the number of calendar days needed. State any adjustments for repeat visitors and for overlap with other experiments.
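One way to sketch the duration calculation, assuming the equal-allocation n from (A) is rebalanced for the 60/40 split so that 1/nA + 1/nB stays equal to 2/n (same variance of the difference in proportions). The function `calendar_days` and its `unique_fraction` and `overlap_loss` parameters are hypothetical knobs: the share of eligible daily visitors who are new users (repeat visitors do not add sample), and the share lost to mutually exclusive concurrent experiments. A constant daily unique fraction is itself a simplification. Rounding up to whole weeks is an assumed convention for covering weekday/weekend cycles.

```python
import math


def calendar_days(n_equal: int, daily_visitors: float = 1_200_000,
                  eligibility: float = 0.80, share_a: float = 0.60,
                  unique_fraction: float = 1.0, overlap_loss: float = 0.0,
                  round_to_weeks: bool = True):
    """Return (n_a, n_b, raw_days, planned_days) for an unequal A/B split.

    unique_fraction and overlap_loss are hypothetical adjustment knobs.
    """
    share_b = 1.0 - share_a
    r = share_a / share_b                      # allocation ratio nA / nB
    # Hold Var(pA_hat - pB_hat) fixed: 1/nA + 1/nB = 2/n_equal with nA = r * nB
    n_b = n_equal * (1 + r) / (2 * r)
    n_a = r * n_b
    # New, eligible, non-overlapping users arriving per day
    daily_new = daily_visitors * eligibility * unique_fraction * (1 - overlap_loss)
    raw_days = max(n_a / (daily_new * share_a), n_b / (daily_new * share_b))
    planned = math.ceil(raw_days)
    if round_to_weeks:
        planned = 7 * math.ceil(planned / 7)   # cover whole weekly cycles
    return math.ceil(n_a), math.ceil(n_b), raw_days, planned


# 94_045 is the per-arm n from the part (A) sketch; substitute your own value
n_a, n_b, raw_days, planned_days = calendar_days(94_045)
print(n_a, n_b, round(raw_days, 3), planned_days)
```

With these inputs the raw requirement is a fraction of one day's eligible traffic, so the calendar length is set by the weekly-cycle convention rather than by sample size. Lowering `unique_fraction` or raising `overlap_loss` shows how quickly that margin erodes.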
C) Variance reduction (CUPED)
- A pre-experiment covariate has R^2 = 0.20 with the outcome (its squared correlation with the outcome is 0.20).
- Quantify the effective MDE reduction (or, correspondingly, the sample-size reduction) from CUPED. Explain when CUPED can introduce bias (e.g., covariate shift).
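A sketch of the CUPED arithmetic and adjustment. With squared correlation R² between covariate and outcome, the adjusted variance is (1 − R²) times the raw variance. At R² = 0.20 that means about 20% fewer users for the same power, or, at fixed n, an MDE scaled by √0.8 ≈ 0.894, roughly a 10.6% reduction (7% relative becomes about 6.26% relative). The simulation uses a synthetic Gaussian outcome rather than the real binary metric, an assumption made only to check the 1 − R² factor. The helper names `cuped_adjust` and `sample_var` are illustrative.

```python
import math
import random


def cuped_adjust(y, x):
    """Return Y - theta * (X - mean(X)), with theta = Cov(X, Y) / Var(X)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    theta = cov / var_x
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


def sample_var(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)


R2 = 0.20
print("sample-size factor:", 1 - R2)             # n_cuped / n_raw
print("MDE factor:", math.sqrt(1 - R2))          # MDE_cuped / MDE_raw at fixed n
print("7% relative MDE becomes:", 0.07 * math.sqrt(1 - R2))

# Synthetic check (Gaussian outcome is an assumption, not the real metric)
rng = random.Random(0)
rho = math.sqrt(R2)
x = [rng.gauss(0, 1) for _ in range(100_000)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
ratio = sample_var(cuped_adjust(y, x)) / sample_var(y)
print("variance ratio:", round(ratio, 3))
```

Here theta is estimated on data pooled across both arms, and X must be measured before assignment so that the treatment cannot shift it.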
D) Sequential testing
- You plan daily interim looks ("peeks") for 21 days, i.e., 21 analyses in total.
- Propose an alpha-spending or group-sequential design (e.g., Pocock or O’Brien–Fleming). Specify the spending function and the final critical z. Briefly compare it to always-valid sequential methods (SPRT/e-values).
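Exact group-sequential boundaries are normally computed by recursive numerical integration. The sketch below substitutes a Monte Carlo approximation under H0, assuming equal information per day (equal daily traffic), so that Z_k = S_k/√k with S_k a sum of k standard normal increments. It uses the Lan-DeMets O'Brien-Fleming-type spending function α(t) = 2 − 2Φ(z_{1−α/2}/√t) and the Pocock-type α(t) = α·ln(1 + (e − 1)t). At each look it picks the two-sided critical |z| that spends the incremental alpha among paths not yet stopped. Function names, the seed and the path count are illustrative, and the boundaries carry Monte Carlo noise.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()


def obf_spend(t: float, alpha: float = 0.05) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending: 2 - 2*Phi(z_{1-alpha/2} / sqrt(t))."""
    return 2.0 - 2.0 * _N.cdf(_N.inv_cdf(1.0 - alpha / 2.0) / math.sqrt(t))


def pocock_spend(t: float, alpha: float = 0.05) -> float:
    """Lan-DeMets Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def simulate_null_paths(looks: int, n_paths: int, seed: int = 2024):
    """|Z_k| at each look under H0, assuming equal information per look."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        s, row = 0.0, []
        for k in range(1, looks + 1):
            s += rng.gauss(0.0, 1.0)           # one day's standardized increment
            row.append(abs(s) / math.sqrt(k))   # cumulative z-statistic at look k
        paths.append(row)
    return paths


def mc_boundaries(spend, paths, looks: int):
    """Critical |z| per look so cumulative type I error tracks spend(t)."""
    n_paths = len(paths)
    alive = list(range(n_paths))   # paths that have not crossed a boundary
    rejected = 0
    bounds = []
    for k in range(looks):
        target = int(round(spend((k + 1) / looks) * n_paths))
        m = target - rejected       # rejections to allocate at this look
        if m <= 0:
            bounds.append(math.inf)  # effectively no alpha spent yet
            continue
        vals = sorted((paths[i][k] for i in alive), reverse=True)
        c = vals[m - 1]
        bounds.append(c)
        alive = [i for i in alive if paths[i][k] < c]
        rejected = n_paths - len(alive)
    return bounds


LOOKS = 21
paths = simulate_null_paths(LOOKS, n_paths=40_000)
obf = mc_boundaries(obf_spend, paths, LOOKS)
poc = mc_boundaries(pocock_spend, paths, LOOKS)
print(f"OBF-type final critical z: {obf[-1]:.3f}")
print(f"Pocock-type final critical z: {poc[-1]:.3f}")
```

The O'Brien-Fleming-type design spends almost nothing early (its first boundaries are effectively infinite), so its final critical z stays close to 1.96. The Pocock-type design spends alpha evenly and pays for it with a much higher final threshold.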
E) Interference and clustering
- Cross-unit spillovers exist when randomizing by user.
- Propose a clustered design (e.g., geo or traffic-bucket). Compute the design effect for ICC = 0.02 with average cluster size m = 5 and m = 50, and show how it changes the sample size.
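A short sketch of the design-effect arithmetic, assuming equal-sized clusters, where DE = 1 + (m − 1)·ICC. At ICC = 0.02 this gives DE = 1.08 for m = 5 and DE = 1.98 for m = 50. Variable cluster sizes would inflate the required n further. `N_USER_LEVEL` is a placeholder for the per-arm n from part (A).

```python
import math


def design_effect(m: float, icc: float) -> float:
    """Design effect for equal-sized clusters: DE = 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc


N_USER_LEVEL = 94_045  # per-arm n from the part (A) sketch; substitute your own
for m in (5, 50):
    de = design_effect(m, 0.02)
    users = math.ceil(N_USER_LEVEL * de)   # inflated users per arm
    clusters = math.ceil(users / m)        # clusters per arm at size m
    print(f"m={m}: DE={de:.2f}, users per arm={users:,}, clusters per arm={clusters:,}")
```

Larger clusters contain spillovers better but nearly double the required users at m = 50, and the number of independent units (clusters) drops sharply.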
F) SRM check
- Day 3 observed: A = 110,000 users, B = 90,000 users.
- Expected from 200,000 eligible users with a 60/40 split: A = 120,000, B = 80,000.
- Perform a chi-square goodness-of-fit test and report the p-value. What actions do you take if the SRM test is significant?
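The SRM check can be sketched with the standard library alone. With two arms the test has 1 degree of freedom, and P(χ²₁ > x) = erfc(√(x/2)). The statistic is (10,000²/120,000) + (10,000²/80,000) ≈ 2083.3. The p-value underflows to 0 in double precision, so it should be reported as "below any practical threshold" rather than as a literal zero. The function name `srm_test` is illustrative.

```python
import math


def srm_test(observed, shares):
    """Two-arm chi-square goodness-of-fit test; returns (statistic, p-value)."""
    if len(observed) != 2 or len(shares) != 2:
        raise ValueError("this sketch handles exactly two arms (df = 1)")
    total = sum(observed)
    expected = [total * s for s in shares]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = 1: P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = srm_test([110_000, 90_000], [0.60, 0.40])
print(f"chi2 = {stat:.1f}, p = {p:.3g}")
```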
G) Causal inference (observational)
- The team previously ran an observational study with a strong pre-period trend.
- Sketch a DAG, choose an identification strategy (DID, IV, or RDD), list required assumptions, and propose concrete robustness checks (placebo tests, pre-trend tests, sensitivity analyses).
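If DID is chosen, a strong pre-period trend is exactly what breaks parallel trends. The sketch below shows one possible check-and-adjust workflow on synthetic, noiseless numbers. The treated-minus-control gap is given a made-up level of 1.0, a differential trend of 0.3 per period and a true effect of 0.5 starting at t = 0. The sketch fits the pre-period slope of the gap (the simplest pre-trend test), shows that the naive 2x2 DID absorbs the trend, and compares it with a group-specific linear-trend adjustment. All names are hypothetical. The linear-extrapolation assumption is itself strong and should be stress-tested with placebo dates and sensitivity analyses.

```python
def _mean(v):
    return sum(v) / len(v)


def ols_slope_intercept(t, g):
    """Least-squares line g = intercept + slope * t."""
    tb, gb = _mean(t), _mean(g)
    slope = (sum((ti - tb) * (gi - gb) for ti, gi in zip(t, g))
             / sum((ti - tb) ** 2 for ti in t))
    return slope, gb - slope * tb


def did_with_trend_check(pre_t, pre_gap, post_t, post_gap):
    """Gaps are treated-minus-control mean outcomes per period."""
    naive_did = _mean(post_gap) - _mean(pre_gap)
    slope, intercept = ols_slope_intercept(pre_t, pre_gap)   # pre-trend in the gap
    counterfactual = [intercept + slope * t for t in post_t]  # extrapolated untreated gap
    trend_adjusted = _mean(post_gap) - _mean(counterfactual)
    return naive_did, slope, trend_adjusted


def true_gap(t):
    # Synthetic gap: level 1.0, differential trend 0.3/period, effect 0.5 from t = 0
    return 1.0 + 0.3 * t + (0.5 if t >= 0 else 0.0)


pre_t, post_t = [-3, -2, -1], [0, 1, 2]
naive, slope, adj = did_with_trend_check(
    pre_t, [true_gap(t) for t in pre_t], post_t, [true_gap(t) for t in post_t])
print(f"naive DID={naive:.2f}  pre-trend slope={slope:.2f}  trend-adjusted={adj:.2f}")
```

A nonzero pre-period slope is the red flag: here the naive DID overstates the effect by the trend difference times the gap between average post and pre periods.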