Stratified A/B Test Across Two Strata (Week/Location)
You ran an email A/B test across two strata defined by week/location. Each user receives at most one email and either responds (success) or not (failure).
Strata and counts:
- Stratum 1 (Week 1, Los Angeles):
  - A: 100,000 sent, 10,000 responses
  - B: 10,000 sent, 1,500 responses
- Stratum 2 (Week 2, New York):
  - A: 10,000 sent, 400 responses
  - B: 100,000 sent, 6,000 responses
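Here is a minimal Python sketch of the per-stratum rates and intervals (the first task below). It assumes the Wilson score interval, since the problem does not prescribe an interval method. With samples this large, a Wald interval gives nearly the same bounds. The names `STRATA`, `wilson_ci`, and `stratum_rates` are illustrative, and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist

# Illustrative encoding of the counts above: (sent, responses) per arm.
STRATA = {
    "Week 1 (Los Angeles)": {"A": (100_000, 10_000), "B": (10_000, 1_500)},
    "Week 2 (New York)": {"A": (10_000, 400), "B": (100_000, 6_000)},
}

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def wilson_ci(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


def stratum_rates(strata=STRATA):
    """Return (stratum, arm, rate, ci_low, ci_high) for every arm in every stratum."""
    rows = []
    for stratum, arms in strata.items():
        for arm, (sent, responses) in arms.items():
            low, high = wilson_ci(responses, sent)
            rows.append((stratum, arm, responses / sent, low, high))
    return rows


for stratum, arm, rate, low, high in stratum_rates():
    print(f"{stratum}, {arm}: {rate:.2%} (95% CI {low:.2%} to {high:.2%})")
```

With these counts, B's interval should lie entirely above A's in both strata: 15% vs 10% in Week 1 and 6% vs 4% in Week 2.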
Tasks:
- Compute stratum-specific conversion rates (A and B) and 95% confidence intervals.
- Using a Mantel–Haenszel (MH) approach, estimate the common treatment effect across strata: report the common odds ratio (OR) with 95% CI and a two-sided p-value (alpha = 0.05).
- Test for effect heterogeneity across strata (e.g., Breslow–Day or an equivalent interaction test) and interpret.
- Compute the naive pooled difference in conversion (ignoring stratification), state whether Simpson’s paradox occurs, and explain why.
- Recommend A or B, reconciling the stratified and naive results.
Assume standard A/B test conditions unless stated otherwise (e.g., independent Bernoulli outcomes, single exposure per user, no interference, and randomization within each stratum).
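The sketch below shows one way to compute the Mantel–Haenszel quantities for the second task. It makes three assumptions. B is coded as the treated row, so OR > 1 favors B. The confidence interval on log(OR_MH) uses the Robins–Breslow–Greenland variance. The p-value comes from the Cochran–Mantel–Haenszel chi-square with 1 df and no continuity correction. These are common choices, not the only valid ones. `TABLES` and `mantel_haenszel` are hypothetical names.

```python
from math import erfc, exp, log, sqrt
from statistics import NormalDist

# One 2x2 table per stratum, B as the "treated" row so that OR > 1 favors B.
# Tuple layout (assumed for illustration):
#   (B responses, B non-responses, A responses, A non-responses)
TABLES = [
    (1_500, 8_500, 10_000, 90_000),  # Week 1 (Los Angeles)
    (6_000, 94_000, 400, 9_600),     # Week 2 (New York)
]


def mantel_haenszel(tables, alpha=0.05):
    """Return (OR_MH, (ci_low, ci_high), CMH chi-square, two-sided p-value)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    obs_minus_exp = var_a = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n      # MH numerator / denominator terms
        p, q = (a + d) / n, (b + c) / n  # Robins–Breslow–Greenland weights
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
        # CMH test: observed minus expected B responses, margins held fixed.
        row_b, col_resp = a + b, a + c
        obs_minus_exp += a - row_b * col_resp / n
        var_a += row_b * (c + d) * col_resp * (b + d) / (n**2 * (n - 1))
    or_mh = sum_r / sum_s
    # Robins–Breslow–Greenland variance of log(OR_MH).
    var_log_or = (sum_pr / (2 * sum_r**2)
                  + sum_ps_qr / (2 * sum_r * sum_s)
                  + sum_qs / (2 * sum_s**2))
    se = sqrt(var_log_or)
    ci = (exp(log(or_mh) - z * se), exp(log(or_mh) + z * se))
    chi2 = obs_minus_exp**2 / var_a  # 1 df, no continuity correction
    p_value = erfc(sqrt(chi2 / 2))   # P(chi-square_1 > chi2) = two-sided z p-value
    return or_mh, ci, chi2, p_value


or_mh, (ci_low, ci_high), chi2, p_value = mantel_haenszel(TABLES)
print(f"OR_MH = {or_mh:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}, "
      f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

Both strata have n = 110,000, so OR_MH reduces here to (135 + 57.6) / (85 + 37.6) ≈ 1.571, with the ad and bc products in units of 10^6. The interval should come out near 1.49 to 1.65, and the p-value far below 0.05.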
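The Breslow–Day test for the third task works stratum by stratum. For each stratum, it compares the observed count of B responses with the count expected if every stratum shared the common MH odds ratio, holding the table margins fixed. The sketch below assumes the same (B responses, B non-responses, A responses, A non-responses) layout and omits Tarone's adjustment. It also handles only the two-stratum case, so the reference distribution is chi-square with 1 df and the p-value comes from `math.erfc`. All function names are hypothetical.

```python
from math import erfc, sqrt

# (B responses, B non-responses, A responses, A non-responses) per stratum.
TABLES = [
    (1_500, 8_500, 10_000, 90_000),  # Week 1 (Los Angeles)
    (6_000, 94_000, 400, 9_600),     # Week 2 (New York)
]


def mh_odds_ratio(tables):
    """Mantel–Haenszel common odds ratio: sum(ad/n) / sum(bc/n)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def expected_a(a, b, c, d, odds_ratio):
    """Expected cell a if the stratum had the given odds ratio, margins fixed.

    Solves e * (n - m1 - t + e) = OR * (m1 - e) * (t - e), a quadratic in e,
    and keeps the root that yields non-negative cells.
    """
    n = a + b + c + d
    m1, t = a + b, a + c  # B sent, total responses in this stratum
    qa = 1 - odds_ratio
    qb = n - m1 - t + odds_ratio * (m1 + t)
    qc = -odds_ratio * m1 * t
    if abs(qa) < 1e-12:  # OR == 1: the usual independence expectation
        return m1 * t / n
    disc = sqrt(qb**2 - 4 * qa * qc)
    low, high = max(0, m1 + t - n), min(m1, t)
    roots = ((-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa))
    return next(r for r in roots if low <= r <= high)


def breslow_day(tables):
    """Breslow–Day homogeneity statistic and p-value (no Tarone adjustment)."""
    if len(tables) != 2:
        raise ValueError("this sketch handles exactly two strata (1 df)")
    or_mh = mh_odds_ratio(tables)
    stat = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        e_a = expected_a(a, b, c, d, or_mh)
        e_b = (a + b) - e_a
        e_c = (a + c) - e_a
        e_d = n - (a + b) - (a + c) + e_a
        var_a = 1 / (1 / e_a + 1 / e_b + 1 / e_c + 1 / e_d)
        stat += (a - e_a) ** 2 / var_a
    p_value = erfc(sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


stat, p_value = breslow_day(TABLES)
print(f"Breslow–Day chi2 = {stat:.3f}, p = {p_value:.3f}")
```

The stratum-specific odds ratios here are about 1.59 and 1.53. The statistic should therefore be small, well under 1, and the p-value far above 0.05 (roughly 0.5), so there is no evidence that the effect differs between strata.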
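For the fourth task, the sketch below contrasts the naive pooled difference (B minus A, with a Wald interval) with the stratum-specific differences and an MH-weighted risk difference. It then flags Simpson's paradox as a sign reversal between every stratum and the pooled data. The Wald interval and the MH risk-difference weights n_A n_B / (n_A + n_B) are illustrative assumptions. `COUNTS` and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# (sent, responses) per stratum, ordered Week 1 (Los Angeles), Week 2 (New York).
COUNTS = {
    "A": [(100_000, 10_000), (10_000, 400)],
    "B": [(10_000, 1_500), (100_000, 6_000)],
}


def pooled_rate(arm):
    """Conversion rate for one arm with strata ignored (summed counts)."""
    sent = sum(n for n, _ in COUNTS[arm])
    responses = sum(x for _, x in COUNTS[arm])
    return responses / sent, sent


def naive_vs_stratified(z=NormalDist().inv_cdf(0.975)):
    (p_a, n_a), (p_b, n_b) = pooled_rate("A"), pooled_rate("B")
    naive_diff = p_b - p_a
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)  # Wald SE
    naive_ci = (naive_diff - z * se, naive_diff + z * se)

    diffs, weights = [], []
    for (na, xa), (nb, xb) in zip(COUNTS["A"], COUNTS["B"]):
        diffs.append(xb / nb - xa / na)       # B minus A within the stratum
        weights.append(na * nb / (na + nb))  # MH risk-difference weight
    mh_diff = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)

    # Simpson's paradox: every stratum points one way, the pooled data the other.
    simpson = (all(d > 0 for d in diffs) and naive_diff < 0) or (
        all(d < 0 for d in diffs) and naive_diff > 0)
    return naive_diff, naive_ci, diffs, mh_diff, simpson


naive_diff, naive_ci, diffs, mh_diff, simpson = naive_vs_stratified()
print(f"naive B - A = {naive_diff:+.4f}, strata = {[round(d, 4) for d in diffs]}, "
      f"MH = {mh_diff:+.4f}, Simpson's paradox: {simpson}")
```

Pooling gives A 10,400/110,000 ≈ 9.45% and B 7,500/110,000 ≈ 6.82%, so B looks about 2.6 points worse. Yet B wins by 5 points in Week 1 and 2 points in Week 2, about +3.5 points MH-weighted. The reversal arises because about 91% of A's sends fell in the high-converting Week 1 (Los Angeles) stratum, while about 91% of B's fell in the low-converting Week 2 (New York) stratum. The pooled comparison therefore mixes the arm effect with the stratum effect.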