A/B Test on Conversion: Powering, Inference, CUPED, Multiple Testing, and Clustering
You are running a two-arm A/B experiment on a binary conversion outcome.
Assume a baseline conversion p0 = 0.120 and a target absolute uplift of Δ = +0.005. Use a two-sided test with α = 0.05 and power = 0.80 unless stated otherwise.
-
Sample size planning
-
Using the normal approximation for a difference in proportions, derive and compute the required per-variant sample size. Show the formula and numeric calculation.
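One possible way to set up this calculation, as a minimal sketch: it assumes the common unpooled-variance form n = (z_{1-α/2} + z_{1-β})² [p0(1-p0) + p1(1-p1)] / Δ² with p1 = p0 + Δ. The function name `per_arm_sample_size` is illustrative and not part of the problem statement; a pooled-variance variant of the formula would give a slightly different number.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(p0, delta, alpha=0.05, power=0.80):
    """Per-variant n for a two-sided two-proportion z-test.

    Assumption: unpooled variance under the alternative, equal allocation.
    """
    p1 = p0 + delta
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    var_sum = p0 * (1 - p0) + p1 * (1 - p1)        # sum of per-arm Bernoulli variances
    return ceil((z_alpha + z_beta) ** 2 * var_sum / delta ** 2)


n_per_arm = per_arm_sample_size(p0=0.120, delta=0.005)
print(n_per_arm)
```

Under these assumptions the result is on the order of 67,500 users per variant.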
-
Post-experiment inference
-
After one week you observe: Control nC = 150,000, xC = 18,000; Treatment nT = 150,000, xT = 18,900.
-
Compute the point estimate Δ̂, a 95% confidence interval for the difference, and a two-sided p-value. Interpret statistical significance vs business significance.
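A hedged sketch of the arithmetic for the observed counts follows. It assumes a Wald interval with the unpooled standard error for the confidence interval and a pooled standard error for the z-test of no difference; both are conventional choices rather than requirements of the problem, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Observed counts from the problem statement.
n_c, x_c = 150_000, 18_000
n_t, x_t = 150_000, 18_900

p_c, p_t = x_c / n_c, x_t / n_t
diff = p_t - p_c  # point estimate of the absolute uplift

# 95% Wald interval using the unpooled standard error.
se_unpooled = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
z_crit = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z_crit * se_unpooled, diff + z_crit * se_unpooled

# Two-sided z-test using the pooled standard error under H0: no difference.
p_pool = (x_c + x_t) / (n_c + n_t)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
z_stat = diff / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"diff={diff:.4f}, CI=({ci_low:.4f}, {ci_high:.4f}), z={z_stat:.2f}, p={p_value:.2e}")
```

Whether the result matters for the business is a separate question: one way to frame it is to compare the whole interval, not just the point estimate, against the planned uplift of 0.005.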
-
CUPED variance reduction
-
Suppose CUPED with a pre-period covariate explains R² = 0.30 of the variance in the outcome. Estimate the new effective sample size (or variance) and the revised MDE for the same design. Show your math and state assumptions.
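The scaling involved can be sketched as below. This assumes R² is the squared correlation between the pre-period covariate and the outcome, that the covariate is unaffected by treatment, and that the adjusted variance is (1 - R²) times the unadjusted variance, so the MDE at a fixed sample size shrinks by sqrt(1 - R²). Variable names are illustrative.

```python
from math import sqrt

r_squared = 0.30      # share of outcome variance explained by the covariate
baseline_mde = 0.005  # absolute MDE the original design was powered for

# Assumption: Var(adjusted) = (1 - R^2) * Var(unadjusted).
variance_factor = 1 - r_squared

# Same precision as an unadjusted test with this many times more users.
effective_n_multiplier = 1 / variance_factor

# MDE is proportional to the standard error, so it scales with sqrt(variance).
revised_mde = baseline_mde * sqrt(variance_factor)

print(f"variance x{variance_factor:.2f}, "
      f"effective n x{effective_n_multiplier:.3f}, "
      f"revised MDE {revised_mde:.5f}")
```

Read the other way, holding the MDE at 0.005 would require only about 70% of the unadjusted per-variant sample size under the same assumptions.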
-
Multiple testing with guardrails
-
You track 4 guardrail metrics with unadjusted p-values {0.03, 0.01, 0.20, 0.04}.
-
Apply the Holm–Bonferroni method at familywise α = 0.05. Show the ordering, adjusted thresholds for each step, and which guardrails remain significant.
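A minimal sketch of the Holm step-down procedure applied to the four p-values above. The function name `holm_bonferroni` is illustrative; the logic sorts p-values in ascending order, compares the k-th smallest against α / (m - k + 1), and stops at the first failure.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans (original order): True if rejected under Holm."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    reject = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)  # alpha/m, alpha/(m-1), ..., alpha/1
        if p_values[idx] > threshold:
            break  # step-down: this and all larger p-values are not rejected
        reject[idx] = True
    return reject


guardrail_p = [0.03, 0.01, 0.20, 0.04]
significant = holm_bonferroni(guardrail_p)
print(significant)
```

With these inputs the ordered p-values are 0.01, 0.03, 0.04, 0.20 against thresholds 0.0125, 0.0167, 0.025, 0.05, so the procedure stops at the second step and only the guardrail with p = 0.01 remains significant.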
-
Clustering by geography
-
Explain how you would check for and correct overdispersion or miscalibration in conversion estimates when there is user clustering by geo.
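One possible check, sketched under stated assumptions: treat each geo as a cluster, estimate the conversion rate as a ratio of totals, and compare a cluster-robust (linearized) variance against the naive binomial variance. A ratio well above 1 (the design effect) suggests overdispersion, and the robust standard error is one way to correct the interval. The per-geo counts below are made up purely for illustration, and `cluster_design_effect` is a hypothetical helper.

```python
from math import sqrt


def cluster_design_effect(conversions, users):
    """Compare cluster-robust and naive variance of a pooled conversion rate.

    conversions[g], users[g]: totals for geo cluster g (hypothetical inputs).
    Returns (rate, naive_se, robust_se, design_effect).
    """
    g = len(users)
    total_n = sum(users)
    rate = sum(conversions) / total_n

    # Naive binomial variance, assuming independent users.
    naive_var = rate * (1 - rate) / total_n

    # Linearized (delta-method) variance of the ratio, with clusters as units.
    resid_sq = sum((x - rate * n) ** 2 for x, n in zip(conversions, users))
    robust_var = (g / (g - 1)) * resid_sq / total_n ** 2

    return rate, sqrt(naive_var), sqrt(robust_var), robust_var / naive_var


# Illustrative data only: four geos with equal traffic and differing rates.
geo_users = [1000, 1000, 1000, 1000]
geo_conversions = [80, 160, 100, 140]

rate, naive_se, robust_se, deff = cluster_design_effect(geo_conversions, geo_users)
print(f"rate={rate:.3f}, naive SE={naive_se:.4f}, robust SE={robust_se:.4f}, deff={deff:.1f}")
```

With only a handful of clusters the robust variance is itself noisy, so in practice alternatives such as a cluster bootstrap or randomizing at the geo level may be worth considering.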