A/B Test Inference, Peeking, and Multiple Comparisons
You run a two-arm A/B test of click-through rate (CTR).
- Control: n_c = 10,000,000 impressions, CTR_c = 1.20%.
- Treatment: n_t = 10,000,000 impressions, CTR_t = 1.23%.
Let p_c and p_t be the CTRs (as proportions).
(a) Compute:
- Absolute lift (p_t − p_c) and relative lift (p_t/p_c − 1).
- The pooled standard error for the difference in proportions, the z-statistic for H0: p_t = p_c, and the two-sided p-value.
- A 95% confidence interval for the relative lift.
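One way to work through the quantities in (a) is sketched below. It assumes impressions are independent Bernoulli trials (if clicks cluster within users, the standard errors here are too small), and it builds the relative-lift interval with the delta method on log(p_t/p_c), which is one common choice rather than the only one. The function name `two_proportion_summary` and its dictionary keys are illustrative, not part of the problem statement.

```python
import math


def two_proportion_summary(n_c, p_c, n_t, p_t, z_crit=1.959964):
    """Pooled z-test for H0: p_t = p_c, plus a delta-method CI on relative lift."""
    abs_lift = p_t - p_c
    rel_lift = p_t / p_c - 1.0

    # Pooled proportion and pooled standard error under H0.
    p_pool = (n_c * p_c + n_t * p_t) / (n_c + n_t)
    se_pool = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_c + 1.0 / n_t))
    z = abs_lift / se_pool

    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    # Delta method on log(p_t / p_c); assumes the two arms are independent.
    se_log_ratio = math.sqrt(
        (1.0 - p_t) / (n_t * p_t) + (1.0 - p_c) / (n_c * p_c)
    )
    log_ratio = math.log(p_t / p_c)
    ci_low = math.exp(log_ratio - z_crit * se_log_ratio) - 1.0
    ci_high = math.exp(log_ratio + z_crit * se_log_ratio) - 1.0

    return {
        "abs_lift": abs_lift,
        "rel_lift": rel_lift,
        "se_pool": se_pool,
        "z": z,
        "p_value": p_value,
        "rel_ci_low": ci_low,
        "rel_ci_high": ci_high,
    }


result = two_proportion_summary(10_000_000, 0.0120, 10_000_000, 0.0123)
for name, value in result.items():
    print(f"{name}: {value:.6g}")
```

On the stated inputs this gives an absolute lift of 0.03 percentage points, a relative lift of 2.5%, a z-statistic of about 6.1, and a relative-lift interval of roughly 1.7% to 3.3%.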
(b) Suppose you peeked at significance daily over 14 days using the same test threshold each day (no alpha spending). Quantify the approximate inflation of the overall Type I error, and then propose a corrected sequential monitoring plan using either:
- An alpha-spending/group-sequential design, or
- A Bayesian sequential alternative with a prior on lift.
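The inflation in (b) can be estimated by simulation. The sketch below assumes equal traffic on each of the 14 days, so that under H0 the cumulative z-statistic after day k behaves like S_k / sqrt(k) for a sum S_k of k independent standard normal increments. It then calibrates a constant (Pocock-style) boundary as one simple instance of a group-sequential design; an O'Brien-Fleming-type spending function would be an equally valid choice and is not shown. The names `simulate_max_abs_z`, `naive_rate`, and `pocock_c` are illustrative.

```python
import math
import random


def simulate_max_abs_z(n_looks, n_sims, seed=0):
    """Max |z| over equally spaced looks under H0, one value per simulated test."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        running_sum = 0.0
        max_abs_z = 0.0
        for k in range(1, n_looks + 1):
            # Each day adds an independent N(0, 1) increment under H0;
            # the cumulative z-statistic after k days is S_k / sqrt(k).
            running_sum += rng.gauss(0.0, 1.0)
            max_abs_z = max(max_abs_z, abs(running_sum) / math.sqrt(k))
        maxima.append(max_abs_z)
    return maxima


N_LOOKS = 14
ALPHA = 0.05
maxima = simulate_max_abs_z(N_LOOKS, n_sims=20_000, seed=0)

# Overall Type I error when every daily look uses the fixed 1.96 threshold.
naive_rate = sum(m > 1.959964 for m in maxima) / len(maxima)

# Loose upper bound that would apply only if the 14 looks were independent.
independent_bound = 1.0 - (1.0 - ALPHA) ** N_LOOKS

# Pocock-style constant boundary: the (1 - ALPHA) quantile of max |z|,
# so that crossing it at any look has overall probability about ALPHA.
pocock_c = sorted(maxima)[math.ceil((1.0 - ALPHA) * len(maxima)) - 1]

print(f"naive overall Type I error: {naive_rate:.3f}")
print(f"independent-looks bound:    {independent_bound:.3f}")
print(f"Pocock-style |z| boundary:  {pocock_c:.2f}")
```

Because successive looks share data, the simulated error rate sits well above the nominal 0.05 but clearly below the independent-looks bound of about 0.51. Stopping only when |z| exceeds the calibrated constant (roughly 2.6 for 14 looks) restores an overall level near 0.05 at the cost of a stricter per-look threshold.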
(c) You also track 8 guardrail metrics. Explain how you would control familywise error rate (FWER) or false discovery rate (FDR) across these guardrails (and how this interacts with sequential monitoring).
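For (c), two standard procedures are Holm's step-down method (controls FWER under arbitrary dependence) and the Benjamini-Hochberg step-up method (controls FDR under independence or positive dependence). The sketch below implements both in plain Python. The eight p-values are hypothetical, chosen only to show where the two procedures disagree, and the function names are illustrative.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down: controls FWER at alpha under arbitrary dependence."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # Stop at the first failure; all larger p-values are kept.
    return reject


def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up: controls FDR at q (independence or PRDS)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    last_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank k with p_(k) <= (k / m) * q.
        if p_values[idx] <= rank / m * q:
            last_rank = rank
    reject = [False] * m
    for idx in order[:last_rank]:
        reject[idx] = True
    return reject


# Hypothetical guardrail p-values, for illustration only.
guardrail_p = [0.001, 0.004, 0.012, 0.020, 0.030, 0.20, 0.45, 0.80]

holm = holm_reject(guardrail_p, alpha=0.05)
bh = benjamini_hochberg_reject(guardrail_p, q=0.05)
print("Holm rejections:", sum(holm))
print("BH rejections:  ", sum(bh))

# One simple way to combine this with sequential monitoring (an assumption,
# not the only option): Bonferroni-split alpha across the guardrails and run
# each guardrail's own spending plan at that reduced level.
per_guardrail_alpha = 0.05 / len(guardrail_p)
print("per-guardrail alpha:", per_guardrail_alpha)
```

On these illustrative inputs Holm flags 2 guardrails and Benjamini-Hochberg flags 5, which reflects the usual trade-off: FDR control is less conservative but tolerates some false alarms among the flagged metrics. Neither procedure accounts for repeated looks when applied to raw daily p-values, so the per-look p-values or boundaries should already be sequentially adjusted before the multiplicity correction is applied.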