A/B Test: Sample Size, Sequential Correction, and Post-Experiment Analysis
Context
You are planning a two-arm A/B test with a binary (Bernoulli) conversion outcome and equal allocation. The baseline conversion rate is 5%. You want 90% power at a two-sided α = 0.05 to detect a 6% relative lift. Use a normal approximation for sample size.
Assumptions (made explicit for clarity):
- "6% relative lift" means p1 = 1.06 × p0.
- Allocation is 50/50; outcome is per-user Bernoulli within the measurement window.
- 10% of traffic is bots and will be excluded (or is non-informative), so gross traffic must be inflated.
- The test runs for 7 days; assume independent daily increments to translate total sample to an approximate per-day requirement and to define the information fraction for a single interim look at day 3 (t = 3/7).
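The planning arithmetic implied by these assumptions can be sketched as follows. This is a minimal illustration, not a prescribed solution: it assumes the unpooled-variance form of the normal approximation, n = (z_{α/2} + z_β)² · [p0(1 − p0) + p1(1 − p1)] / (p1 − p0)², and the function names `per_arm_sample_size` and `gross_traffic` are hypothetical helpers introduced here.

```python
import math
from statistics import NormalDist


def per_arm_sample_size(p0, rel_lift, alpha=0.05, power=0.90):
    """Per-arm n for a two-sided two-proportion z-test (unpooled variance)."""
    p1 = p0 * (1.0 + rel_lift)  # relative lift: p1 = 1.06 * p0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for target power
    variance_sum = p0 * (1.0 - p0) + p1 * (1.0 - p1)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p0) ** 2
    return math.ceil(n)


def gross_traffic(n_per_arm, bot_share=0.10, days=7):
    """Inflate the clean two-arm total for bots, then split evenly by day."""
    clean_total = 2 * n_per_arm
    gross_total = math.ceil(clean_total / (1.0 - bot_share))
    per_day = math.ceil(gross_total / days)
    return gross_total, per_day


n_arm = per_arm_sample_size(p0=0.05, rel_lift=0.06)
gross_total, gross_per_day = gross_traffic(n_arm)
print(f"per arm: {n_arm}, gross total: {gross_total}, gross per day: {gross_per_day}")
```

Dividing by (1 − bot share) rather than multiplying by (1 + bot share) reflects that 10% of the gross traffic, not of the clean traffic, is discarded. A pooled-variance variant of the formula gives nearly the same answer at these rates.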
Tasks
- Compute per-variant sample size using a two-proportion z-test normal approximation.
- Adjust the required gross traffic for 10% bot share; translate to a 7-day window assuming independence by day (i.e., per-day need).
- After running, you observe:
  - Variant A: nA = 50,000 users; xA = 2,650 conversions.
  - Variant B: nB = 49,500 users; xB = 2,820 conversions.
  There was one interim look at day 3.
- Compute the p-value and a 95% CI for the difference in proportions.
- Correct for the interim look using an O’Brien–Fleming spending function (t1 = 3/7) or Bonferroni.
- Check for a sample-ratio mismatch (SRM).
- Conclude whether to ship, discussing Type S/M risks and baseline mis-specification.
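The post-experiment checks listed above could be computed along these lines. This is a sketch under stated assumptions, not the definitive analysis: it uses a pooled standard error for the test, an unpooled (Wald) standard error for the interval, a 1-degree-of-freedom chi-square test against a 50/50 split for SRM, and the Lan-DeMets O'Brien-Fleming-type spending function α(t) = 2 − 2Φ(z_{α/2}/√t). For the final look it takes the conservative shortcut of testing at α minus the alpha already spent, which ignores the correlation between looks; an exact boundary would need a bivariate normal integral. All variable names are illustrative.

```python
import math
from statistics import NormalDist

norm = NormalDist()

# Observed data from the task statement.
n_a, x_a = 50_000, 2_650
n_b, x_b = 49_500, 2_820

p_a, p_b = x_a / n_a, x_b / n_b
diff = p_b - p_a

# Two-sided z-test with the pooled standard error (valid under H0: p_a == p_b).
p_pool = (x_a + x_b) / (n_a + n_b)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = diff / se_pool
p_value = 2 * (1 - norm.cdf(abs(z)))

# 95% Wald confidence interval with the unpooled standard error.
se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z_crit = norm.inv_cdf(0.975)
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

# SRM: chi-square goodness-of-fit against the intended 50/50 split (1 df).
expected = (n_a + n_b) / 2
chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
srm_p = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square(1)

# Interim-look correction at information fraction t1 = 3/7.
alpha, t1 = 0.05, 3 / 7
obf_spent = 2 * (1 - norm.cdf(norm.inv_cdf(1 - alpha / 2) / math.sqrt(t1)))
alpha_final_obf = alpha - obf_spent  # conservative: ignores correlation of looks
alpha_final_bonf = alpha / 2         # Bonferroni across the two looks

print(f"diff={diff:.5f} z={z:.3f} p={p_value:.4f} CI=({ci[0]:.5f}, {ci[1]:.5f})")
print(f"SRM chi2={chi2:.3f} p={srm_p:.3f}")
print(f"final-look alpha: OBF~{alpha_final_obf:.4f}, Bonferroni={alpha_final_bonf:.4f}")
```

Comparing `p_value` against `alpha_final_obf` or `alpha_final_bonf` gives the corrected significance decision. The SRM p-value is usually judged against a strict threshold (0.001 is a common convention, assumed here rather than taken from the task). The shipping conclusion should still weigh the Type S/M risks and the gap between the assumed 5% baseline and the observed control rate.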