A/B Test: Power, Clustering, Sequential Monitoring, Multiple Comparisons, and Diagnostics
Context
You are planning an A/B test for a dashboard change intended to increase order conversion. The baseline daily unique-visitor conversion rate is 6.0%. You want 80% power at a two-sided α = 0.05 to detect a 5% relative lift (i.e., from 6.0% to 6.3%, an absolute difference of 0.3 pp). In the basic setup, treat visitors as independent Bernoulli trials; in an alternative scenario, visitors are clustered by traffic source.
Tasks
- Compute the minimum per-variant sample size assuming independent Bernoulli trials. State the formula and plug in the numbers.
- Now assume users are clustered by traffic source with intra-class correlation (ICC) of 0.02 and an average of 50 users per source. Recompute the required per-variant sample size using the design effect, and state the effective sample size relationship.
- You must monitor daily for 14 days. Outline a valid sequential testing plan (e.g., alpha spending or group-sequential boundaries) and explain how to form sequentially adjusted confidence intervals.
- You also track 8 secondary metrics. Specify a multiple-comparisons control strategy and why it’s appropriate.
- After launch you observe an overall +0.4 pp lift but a −0.6 pp change for mobile Safari. Explain three diagnostic checks (e.g., bot traffic, implementation bugs, covariate imbalance) to reconcile the discrepancy, and describe how you would quantify each.
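The arithmetic behind the first three tasks can be sketched as follows. This is one possible approach, not a reference solution. It assumes the unpooled normal approximation for two proportions, n = (z_{1−α/2} + z_{power})² · [p1(1 − p1) + p2(1 − p2)] / (p2 − p1)², the standard design effect DEFF = 1 + (m − 1) · ICC with n_effective = n / DEFF, and an O'Brien-Fleming-type alpha-spending function in the Lan-DeMets form with 14 equally spaced daily looks. The function names are illustrative, and the equal-spacing assumption would need to be replaced by actual information fractions if traffic varies by day.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def per_variant_n(p1, p2, alpha=0.05, power=0.80):
    """Per-variant n for a two-sided two-proportion z-test.

    Assumption: unpooled normal approximation, equal allocation.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_power) ** 2 * variance_sum / (p2 - p1) ** 2


def design_effect(icc, cluster_size):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def obf_type_alpha_spent(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t in (0, 1].

    Assumption: O'Brien-Fleming-type spending function (Lan-DeMets form).
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 - 2 * _STD_NORMAL.cdf(z_alpha / sqrt(t))


# Task 1: independent Bernoulli trials, 6.0% -> 6.3%.
n_independent = per_variant_n(0.060, 0.063)

# Task 2: inflate by the design effect (ICC = 0.02, 50 users per source).
deff = design_effect(0.02, 50)
n_clustered = n_independent * deff  # so that n_clustered / deff = n_independent

# Task 3: cumulative alpha available at each of 14 equally spaced daily looks.
spent = [obf_type_alpha_spent(day / 14) for day in range(1, 15)]

print(f"independent n per variant: {ceil(n_independent)}")
print(f"design effect: {deff:.2f}")
print(f"clustered n per variant: {ceil(n_clustered)}")
for day, cumulative in enumerate(spent, start=1):
    print(f"day {day:2d}: cumulative alpha spent = {cumulative:.5f}")
```

The spending function releases very little alpha in the early looks and reaches the full 0.05 only at day 14, which is why the final-analysis critical value stays close to the fixed-sample one. Turning the cumulative spend into per-look stopping boundaries requires the joint distribution of the sequential test statistics and is usually delegated to a group-sequential library rather than computed by hand.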