A/B Test Snapshot: Pickup ETA Card Experiment
You are analyzing a 7-day A/B test with equal allocation. Each request is an exposure; the primary outcome is completion per request. Two guardrails monitor safety/experience. Assume independent observations and large-sample approximations are acceptable.
Data (7-day snapshot):
-
Primary metric (trip completion rate per request):
-
Control A: nA = 50,000 requests, cA = 6,000 completions
-
Treatment B: nB = 50,000 requests, cB = 6,420 completions
-
Guardrail 1 (rider cancel rate per request):
-
Control A: cancelsA = 4,500
-
Treatment B: cancelsB = 4,950
-
Guardrail 2 (wait time, minutes per request):
-
A: meanA = 4.8, sdA = 3.2, nA = 50,000
-
B: meanB = 4.7, sdB = 3.4, nB = 50,000
-
There were 5 interim looks at equally spaced information times with no pre-registered alpha spending.
Tasks:
-
State precise H0 and H1 for the primary metric; specify one- vs. two-sided and justify.
-
Choose the appropriate test for the primary metric (difference in proportions) and compute: test statistic, p-value, and a 95% CI for the lift. Show formulas and numeric results.
-
For Guardrail 2 (mean wait time), select the correct test (e.g., Welch’s t-test) and compute the 95% CI of the mean difference. State any distributional assumptions and why Welch vs. pooled.
-
Perform a multiple-testing correction across the three outcomes (Primary, Guardrail 1, Guardrail 2) using Holm–Bonferroni at familywise α = 0.05. Identify which effects remain significant.
-
Explain, in plain language, what the p-value you computed in (2) does and does not mean.
-
Given the unplanned 5 interim looks, re-evaluate significance using a simple Pocock or O’Brien–Fleming alpha-spending approach (outline the approach and provide an approximate adjusted conclusion; exact boundaries not required but justify your decision).
-
If pre-period completion rate per rider has correlation r = 0.40 with the in-experiment outcome, estimate the approximate variance reduction from CUPED and discuss how that would change required sample size or interpretation.
-
Conclude: ship, iterate, or stop? Defend your decision considering the guardrails.