A/B Test Interpretation, Launch Decision, Segmentation, and Multiple-Testing Control
Context
You ran an experiment with two treatments (t1, t2) against a control. Two core business metrics were tracked:
- Gross Booking (GB): a volume/topline metric (e.g., GMV, loan originations, order value).
- Variable Consideration (VC): a monetization metric (e.g., revenue/take-rate/fees tied to transactions).
Observed results:
- t1: No statistically significant change in GB or VC.
- t2: Statistically significant increase in GB and statistically significant decrease in VC.
Confidence intervals for t2 (vs. control):
- GB: +0.1% to +2.3%; point estimate of the lift: +$0.48.
- VC: −2.5% to −1.5%; point estimate of the loss: −$0.20.
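These point estimates already frame the core trade-off: t2 buys roughly +$0.48 of GB at a cost of about −$0.20 of VC. Below is a minimal break-even sketch, assuming a hypothetical rate r at which incremental GB eventually translates into monetizable value; r is not given in the prompt and would have to come from finance or LTV models.

```python
# Break-even framing for the t2 trade-off (illustrative sketch only).
# Assumption: each incremental dollar of GB is eventually worth r dollars of
# monetizable value; r is hypothetical and not part of the prompt.
gb_lift = 0.48     # point estimate of GB change (dollars)
vc_change = -0.20  # point estimate of VC change (dollars)

def net_impact(r: float) -> float:
    """Net value per unit when incremental GB is valued at rate r."""
    return r * gb_lift + vc_change

breakeven_r = -vc_change / gb_lift  # ~0.42: below this rate, t2 destroys value
for r in (0.20, breakeven_r, 0.60):
    print(f"r = {r:.2f}: net impact = {net_impact(r):+.2f} dollars")
```

A fuller treatment would repeat the same arithmetic at the confidence-interval bounds rather than only at the point estimates, since the GB lift could be as small as +0.1%.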
Questions
- How would you explain these results to the PM and recommend next steps?
- Given the confidence intervals above, would you launch t2? Justify your decision in terms of business objectives and risk.
- How would you segment users or orders to identify cohorts with a positive GB impact and no negative VC impact? (See the segment-level sketch below.)
- If 20 experiments run simultaneously, how would you define portfolio-level launch criteria to control false discoveries and ensure reliable decisions? (See the FDR sketch below.)
Hint: Discuss trade-offs, statistical power, cost–benefit, cohort analysis, and multiple-testing corrections (e.g., FDR).
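For the segmentation question, one common approach is to estimate the treatment effect on both metrics within each cohort and flag segments where GB is credibly up and VC is not credibly down. The sketch below is illustrative only: the DataFrame, its columns (segment, variant, gb, vc), and the synthetic data are hypothetical stand-ins, not artifacts of this experiment.

```python
# Per-segment treatment-effect estimates for GB and VC (illustrative sketch).
# The synthetic data and column names below are hypothetical placeholders.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 4000
df = pd.DataFrame({
    "segment": rng.choice(["new_user", "returning"], n),
    "variant": rng.choice(["control", "t2"], n),
})
df["gb"] = rng.normal(50, 10, n) + np.where(df["variant"] == "t2", 0.5, 0.0)
df["vc"] = rng.normal(5, 1, n) - np.where(df["variant"] == "t2", 0.2, 0.0)

def segment_effects(data: pd.DataFrame, metric: str) -> pd.DataFrame:
    """Welch's t-test of t2 vs. control for one metric, per segment."""
    rows = []
    for seg, g in data.groupby("segment"):
        treated = g.loc[g["variant"] == "t2", metric]
        control = g.loc[g["variant"] == "control", metric]
        _, p = stats.ttest_ind(treated, control, equal_var=False)
        rows.append({"segment": seg, "lift": treated.mean() - control.mean(),
                     "p_value": p, "n": len(g)})
    return pd.DataFrame(rows)

gb_eff = segment_effects(df, "gb")
vc_eff = segment_effects(df, "vc")
merged = gb_eff.merge(vc_eff, on="segment", suffixes=("_gb", "_vc"))

# Candidate launch cohorts: GB significantly up, VC not significantly down.
# Segment-level p-values should still pass a multiplicity correction,
# e.g., the FDR procedure sketched next.
candidates = merged[(merged["lift_gb"] > 0) & (merged["p_value_gb"] < 0.05)
                    & ~((merged["lift_vc"] < 0) & (merged["p_value_vc"] < 0.05))]
print(candidates)
```

Because slicing multiplies the number of tests, any segment found this way should ideally be confirmed in a follow-up experiment rather than launched on the post-hoc cut alone.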
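For the portfolio-level question, a standard option is the Benjamini–Hochberg procedure, which controls the expected false discovery rate across the 20 concurrent experiments while retaining more power than a Bonferroni correction. Below is a minimal sketch using statsmodels; the p-values are made-up placeholders, not results from the experiments above.

```python
# Benjamini-Hochberg FDR control across a portfolio of experiments (sketch).
# The p-values are illustrative placeholders, one per experiment.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, 0.012, 0.030, 0.040, 0.200, 0.450, 0.660, 0.810])

reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.10, method="fdr_bh")
for p, p_adj, launch_ok in zip(p_values, p_adjusted, reject):
    print(f"raw p = {p:.3f}  BH-adjusted p = {p_adj:.3f}  passes FDR gate: {launch_ok}")
```

A portfolio launch criterion might then require the BH-adjusted p-value to clear the target FDR, a positive point estimate on the primary metric, and guardrail metrics (such as VC) that are not significantly worse; the statistical gate alone does not settle the cost–benefit question.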