A/B Test Interpretation, Launch Decision, Segmentation, and Multi-Experiment Error Control
Context
You ran two A/B tests on an e-commerce platform:
- T1 and T2 are feature variants intended to impact two business metrics:
  - Gross Bookings (GB): pre-fee, pre-incentive order value (a growth metric).
  - Variable Contribution (VC): contribution margin per order after variable costs. Assumption: a decrease in VC is margin-negative; if your org defines VC differently (e.g., as a contra-revenue metric where a decrease is good), flip the sign logic accordingly.
Observed Results
- T1: no statistically significant change in GB or VC.
- T2: a statistically significant increase in GB but a statistically significant decrease in VC.
- T2 confidence intervals:
  - GB: [+0.1%, +2.3%], ≈ +$0.48 per order
  - VC: [–2.5%, –1.5%], ≈ –$0.20 per order
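To make the growth-vs-margin trade-off concrete, here is a minimal sketch of the net per-order margin arithmetic. The 15% GB-to-margin flow-through rate and the linear scaling of dollar figures with the CI bounds are illustrative assumptions, not facts given in the results:

```python
# Illustrative sketch: net per-order margin impact of T2.
# ASSUMPTIONS (not from the results): 15% of incremental GB flows through
# to margin, and per-order dollars scale linearly with the CI percentages.

gb_point = 0.48      # $ per order, GB lift (quoted approximate point estimate)
vc_point = -0.20     # $ per order, VC change (quoted approximate point estimate)
flow_through = 0.15  # hypothetical GB-to-margin flow-through rate

# GB dollars are top-line, so only their margin flow-through offsets the VC loss.
net_point = gb_point * flow_through + vc_point

# Worst case inside the CIs: scale each point estimate to its CI lower bound
# relative to the CI midpoint (GB midpoint +1.2%, VC midpoint -2.0%).
gb_worst = gb_point * (0.1 / 1.2)
vc_worst = vc_point * (2.5 / 2.0)
net_worst = gb_worst * flow_through + vc_worst

print(f"net per order: point ${net_point:+.3f}, worst case ${net_worst:+.3f}")
```

Note that the net point estimate stays negative for any flow-through rate below roughly 40% ($0.20 / $0.48), which is why the launch decision hinges on margin rather than the headline GB lift.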
Tasks
- Explain these results to the PM (statistical vs. practical significance; growth vs. margin trade-offs; plausible mechanisms).
- Decide whether to launch T2 using the given CIs and per-order impacts, and justify the decision.
- Design a segmentation analysis to identify cohorts where GB lifts without hurting VC.
- If you will run 20 parallel feature experiments, define:
  - launch criteria and statistical thresholds for the primary and guardrail metrics;
  - how you will control false discoveries and error rates across the portfolio.
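For the launch decision, one common pattern is a pre-registered rule on CI bounds: require the primary metric's CI lower bound to clear a minimum practical effect while the guardrail metric's CI lower bound stays above a tolerated drop. A minimal sketch; the thresholds (here, any positive lower bound on the primary and at most a -0.5% guardrail drop) are illustrative assumptions, not org policy:

```python
# Hedged sketch of a CI-based launch rule. Thresholds are illustrative.

def launch_decision(primary_ci, guardrail_ci,
                    min_practical_lift=0.0,    # primary CI lower bound must exceed this
                    max_guardrail_drop=-0.005):  # guardrail CI lower bound must exceed this
    """Return a launch verdict from (lower, upper) CI tuples, as fractions."""
    primary_ok = primary_ci[0] >= min_practical_lift
    guardrail_ok = guardrail_ci[0] >= max_guardrail_drop
    if primary_ok and guardrail_ok:
        return "launch"
    if primary_ok:
        return "no-launch: guardrail breach"
    return "no-launch: no practical lift"

# T2's observed CIs as fractions: GB [+0.1%, +2.3%], VC [-2.5%, -1.5%]
print(launch_decision((0.001, 0.023), (-0.025, -0.015)))
# -> "no-launch: guardrail breach" under these illustrative thresholds
```

The same rule generalizes to the 20-experiment portfolio: each experiment ships only if its primary metric clears the practical-significance bar after multiplicity correction and no guardrail CI breaches its tolerance.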
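For controlling false discoveries across the portfolio, a standard choice is the Benjamini-Hochberg procedure, which bounds the expected false discovery rate. A self-contained sketch; the p-values and the q = 0.10 FDR level are made up for illustration:

```python
# Hedged sketch: Benjamini-Hochberg FDR control across 20 experiments.

def benjamini_hochberg(pvals, q=0.10):
    """Return indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m) * q, then reject the
    # k smallest p-values (including any above their own thresholds).
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])

# Illustrative p-values for 20 parallel experiments (not real results).
pvals = [0.001, 0.004, 0.019, 0.03, 0.04, 0.2, 0.5] + [0.8] * 13
print(benjamini_hochberg(pvals, q=0.10))
# -> [0, 1]: only the two smallest p-values survive the correction
```

Guardrail metrics arguably deserve the opposite treatment: there, the costly error is a *missed* regression, so teams often keep guardrail tests uncorrected (or use a family-wise procedure) rather than raising the bar to flag harm.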
Hints
- Contrast statistical vs. practical significance.
- Weigh revenue (GB) vs. margin (VC) trade-offs.
- Apply multiple-testing corrections where appropriate.
- Use principled cohort-discovery techniques that avoid p-hacking.
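One principled cohort-discovery technique that avoids p-hacking is sample splitting: generate segment hypotheses on an exploration half of the data, then confirm only those pre-selected candidates on a held-out half (with proper tests and a multiplicity correction across candidates in practice). A minimal sketch on simulated data; the segment names, effect sizes, and the -$0.10 VC tolerance are all hypothetical:

```python
# Hedged sketch: split-sample segment discovery on simulated per-order data.
import random

random.seed(0)
SEGMENTS = ["new_user", "repeat", "mobile", "desktop"]

def simulate_order():
    # Hypothetical effects: "repeat" users keep VC flat; others lose margin.
    seg = random.choice(SEGMENTS)
    gb_lift = random.gauss(0.5, 1.0)   # $ per order (unused below; a full
                                        # analysis would test GB per segment too)
    vc_delta = random.gauss(0.0 if seg == "repeat" else -0.25, 0.5)
    return seg, gb_lift, vc_delta

orders = [simulate_order() for _ in range(4000)]
explore, confirm = orders[:2000], orders[2000:]

def segment_vc_means(rows):
    by_seg = {}
    for seg, _gb, vc in rows:
        by_seg.setdefault(seg, []).append(vc)
    return {s: sum(v) / len(v) for s, v in by_seg.items()}

VC_TOLERANCE = -0.10  # hypothetical per-order VC tolerance

# Step 1: hypothesis generation on the exploration half ONLY.
candidates = [s for s, m in segment_vc_means(explore).items() if m > VC_TOLERANCE]
# Step 2: confirmation on held-out data, restricted to the pre-selected set.
confirmed = [s for s, m in segment_vc_means(confirm).items()
             if s in candidates and m > VC_TOLERANCE]
print(confirmed)
```

Because the confirmation half never informed segment selection, its estimates are free of the selection bias that makes post-hoc "winning subgroup" claims unreliable; more elaborate versions of the same idea include honest causal trees and pre-registered segment lists.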