A/B Test Diagnostics: Did Traffic Distribution Cause the Retention Drop?
Context
An A/B test changed a button's color from green (control) to red (treatment). The primary metric (e.g., Day-7 user retention) was lower in the treatment group than in control. Stakeholders suspect the retention drop could be due to traffic allocation issues rather than the color itself.
Assume:
- User-level randomization with a nominal 50/50 split.
- Retention is measured on enrolled users with sufficient maturation time (e.g., D7 retention on cohorts enrolled ≥7 days ago).
- Sufficient sample size for standard asymptotic tests.
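Under these assumptions, the headline comparison that the diagnostics are meant to stress-test could be sketched as a pooled two-proportion z-test on matured cohorts. This is a minimal illustration, not a prescribed method: the function name `two_proportion_ztest` and the counts in the usage lines are hypothetical, and it relies on the large-sample normal approximation stated above.

```python
import math

def two_proportion_ztest(retained_c, n_c, retained_t, n_t):
    """Pooled two-sided z-test for a difference in retention rates.

    Hypothetical helper: inputs are counts of retained users and enrolled
    users per arm, restricted to cohorts that have fully matured.
    Returns (treatment rate minus control rate, z statistic, p-value).
    """
    if n_c <= 0 or n_t <= 0:
        raise ValueError("both arms need at least one enrolled user")
    p_c = retained_c / n_c
    p_t = retained_t / n_t
    pooled = (retained_c + retained_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    if se == 0:
        raise ValueError("retention is 0% or 100% in both arms; test undefined")
    z = (p_t - p_c) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_t - p_c, z, p_value

# Illustrative (made-up) counts: 20.0% retention in control, 19.0% in treatment.
diff, z, p = two_proportion_ztest(2000, 10000, 1900, 10000)
print(f"diff={diff:+.4f}, z={z:.2f}, p={p:.4f}")
```

A significant result here only says the arms differ; it does not by itself separate the color effect from allocation or exposure problems, which is what the checks requested below are for.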
Question
Outline a step-by-step plan to investigate whether traffic distribution problems caused the retention decrease. What diagnostics, balance checks, and statistical tests would you run before concluding the new color harms retention? Discuss:
- Randomization sanity checks and sample ratio mismatch (SRM).
- Covariate balance and eligibility/exposure balance.
- Sequential testing/peeking bias and timing effects.
- Segment-level retention comparisons and heterogeneity.
Provide concrete tests, decision thresholds, and how you would interpret outcomes.
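As one illustration of a concrete test with a decision threshold, an SRM check against the nominal 50/50 split could be sketched as a chi-square goodness-of-fit test with one degree of freedom. The function name `srm_check`, the example counts, and the cutoff `alpha=0.001` (a deliberately strict threshold often used for SRM alarms) are assumptions for illustration, not part of the problem statement.

```python
import math

def srm_check(n_control, n_treatment, expected_control_share=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test for sample ratio mismatch (SRM).

    Hypothetical helper: compares observed enrollment counts with the
    counts expected under the intended split.
    Returns (chi2 statistic, p-value, srm_flag).
    """
    total = n_control + n_treatment
    if total <= 0 or not 0 < expected_control_share < 1:
        raise ValueError("need positive counts and a share strictly between 0 and 1")
    expected_c = total * expected_control_share
    expected_t = total - expected_c
    chi2 = ((n_control - expected_c) ** 2 / expected_c
            + (n_treatment - expected_t) ** 2 / expected_t)
    # With 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha

# Illustrative (made-up) enrollment counts.
chi2, p, flagged = srm_check(50000, 48500)
print(f"chi2={chi2:.2f}, p={p:.2e}, SRM flagged={flagged}")
```

If the check flags SRM, the retention comparison should be treated as untrustworthy until the cause of the imbalance (for example, assignment, logging, or eligibility filtering) is found; if it passes, the split alone is an unlikely explanation and the remaining diagnostics still apply.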