This question evaluates a data scientist's competence in experimental design, metric and guardrail selection, power and sample-size calculation, statistical inference (including two-proportion testing and fixed-effects meta-analysis), and the debugging of inconsistent A/B test reruns through instrumentation, population-shift, and heterogeneity checks. It is commonly asked because interviewers need to assess the ability to operationalize randomized email experiments, set run lengths and attribution windows, and diagnose conflicting results using applied analytics. The problem sits in the Analytics & Experimentation domain and tests practical application grounded in conceptual statistical understanding.

An e-commerce company plans to A/B test personalized product emails to improve 7-day purchase conversion. Randomization is at the user level and the analysis is intent-to-treat; some users may receive multiple emails during the test window.
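Before running such a test, a sample-size calculation fixes the run length. A minimal sketch with Python's standard library, using the usual two-proportion formula; the baseline 3.5% matches the scenario, but the +10% relative minimum detectable effect (MDE), 80% power, and two-sided alpha = 0.05 are illustrative assumptions, not values given in the problem:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_base: float, p_alt: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-arm sample size for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, e.g. 1.96
    z_beta = NormalDist().inv_cdf(power)           # power quantile, e.g. 0.84
    var = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return ceil((z_alpha + z_beta) ** 2 * var / (p_alt - p_base) ** 2)

# Hypothetical MDE: detect 3.50% -> 3.85% (+10% relative) at 80% power
n = n_per_arm(0.0350, 0.0385)
```

With these assumptions the required n per arm is on the order of tens of thousands, far below the millions actually run, which is why both runs are heavily powered even for small lifts.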
The initial RCT ran from 2025-06-01 to 2025-06-14 with n = 1,200,000 per arm: control conversion 3.50%, treatment 4.20% (+0.70 pp, +20.0% relative). A rerun from 2025-08-15 to 2025-08-28 with n = 900,000 per arm observed control 3.50%, treatment 3.57% (+0.07 pp, +2.0% relative).
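The two-proportion z-test the question alludes to can be sketched directly from these figures; this is a minimal stdlib-only illustration (the conversion counts are reconstructed from the stated rates and per-arm sizes), not the company's actual analysis pipeline:

```python
from math import sqrt, erfc

def two_prop_ztest(x_c: int, n_c: int, x_t: int, n_t: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_c, p_t = x_c / n_c, x_t / n_t
    p_pool = (x_c + x_t) / (n_c + n_t)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Run 1 (June): 3.50% vs 4.20%, n = 1,200,000 per arm
n1 = 1_200_000
z1, p1 = two_prop_ztest(round(0.0350 * n1), n1, round(0.0420 * n1), n1)

# Run 2 (August): 3.50% vs 3.57%, n = 900,000 per arm
n2 = 900_000
z2, p2 = two_prop_ztest(round(0.0350 * n2), n2, round(0.0357 * n2), n2)
```

Both runs are individually significant at alpha = 0.05, but the first z-statistic is an order of magnitude larger than the second, which is the tension the candidate is expected to investigate.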
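The fixed-effects meta-analysis mentioned above combines the two runs by inverse-variance weighting of the risk differences, and Cochran's Q quantifies how inconsistent they are. A minimal sketch under the assumption that each run's effect variance is the unpooled sum of the two arms' binomial variances:

```python
def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling; returns (pooled effect, Cochran's Q)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    return pooled, q

# Risk differences and per-run variances from the stated rates and sample sizes
d1 = 0.0420 - 0.0350
v1 = (0.0350 * 0.9650 + 0.0420 * 0.9580) / 1_200_000
d2 = 0.0357 - 0.0350
v2 = (0.0350 * 0.9650 + 0.0357 * 0.9643) / 900_000

pooled, q = fixed_effect_meta([d1, d2], [v1, v2])
```

A Q this large on 1 degree of freedom signals severe heterogeneity: the two runs are not estimating the same effect, so the pooled number should not be reported at face value and the candidate should pivot to instrumentation, population-shift, and heterogeneity diagnostics.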