This question evaluates a data scientist's proficiency in experiment diagnostics and causal inference. It tests understanding of experimentation infrastructure, retention metrics, data quality, randomization integrity, and identification of causal effects in observational settings, and it is commonly asked to assess whether a candidate can distinguish true treatment effects from allocation or instrumentation issues. The question falls under Analytics & Experimentation and causal inference within data science, requiring both a conceptual understanding of bias and identification and the practical ability to apply experiment-quality diagnostics and identification strategies to user-level, time-stamped event logs.

A consumer app ran an A/B test that changed a call-to-action (CTA) button from green (control) to red (treatment). Retention decreased in treatment.
You need to address two parts, under the following assumptions: retention is a k-day retention metric (e.g., 7-day retention); the platform has standard experimentation infrastructure with event logs, feature flags, and user-level randomization; and, for the reviews question, time-stamped purchases and reviews are available at the user–merchant level.
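As a concrete illustration of the retention metric assumed above, a minimal sketch of computing per-variant 7-day retention from event logs might look like the following. The table names (assignments, events), their columns, and the definition of "retained" as any activity within 7 days of assignment are assumptions made for this sketch, not details given in the question.

```python
# Hypothetical sketch: per-variant 7-day retention from user-level logs.
# Assumed inputs (not specified in the question):
#   assignments: one row per user with columns user_id, variant, assigned_at
#   events:      time-stamped activity log with columns user_id, event_at
import pandas as pd

def seven_day_retention(assignments: pd.DataFrame, events: pd.DataFrame) -> pd.Series:
    merged = events.merge(assignments, on="user_id", how="inner")
    # A user counts as retained if any event falls within 7 days of assignment.
    in_window = (merged["event_at"] > merged["assigned_at"]) & (
        merged["event_at"] <= merged["assigned_at"] + pd.Timedelta(days=7)
    )
    retained_users = merged.loc[in_window, "user_id"].unique()
    flagged = assignments.assign(retained=assignments["user_id"].isin(retained_users))
    # Mean of the boolean flag per variant is the k-day retention rate.
    return flagged.groupby("variant")["retained"].mean()
```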
First, describe the diagnostics you would run to determine whether uneven traffic allocation or other experiment-quality issues drove the result, and explain how you would decide whether to trust the result as causal.
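One diagnostic a strong answer almost always covers is a sample ratio mismatch (SRM) check: compare the observed assignment counts against the intended split with a chi-square goodness-of-fit test. Below is a minimal sketch, reusing the hypothetical assignments table from above and assuming an intended 50/50 split; the 0.001 threshold is a common convention rather than anything specified in the question.

```python
# Hypothetical sketch: sample ratio mismatch (SRM) check for the CTA experiment.
import pandas as pd
from scipy.stats import chisquare

def srm_check(assignments: pd.DataFrame, expected_split=(0.5, 0.5), alpha=0.001) -> dict:
    counts = assignments["variant"].value_counts().reindex(
        ["control", "treatment"], fill_value=0
    )
    expected = [p * counts.sum() for p in expected_split]
    stat, p_value = chisquare(f_obs=counts.values, f_exp=expected)
    # A tiny p-value means the observed split is very unlikely under the intended
    # allocation, pointing to a randomization or logging problem rather than a
    # real treatment effect.
    return {"counts": counts.to_dict(), "chi2": stat, "p_value": p_value, "srm": p_value < alpha}
```

The chi-square goodness-of-fit test is the natural choice here because assignment counts are multinomial under the intended allocation, and a failed SRM check generally means the retention difference should not be trusted as causal until the allocation issue is explained.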
Second, outline the data you would need and a methodology for obtaining an unbiased estimate of the impact of receiving negative reviews on a merchant's coupon repurchase rate.
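For the reviews part, one plausible identification strategy, among several the question leaves open, is a difference-in-differences design: compare each merchant's coupon repurchase rate before and after its first negative review against merchants that have not yet received one in the same periods. The sketch below assumes a hypothetical merchant-period panel with the listed columns and uses a two-way fixed-effects regression via statsmodels; it is an illustration under a parallel-trends assumption, not the single canonical answer.

```python
# Hypothetical sketch: difference-in-differences with two-way fixed effects.
# Assumed input (not specified in the question): a merchant-period panel with
#   merchant_id, period, repurchase_rate (share of coupon buyers who buy again),
#   and post_negative (1 in periods after the merchant's first negative review).
import pandas as pd
import statsmodels.formula.api as smf

def did_estimate(panel: pd.DataFrame):
    model = smf.ols(
        "repurchase_rate ~ post_negative + C(merchant_id) + C(period)",
        data=panel,
    ).fit(cov_type="cluster", cov_kwds={"groups": panel["merchant_id"]})
    # The coefficient on post_negative is the DiD estimate of the effect of
    # receiving a negative review on the coupon repurchase rate.
    return model.params["post_negative"], model.bse["post_negative"]
```

Clustering standard errors by merchant accounts for serial correlation within merchants; the key identifying assumption is that treated and not-yet-treated merchants would have followed parallel repurchase-rate trends absent the negative review.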