Post-hoc Causal Estimation After a Failed A/B Rollout
Context
An intern accidentally shipped a feature to 100% of eligible users for 5 consecutive days (the T period). There is no concurrent control. You have 4 full weeks of stable pre-period data (the P period) collected under identical eligibility rules and product configuration.
- Primary metric: 1-day retention (D+1 retention).
- Guardrails: crashes per session, latency p95, purchase conversion.
Your goal is to recover the causal treatment effect using observational methods and to describe validation, assumptions, uncertainty, and prevention.
Tasks
- Propose and compare at least two identification strategies to estimate the treatment effect using observational methods:
  - (a) Pre–post with CUPED.
  - (b) Synthetic control via matching/propensity-score weighting (PSW) against ineligible-but-similar users or delayed-exposure users. For (b), specify covariates, overlap checks, and diagnostics (SMD, eCDF, weight trimming).
  - (c) Difference-in-differences (DiD) using a holdout geography.
- For each method, state the assumptions (e.g., parallel trends, no interference, ignorability) and design falsification/placebo tests to probe them.
- Explain how to compute ATT vs ATE, handle calendar effects and novelty/seasonality, and quantify uncertainty (cluster-robust SEs or bootstrap under weighting).
- List pitfalls in the original A/B setup that led to this failure and propose a prevention plan (exposure checks, invariant metrics, automated power and allocation validation).
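
To illustrate the balance diagnostics named in strategy (b), here is a minimal sketch, not a reference solution. It assumes propensity scores have already been estimated by some model. The function names (`att_weights`, `trim_weights`, `weighted_smd`), the 99th-percentile cap, and the simulated data are all hypothetical choices made for illustration. The sketch forms ATT weights (treated units get weight 1, comparison units get e/(1-e)), caps extreme comparison weights, and reports the standardized mean difference (SMD) of one covariate before and after weighting.

```python
import numpy as np


def att_weights(treated, propensity):
    """ATT weights: 1 for treated units, e/(1-e) for comparison units."""
    treated = np.asarray(treated, dtype=bool)
    # Clip to avoid division by zero when a score is exactly 0 or 1.
    e = np.clip(np.asarray(propensity, dtype=float), 1e-6, 1 - 1e-6)
    return np.where(treated, 1.0, e / (1.0 - e))


def trim_weights(weights, treated, upper_pct=99.0):
    """Cap comparison-unit weights at a percentile (an assumed heuristic)."""
    treated = np.asarray(treated, dtype=bool)
    w = np.asarray(weights, dtype=float).copy()
    cap = np.percentile(w[~treated], upper_pct)
    w[~treated] = np.minimum(w[~treated], cap)
    return w


def weighted_smd(x, treated, weights=None):
    """Difference in (weighted) means, scaled by the unweighted pooled SD."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mean_t = np.average(x[treated], weights=w[treated])
    mean_c = np.average(x[~treated], weights=w[~treated])
    pooled_sd = np.sqrt((x[treated].var(ddof=1) + x[~treated].var(ddof=1)) / 2.0)
    return (mean_t - mean_c) / pooled_sd


# Simulated data (hypothetical): a single covariate drives exposure.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                    # e.g. standardized pre-period sessions
true_e = 1.0 / (1.0 + np.exp(-0.8 * x))   # propensity, known here by construction
treated = rng.random(n) < true_e

w = trim_weights(att_weights(treated, true_e), treated)
smd_raw = weighted_smd(x, treated)
smd_weighted = weighted_smd(x, treated, w)
print(f"SMD before weighting:    {smd_raw:.3f}")
print(f"SMD after ATT weighting: {smd_weighted:.3f}")
```

A common rule of thumb treats an absolute SMD below 0.1 as adequate balance. In practice the same check would be run per covariate with estimated rather than true propensity scores, and the trimming percentile is a tunable assumption that trades variance against residual imbalance.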