Estimate Causal Impact Using Synthetic Control Methods
A product feature has already launched to 100% of traffic, and no explicit control or holdout group exists. You need to estimate causal impact on a business metric in a marketplace with seasonality and segment heterogeneity.
Constraints & Assumptions
- Historical pre- and post-launch data are available.
- Unit-level or geo-level panels may be available.
- Exposure may vary even after a 100% rollout.
- The analysis must build a credible counterfactual.
Clarifying Questions to Ask
- Was rollout truly simultaneous, or did exposure vary by platform, market, app version, or eligibility?
- What is the primary outcome and post-period horizon?
- Are there major concurrent launches, marketing campaigns, or external shocks?
- What historical units could serve as donors for a synthetic control?
Part 1 - Methods
What causal inference methodologies are suitable for this setting?
What This Part Should Cover
- Interrupted time series, Bayesian structural time series, synthetic control, matched controls, difference-in-differences using exposure variation, and regression adjustment.
- When each method is appropriate.
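To make the synthetic control idea concrete, here is a minimal sketch, assuming one treated unit and a handful of donor units observed over the same periods. It looks for non-negative donor weights that sum to 1 and minimize pre-period squared error, then reads the post-period gap as the estimated effect. The function names and the toy numbers are hypothetical. The solver is a plain Frank-Wolfe iteration, chosen only because it needs no external libraries; practical work would more likely use a constrained least-squares routine and add covariates.

```python
def synthetic_series(weights, donors):
    """Weighted average of the donor series, one value per period."""
    n_periods = len(donors[0])
    return [sum(w * d[t] for w, d in zip(weights, donors)) for t in range(n_periods)]


def fit_synthetic_weights(treated_pre, donors_pre, max_iter=2000):
    """Donor weights (non-negative, summing to 1) that fit the pre-period.

    Minimizes squared error between the treated series and the weighted
    donor average with Frank-Wolfe steps and an exact line search.
    """
    n_donors = len(donors_pre)
    weights = [1.0 / n_donors] * n_donors  # start from equal weights
    for _ in range(max_iter):
        synth = synthetic_series(weights, donors_pre)
        resid = [s - y for s, y in zip(synth, treated_pre)]
        # Gradient of the squared error per weight (up to a constant factor).
        grad = [sum(r * x for r, x in zip(resid, d)) for d in donors_pre]
        best = min(range(n_donors), key=grad.__getitem__)
        # Move the synthetic series toward the donor with the steepest descent.
        direction = [x - s for x, s in zip(donors_pre[best], synth)]
        denom = sum(v * v for v in direction)
        if denom == 0.0:
            break
        # Exact step for a quadratic loss, clipped so weights stay valid.
        step = -sum(r * v for r, v in zip(resid, direction)) / denom
        step = min(1.0, max(0.0, step))
        if step == 0.0:
            break  # no donor improves the fit: optimum reached
        weights = [(1.0 - step) * w for w in weights]
        weights[best] += step
    return weights


def estimate_effect(treated_post, donors_post, weights):
    """Per-period gap between the treated unit and its synthetic control."""
    synth = synthetic_series(weights, donors_post)
    return [y - s for y, s in zip(treated_post, synth)]


# Hypothetical toy data: three donor markets and one treated market.
donors_pre = [[90.0, 92.0, 89.0, 93.0], [100.0, 102.0, 99.0, 103.0], [110.0, 112.0, 109.0, 113.0]]
treated_pre = [96.0, 98.0, 95.0, 99.0]
donors_post = [[91.0, 89.0], [101.0, 99.0], [111.0, 109.0]]
treated_post = [102.0, 100.0]

w = fit_synthetic_weights(treated_pre, donors_pre)
gaps = estimate_effect(treated_post, donors_post, w)
print("weights:", [round(x, 3) for x in w])
print("mean post-period gap:", round(sum(gaps) / len(gaps), 3))
```

Constraining the weights to be non-negative and sum to 1 keeps the counterfactual inside the range of the donors, which limits extrapolation. The trade-off is that a treated unit lying outside that range cannot be fit well.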
Part 2 - Required Data
What data does each method require?
What This Part Should Cover
- Pre/post outcome history, untreated or less-exposed donor units, covariates, exposure logs, seasonality variables, and market or segment panels.
- Data quality and stable metric definitions.
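Most of the methods above need a balanced panel: one outcome value per unit per period. As a rough illustration, assuming the raw data arrives as a list of long-format `(unit, period, value)` rows, the hypothetical helper below pivots it into per-unit series and flags units with missing periods so they can be repaired or dropped from the donor pool.

```python
from collections import defaultdict


def build_panel(rows):
    """Pivot long-format rows into aligned per-unit series.

    rows: list of (unit, period, value) tuples.
    Returns (periods, panel, incomplete), where panel maps each complete
    unit to its values in period order and incomplete lists units that
    are missing at least one period.
    """
    periods = sorted({period for _, period, _ in rows})
    by_unit = defaultdict(dict)
    for unit, period, value in rows:
        by_unit[unit][period] = value

    panel, incomplete = {}, []
    for unit, series in by_unit.items():
        if all(p in series for p in periods):
            panel[unit] = [series[p] for p in periods]
        else:
            incomplete.append(unit)
    return periods, panel, sorted(incomplete)


# Hypothetical weekly metric by market; "MX" is missing week 2.
rows = [
    ("US", 1, 10.0), ("US", 2, 11.0),
    ("CA", 1, 5.0), ("CA", 2, 6.0),
    ("MX", 1, 3.0),
]
periods, panel, incomplete = build_panel(rows)
print(periods, panel, incomplete)
```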
Part 3 - Assumptions and Validation
What identification assumptions must hold, and how would you validate them?
What This Part Should Cover
- Good pre-period fit, no unmeasured concurrent shocks, stable relationship between treated and donor units, no spillovers, and comparable trends.
- Placebo tests, backtesting, pre-trend checks, donor sensitivity, and falsification outcomes.
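One way to make the placebo and backtesting checks concrete is a placebo-in-time test. The sketch below assumes you already have the pre-launch gap between the treated series and its counterfactual, then pretends the launch happened at a series of fake dates. The function name and parameters are hypothetical. If the counterfactual is sound, the fake "effects" should hover near zero, and their spread gives a rough noise floor for the real estimate.

```python
from statistics import mean


def placebo_in_time_effects(gap_pre, window, min_history):
    """Fake-launch effects computed on pre-launch data only.

    gap_pre: treated minus counterfactual, one value per pre-launch period.
    window: number of periods treated as the fake post-period.
    min_history: minimum number of periods kept before each fake launch.
    """
    effects = []
    for fake_launch in range(min_history, len(gap_pre) - window + 1):
        before = gap_pre[:fake_launch]
        after = gap_pre[fake_launch:fake_launch + window]
        effects.append(mean(after) - mean(before))
    return effects


# Hypothetical pre-period gap with no drift: placebo effects stay small.
stable_gap = [0.5, -0.5] * 6
print(placebo_in_time_effects(stable_gap, window=2, min_history=4))

# Hypothetical drifting gap: every fake launch shows a spurious "lift",
# a sign that the counterfactual does not track the treated unit.
drifting_gap = [float(t) for t in range(12)]
print(placebo_in_time_effects(drifting_gap, window=2, min_history=4))
```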
Part 4 - Uncertainty and Communication
How would you quantify and communicate uncertainty?
What This Part Should Cover
- Confidence or credible intervals, placebo distributions, sensitivity analysis, scenario ranges, and limitations.
- Clear recommendation despite uncertainty.
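A placebo distribution can be turned into a simple permutation-style p-value. The sketch below assumes the same estimation procedure has been rerun with each donor unit treated as if it had received the launch, yielding one placebo effect per donor. The function names and numbers are hypothetical. With few donors the p-value is coarse, so it is better reported alongside the placebo spread than as a lone significance claim.

```python
from statistics import median


def placebo_p_value(treated_effect, placebo_effects):
    """Share of effects at least as extreme as the treated one.

    The treated unit counts as one of the permutations, so the smallest
    possible value is 1 / (number of placebo units + 1).
    """
    extreme = sum(1 for e in placebo_effects if abs(e) >= abs(treated_effect))
    return (1 + extreme) / (1 + len(placebo_effects))


def placebo_noise_scale(placebo_effects):
    """Median absolute placebo effect, a rough sense of typical noise."""
    return median(abs(e) for e in placebo_effects)


# Hypothetical placebo effects from nine donor markets.
placebos = [-1.0, 0.5, 0.2, -0.3, 0.8, -0.6, 0.1, 0.4, -0.2]
print(placebo_p_value(2.0, placebos))
print(placebo_noise_scale(placebos))
```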
What a Strong Answer Covers
A strong answer does not pretend a 100% rollout is an experiment; it builds the best possible counterfactual, validates assumptions, and communicates uncertainty and caveats honestly.
Follow-up Questions
- What if synthetic control has poor pre-period fit?
- How would you handle seasonality and holidays?
- How would you use exposure intensity after a 100% rollout?
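For the exposure-intensity question, one possible starting point is a difference-in-differences between high-exposure and low-exposure segments. The sketch below is illustrative only: the function name and numbers are hypothetical, and it assumes both groups would have followed parallel trends without the launch. It estimates the incremental effect of higher exposure relative to lower exposure, not the total effect of the launch.

```python
from statistics import mean


def exposure_diff_in_diff(high_pre, high_post, low_pre, low_post):
    """Pre/post change for high-exposure units minus that for low-exposure units."""
    high_change = mean(high_post) - mean(high_pre)
    low_change = mean(low_post) - mean(low_pre)
    return high_change - low_change


# Hypothetical per-period metric averages for the two exposure groups.
estimate = exposure_diff_in_diff(
    high_pre=[10.0, 12.0], high_post=[15.0, 17.0],
    low_pre=[9.0, 11.0], low_post=[11.0, 13.0],
)
print(estimate)
```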