Synthetic Control: Assumptions, Estimation, Inference, and Diagnostics
Context
You are estimating the causal effect of an intervention on a single treated unit using time-series data and a donor pool of untreated units. The synthetic control method (SCM) constructs a weighted combination of donors (in the classic formulation, non-negative weights that sum to one) to approximate the treated unit’s counterfactual path in the absence of treatment. Answer the following, focusing on identification, estimation choices, inference, and practical diagnostics.
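Below is a minimal sketch of the weighting step. It assumes the classic constraint that weights are non-negative and sum to one, which keeps the synthetic unit inside the donors' convex hull. It also assumes matching on pre-period outcomes only: no predictor-weighting matrix V and no covariates. Weights are fit by projected gradient descent onto the simplex, which stands in for the quadratic-programming solvers SCM packages use. The function names and the toy series are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]  # last sorted entry that stays positive
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def fit_scm_weights(y_pre, Y_pre, n_iter=10000):
    """Hypothetical helper: min_w ||y_pre - Y_pre @ w||^2 s.t. w >= 0, sum(w) = 1.

    y_pre: (T0,) pre-treatment outcomes of the treated unit.
    Y_pre: (T0, J) pre-treatment outcomes of the J donors.
    """
    J = Y_pre.shape[1]
    w = np.full(J, 1.0 / J)  # start from equal weights
    # Step 1/L, where L = 2 * sigma_max(Y_pre)^2 is the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(Y_pre, ord=2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * Y_pre.T @ (Y_pre @ w - y_pre)
        w = project_to_simplex(w - step * grad)
    return w


# Toy usage (made-up series): pre-treatment, the treated unit is exactly 0.7*d0 + 0.3*d1.
t = np.arange(20)
Y_pre = np.column_stack([np.sin(t / 3.0) + 5.0, 0.1 * t + 2.0, np.cos(t / 2.0) + 1.0])
y_pre = 0.7 * Y_pre[:, 0] + 0.3 * Y_pre[:, 1]
w = fit_scm_weights(y_pre, Y_pre)
print(np.round(w, 3))  # should be close to [0.7, 0.3, 0.0]
```

If the treated series lay outside the donors' convex hull (for example, above every donor in every period), the simplex constraint would leave a systematic pre-period gap. That is the overlap concern raised in the task.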
Task
- Identification assumptions and how violations bias estimates
  - Convex-hull or linear-span requirement (overlap).
  - No interference/spillovers between treated and donor units.
  - Stability of relationships over time (factor loadings/predictor relationships).
  - Reliance on strong pre-period fit to proxy for unobserved confounders.
  - Variable and lag selection without post-treatment leakage; how to regularize with high-dimensional predictors.
- Inference strategy
  - Construct pointwise and cumulative treatment effects.
  - Use in-space and in-time placebos (ratio of post-period to pre-period MSPE) to obtain p-values.
  - Build uncertainty bands and explain why naive bootstrapping can fail.
- Diagnostics and fixes when assumptions are strained
  - Poor pre-period fit, structural breaks, seasonality mismatches, staggered adoption, carryover effects, reversion to the mean, donor dominance.
  - Tools such as leave-one-out tests, retuning windows, augmented/ridge/Lasso variants, or switching to interrupted time series (ITS) or difference-in-differences (DID) if needed.
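The inference and diagnostic steps above can be sketched as follows, under several assumptions.

- Weights are fit on pre-period outcomes only, using a simplex-constrained least-squares fitter included here so the block runs on its own.
- The placebo p-value is the share of units (treated plus donors) whose post/pre MSPE ratio is at least the treated unit's.
- The ratio is undefined if the pre-period fit is perfect.

All function names are hypothetical. The toy data are constructed so the answers can be checked by hand, and uncertainty bands and bootstrapping are not shown.

```python
import numpy as np


def project_to_simplex(v):
    # Euclidean projection onto {w >= 0, sum(w) = 1} (sort-based algorithm).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    return np.maximum(v - (css[rho] - 1.0) / (rho + 1.0), 0.0)


def fit_weights(y_pre, Y_pre, n_iter=5000):
    # Projected gradient descent for min ||y_pre - Y_pre w||^2 over the simplex.
    w = np.full(Y_pre.shape[1], 1.0 / Y_pre.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(Y_pre, ord=2) ** 2)
    for _ in range(n_iter):
        w = project_to_simplex(w - step * 2.0 * Y_pre.T @ (Y_pre @ w - y_pre))
    return w


def effects(y, Y, T0):
    # Pointwise gaps (treated minus synthetic) for every period, and their post-period sum.
    w = fit_weights(y[:T0], Y[:T0])
    gap = y - Y @ w
    return gap, gap[T0:].sum(), w


def mspe_ratio(gap, T0):
    # Post-period MSPE divided by pre-period MSPE.
    return np.mean(gap[T0:] ** 2) / np.mean(gap[:T0] ** 2)


def placebo_p_value(y, Y, T0):
    # In-space placebos: treat each donor as if treated, fit it from the remaining
    # donors, and report the share of units whose ratio is >= the treated unit's.
    ratios = [mspe_ratio(effects(y, Y, T0)[0], T0)]
    for j in range(Y.shape[1]):
        gap_j = effects(Y[:, j], np.delete(Y, j, axis=1), T0)[0]
        ratios.append(mspe_ratio(gap_j, T0))
    ratios = np.array(ratios)
    return np.mean(ratios >= ratios[0]), ratios


def in_time_placebo(y, Y, T0, fake_T0):
    # Backdate treatment to fake_T0 < T0 using pre-treatment data only;
    # a large ratio means the method "finds" an effect where none should exist.
    gap = effects(y[:T0], Y[:T0], fake_T0)[0]
    return mspe_ratio(gap, fake_T0)


def leave_one_out(y, Y, T0, tol=1e-6):
    # Drop each donor with non-negligible weight and refit; large swings in the
    # cumulative effect suggest a single donor dominates the synthetic control.
    _, base_cum, w = effects(y, Y, T0)
    return base_cum, {int(j): effects(y, np.delete(Y, j, axis=1), T0)[1]
                      for j in np.flatnonzero(w > tol)}


# Toy data (made up): five donors share a common path and differ only by one
# pre-period spike each; the treated unit has its own pre-period spike and a
# +3 shift after T0.
T, T0, J = 30, 20, 5
t = np.arange(T)
base = 10.0 + 0.2 * t + np.sin(t / 2.0)
Y = np.tile(base[:, None], (1, J))
for j in range(J):
    Y[2 * j + 1, j] += 1.0
y = base.copy()
y[0] += 0.5
y[T0:] += 3.0

gap, cum, w = effects(y, Y, T0)        # equal weights; gap of 3 in each post period
p, ratios = placebo_p_value(y, Y, T0)  # treated ratio is 400; placebo ratios are ~0
print(w.round(3), round(cum, 3), p)
```

Each donor here sits exactly on the common path after T0, so the placebo post-period gaps vanish and the treated unit ranks first. That yields the smallest attainable p-value, 1/(J + 1) = 1/6. With real data this floor is one reason small donor pools cannot reach conventional significance levels.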