Justify synthetic control and handle inference
Company: Reddit
Role: Data Scientist
Category: Statistics & Math
Difficulty: hard
Interview Round: Technical Screen
Explain the identification assumptions for Synthetic Control and how violations bias estimates: convex-hull/linear-span requirement, no interference between treated and donor units, stability of relationships over time, and reliance on strong pre-period fit to proxy unobserved confounders. Describe principled variable/lag selection without post-treatment leakage, and how you’d regularize when predictors are high-dimensional. Lay out your inference strategy: constructing pointwise and cumulative treatment effects, using in-space and in-time placebo distributions (MSPE ratio) to obtain p-values, building uncertainty bands, and why naive bootstrapping can fail. Discuss diagnostics and fixes for poor pre-period fit, structural breaks, seasonality mismatches, staggered adoption, carryover effects, reversion to the mean, and donor dominance (e.g., leave-one-out tests, retuning windows, augmented/ridge/Lasso variants, or switching to ITS/DID if assumptions don’t hold).
Quick Answer: This question evaluates understanding of synthetic control methodology, causal identification assumptions, estimation and regularization choices, inference techniques for pointwise and cumulative effects, and diagnostic checks for panel time-series interventions.
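A minimal sketch of the estimation and in-space placebo steps named in the question, under loudly stated assumptions: the data are simulated, all function and variable names (`fit_weights`, `mspe_ratio`, `placebo_p_value`) are hypothetical, donor weights are matched on pre-period outcomes only (no separate predictor-weight matrix), and the simplex-constrained least squares is solved with a plain projected gradient loop rather than the quadratic-programming solver a production implementation would likely use. It illustrates convex (non-negative, sum-to-one) weights, pointwise and cumulative gaps, and a permutation-style p-value from the post/pre MSPE ratio.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # values sorted in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last coordinate that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fit_weights(y_pre, X_pre, n_iter=2000):
    """Minimize ||y_pre - X_pre w||^2 over the simplex by projected gradient."""
    n_donors = X_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    # Step size 1/L, where L = 2 * (largest singular value of X_pre)^2.
    step = 1.0 / (2.0 * np.linalg.norm(X_pre, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * X_pre.T @ (X_pre @ w - y_pre)
        w = project_to_simplex(w - step * grad)
    return w

def mspe_ratio(y, X, t0):
    """Fit on periods [0, t0); return post/pre MSPE ratio, gap series, weights."""
    w = fit_weights(y[:t0], X[:t0])
    gap = y - X @ w                           # observed minus synthetic
    ratio = np.mean(gap[t0:] ** 2) / np.mean(gap[:t0] ** 2)
    return ratio, gap, w

def placebo_p_value(Y, treated_col, t0):
    """In-space placebo test: reassign treatment to each control unit in turn.

    The truly treated unit is never used as a donor for a placebo fit.
    """
    control_cols = [j for j in range(Y.shape[1]) if j != treated_col]
    ratios = {}
    for j in [treated_col] + control_cols:
        donors_j = [k for k in control_cols if k != j]
        ratios[j], _, _ = mspe_ratio(Y[:, j], Y[:, donors_j], t0)
    treated_ratio = ratios[treated_col]
    # Share of units (treated included) with a ratio at least as extreme.
    p = float(np.mean([r >= treated_ratio for r in ratios.values()]))
    return p, ratios

# --- Simulated panel (assumption: one common factor plus unit-level offsets) ---
rng = np.random.default_rng(0)
T, t0, n_donors, effect = 40, 30, 20, 2.0
factor = np.sin(2 * np.pi * np.arange(T) / 10)
level = rng.uniform(-1, 1, n_donors)
loading = rng.uniform(0.5, 1.5, n_donors)
donors = level + np.outer(factor, loading) + rng.normal(0, 0.1, (T, n_donors))

w_true = np.zeros(n_donors)
w_true[:3] = [0.5, 0.3, 0.2]                  # treated unit lies inside the convex hull
treated = donors @ w_true + rng.normal(0, 0.1, T)
treated[t0:] += effect                        # constant post-treatment effect

Y = np.column_stack([treated, donors])        # column 0 is the treated unit
ratio, gap, w_hat = mspe_ratio(Y[:, 0], Y[:, 1:], t0)
pointwise = gap[t0:]                          # per-period effect estimates
cumulative = np.cumsum(pointwise)             # running cumulative effect
p_value, ratios = placebo_p_value(Y, treated_col=0, t0=t0)
```

With N units in total, the smallest p-value this procedure can return is 1/N, so a small donor pool caps the attainable significance level. A common refinement, not shown here, is to drop placebo units whose pre-period MSPE is much larger than the treated unit's before ranking the ratios.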