Causal Inference And Quasi-Experiments
Asked of: Data Scientist
-
What's being tested — Ability to pick and defend an identification strategy under imperfect randomization, state and test key assumptions, implement quasi-experimental estimators, and interpret causal estimates and robustness checks.
-
Core knowledge
- Difference-in-differences (DiD): parallel trends assumption and event-study pre-trends test.
- Instrumental variables (IV): relevance, independence, exclusion, and monotonicity; the estimate is interpreted as a local average treatment effect (LATE) for compliers.
- Regression discontinuity (RD): sharp vs fuzzy, local randomization, bandwidth tradeoffs.
- Propensity scores: unconfoundedness and overlap, inverse probability weighting (IPW), balance diagnostics, and matching pitfalls.
- Synthetic control: constructing counterfactual from weighted donors for aggregate units.
- Staggered adoption: two-way fixed effects (TWFE) bias under heterogeneous effects; diagnose with the Goodman-Bacon decomposition and estimate with Callaway & Sant'Anna.
- Inference: cluster-robust SEs at the level of treatment assignment, plus permutation and placebo tests.
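To make the IV bullet concrete: with a binary instrument, the IV estimate reduces to the Wald ratio, the reduced form divided by the first stage. The sketch below is illustrative only; the data are made up and the names (`wald_iv`, `z`, `d`, `y`) are hypothetical. The ratio can be read as a LATE only if the IV assumptions listed above are defended separately.

```python
from statistics import mean


def wald_iv(z, d, y):
    """Wald estimator for a binary instrument z, treatment d, and outcome y.

    Returns (estimate, first_stage, reduced_form).
    """
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)

    first_stage = d1 - d0      # effect of the instrument on treatment take-up
    reduced_form = y1 - y0     # effect of the instrument on the outcome
    if abs(first_stage) < 1e-12:
        raise ValueError("instrument does not move treatment (relevance fails)")
    return reduced_form / first_stage, first_stage, reduced_form


# Made-up data: the outcome is 10 + 2 * d, and the instrument raises take-up.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [12, 12, 12, 10, 12, 10, 10, 10]

late, first_stage, reduced_form = wald_iv(z, d, y)
print(late, first_stage, reduced_form)
```

A small first stage makes the ratio unstable, so in practice the first-stage strength is reported alongside the estimate.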
-
Worked example — "Estimate the impact of a non-random regional rollout (difference-in-differences frame)". First define treatment timing, treated units, and the outcome metric (e.g., change in DAU). Check pre-treatment trends with an event study; flat pre-period coefficients support parallel trends but cannot prove the assumption. If the pre-trends look plausible, estimate DiD with region and time fixed effects, clustering SEs at the region level. If the rollout is staggered, avoid naive TWFE: use the Goodman-Bacon decomposition to see which comparisons drive the estimate, or use Callaway & Sant'Anna to estimate group-time average treatment effects. Finally, report the ATT, plot the event-study coefficients, and run falsification tests (lead coefficients, outcomes the rollout should not affect).
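The point-estimate part of this workflow can be sketched with group means alone. The example below assumes a single rollout date (not staggered) and uses a made-up panel; the region names, DAU values, and helper names (`group_mean`, `did`) are all hypothetical. It computes the 2x2 DiD and a placebo DiD between two pre-periods as a crude pre-trend check.

```python
from statistics import mean

# Hypothetical panel rows: (region, treated, period, dau). Periods < 0 are pre-rollout.
rows = [
    ("north", 1, -2, 100), ("north", 1, -1, 102), ("north", 1, 0, 110), ("north", 1, 1, 112),
    ("south", 1, -2, 90), ("south", 1, -1, 92), ("south", 1, 0, 100), ("south", 1, 1, 102),
    ("east", 0, -2, 80), ("east", 0, -1, 82), ("east", 0, 0, 84), ("east", 0, 1, 86),
    ("west", 0, -2, 70), ("west", 0, -1, 72), ("west", 0, 0, 74), ("west", 0, 1, 76),
]


def group_mean(rows, treated, periods):
    """Mean outcome for one group (treated = 1 or 0) over a set of periods."""
    return mean(y for _, g, t, y in rows if g == treated and t in periods)


def did(rows, pre, post):
    """(treated post - treated pre) - (control post - control pre)."""
    treated_change = group_mean(rows, 1, post) - group_mean(rows, 1, pre)
    control_change = group_mean(rows, 0, post) - group_mean(rows, 0, pre)
    return treated_change - control_change


# ATT estimate under parallel trends.
att = did(rows, pre={-2, -1}, post={0, 1})

# Placebo: pretend the rollout happened between the two pre-periods.
# A value far from zero would cast doubt on parallel trends.
placebo = did(rows, pre={-2}, post={-1})

print(att, placebo)
```

This yields a point estimate only. Standard errors would still need to be clustered at the region level, and with this few regions a permutation test is likely the more credible option.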
-
A common pitfall — It's tempting to run a simple two-way fixed effects regression and declare causality. When rollout is staggered and effects differ across cohorts or grow over time, TWFE uses already-treated units as controls, which puts negative weights on some treatment effects and can produce biased or even sign-flipped estimates. Two further mistakes are conditioning on post-treatment variables, which biases the estimate, and clustering below the assignment level, which understates uncertainty.
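A toy calculation shows how the sign flip can arise. The sketch below assumes a balanced panel of two units and three periods with made-up effects, all of them positive. It computes the TWFE coefficient by two-way demeaning (Frisch-Waugh-Lovell) and compares it with the average true effect over treated cells. The names `two_way_demean` and `twfe_coef` are hypothetical helpers, not part of any library.

```python
# Hypothetical balanced panel: 2 units, 3 periods; the untreated outcome is 0 everywhere.
# d[i][t] is the treatment indicator and tau[i][t] the true effect in that cell.
d = [[0, 1, 1],    # early adopter: treated from the second period
     [0, 0, 1]]    # late adopter: treated from the third period
tau = [[0, 1, 5],  # the early adopter's effect grows over time
       [0, 0, 1]]
y = [[d[i][t] * tau[i][t] for t in range(3)] for i in range(2)]


def two_way_demean(x):
    """Subtract unit and period means, add back the grand mean (balanced panel only)."""
    n, n_periods = len(x), len(x[0])
    unit_means = [sum(row) / n_periods for row in x]
    period_means = [sum(x[i][t] for i in range(n)) / n for t in range(n_periods)]
    grand_mean = sum(unit_means) / n
    return [[x[i][t] - unit_means[i] - period_means[t] + grand_mean
             for t in range(n_periods)] for i in range(n)]


def twfe_coef(y, d):
    """Coefficient on d from y ~ d + unit FE + period FE, via two-way demeaning."""
    y_dm, d_dm = two_way_demean(y), two_way_demean(d)
    num = sum(a * b for row_y, row_d in zip(y_dm, d_dm) for a, b in zip(row_y, row_d))
    den = sum(b * b for row_d in d_dm for b in row_d)
    return num / den


beta_twfe = twfe_coef(y, d)

# Average of the true effects over treated cells; every one of them is positive.
treated_effects = [tau[i][t] for i in range(2) for t in range(3) if d[i][t] == 1]
true_att = sum(treated_effects) / len(treated_effects)

print(beta_twfe, true_att)
```

In this toy panel the TWFE coefficient comes out negative although every true effect is positive. The early adopter's large third-period effect gets a negative weight because that unit acts as the control for the late adopter.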
-
Further reading
- Angrist & Pischke, "Mostly Harmless Econometrics" (2009) — concise methods and assumptions for IV, DiD, RD.
- Goodman-Bacon, "Difference-in-Differences with Variation in Treatment Timing" (2021) — decomposition showing TWFE biases and guidance for staggered designs.
Related concepts
- Causal Inference (Analytics & Experimentation)
- Difference-In-Differences And Quasi-Experiments (Analytics & Experimentation)
- Causal Inference And Confounding (Statistics & Math)
- Causal Inference And Incrementality
- Statistical Inference For Experiments