Validate DID and IV assumptions rigorously
Company: Amazon
Role: Data Scientist
Category: Statistics & Math
Difficulty: hard
Interview Round: HR Screen
1) Derive the 2×2 DID estimand τ = (Ȳ_T,post − Ȳ_T,pre) − (Ȳ_C,post − Ȳ_C,pre) from the parallel trends assumption, and show its equivalence to a TWFE regression with a treatment×post interaction under homogeneous treatment effects.
2) Explain why TWFE is biased with staggered adoption and heterogeneous effects; describe and contrast Sun–Abraham and Callaway–Sant’Anna estimators, and outline how you would compute an event-study with proper cohort weights.
3) State precisely how you would cluster standard errors (and when to use wild cluster bootstrap) given household-level interference and market-level shocks; discuss consequences of too few clusters.
4) Propose a plausibly exogenous instrument for reminder exposure (e.g., exogenous send-throttling or an email provider outage that differentially delayed reminders), write the 2SLS setup (first stage and structural equation), and the GMM moment conditions.
5) Describe tests for instrument strength and validity (Stock–Yogo weak-IV thresholds, first-stage F, Hansen J overidentification) under heteroskedasticity and clustering, and interpret a case where F≈8 and J is insignificant.
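As a quick numerical check on part 1, the sketch below simulates a hypothetical 2×2 design and compares the difference of cell means with the interaction coefficient from a saturated OLS regression. The variable names, sample size, and data-generating process are assumptions made for illustration only; in the 2×2 case the group and period dummies play the role of the two-way fixed effects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated 2x2 data: 'treated' and 'post' are 0/1 indicators.
n = 4000
treated = rng.integers(0, 2, size=n)
post = rng.integers(0, 2, size=n)
true_effect = 2.0  # assumed homogeneous treatment effect

# Outcome with a group effect, a common time effect, and the treatment effect.
y = (1.0 + 0.5 * treated + 0.8 * post
     + true_effect * treated * post
     + rng.normal(size=n))


def cell_mean(t, p):
    """Mean outcome in the (treated == t, post == p) cell."""
    return y[(treated == t) & (post == p)].mean()


# DID from the four cell means.
did_means = ((cell_mean(1, 1) - cell_mean(1, 0))
             - (cell_mean(0, 1) - cell_mean(0, 0)))

# Saturated regression: intercept, group dummy, period dummy, interaction.
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
did_ols = beta[3]  # coefficient on treated x post

print(f"DID from cell means: {did_means:.4f}")
print(f"OLS interaction:     {did_ols:.4f}")
```

Because the regression is saturated in the two binary indicators, the two numbers agree up to floating-point error. That equivalence is specific to the 2×2 case and does not carry over to staggered adoption with heterogeneous effects.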
Quick Answer: This question evaluates mastery of causal inference and applied econometrics, specifically difference-in-differences, two-way fixed effects, staggered adoption and treatment heterogeneity, instrumental variables (2SLS), clustering, and GMM-based inference.
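To make the 2SLS and GMM pieces concrete, here is a minimal sketch on simulated data with a single binary instrument, assumed to stand in for something like exogenous send-throttling. All names and coefficients are hypothetical. The first-stage F shown is the simple homoskedastic version, so it is not the cluster-robust statistic one would report in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Hypothetical data-generating process (all coefficients are assumptions).
z = rng.integers(0, 2, size=n).astype(float)  # instrument, e.g. throttling
u = rng.normal(size=n)                        # unobserved confounder
d = 0.5 * z + 0.8 * u + rng.normal(size=n)    # endogenous reminder exposure
y = 1.0 + 1.5 * d + 1.0 * u + rng.normal(size=n)

Z = np.column_stack([np.ones(n), z])  # instruments (incl. constant)
X = np.column_stack([np.ones(n), d])  # regressors (incl. constant)

# First stage: d = pi0 + pi1 * z + v
pi_hat, *_ = np.linalg.lstsq(Z, d, rcond=None)
d_hat = Z @ pi_hat
resid_fs = d - d_hat
sigma2_fs = resid_fs @ resid_fs / (n - Z.shape[1])
var_pi = sigma2_fs * np.linalg.inv(Z.T @ Z)
# With one excluded instrument, the (homoskedastic) F equals t squared.
first_stage_F = pi_hat[1] ** 2 / var_pi[1, 1]

# Second stage: regress y on the fitted exposure.
X_hat = np.column_stack([np.ones(n), d_hat])
beta_2sls, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

# Naive OLS for comparison (biased here because u drives both d and y).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# GMM view: sample analogue of E[Z' (y - X beta)] = 0.
# In the just-identified case it is exactly zero at the 2SLS estimate.
moments = Z.T @ (y - X @ beta_2sls) / n

print(f"first-stage F: {first_stage_F:.1f}")
print(f"2SLS slope:    {beta_2sls[1]:.3f}")
print(f"OLS slope:     {beta_ols[1]:.3f}")
print(f"moments:       {moments}")
```

With more instruments than endogenous regressors, the sample moments can no longer all be set to zero, and the Hansen J statistic measures how far the weighted moments are from zero. The just-identified sketch above therefore has no overidentification test to run.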