Scenario
You have panel data for 1,000+ sites over time (e.g., monthly). Some sites adopt free shuttle buses at different dates, while others never adopt. The goal is to estimate the causal effect of offering shuttle service on employee participation rates.
Questions
- Data grain: Would you analyze at the site level or the individual level? Why?
- OLS baseline
  - Write an OLS regression equation to estimate the shuttle effect.
  - List key controls you would include and why.
  - Explain how to interpret the shuttle coefficient.
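As a reference point for the OLS baseline, one possible pooled specification is sketched below. The notation is an assumption, not part of the scenario: $Y_{it}$ is the participation rate at site $i$ in month $t$, $D_{it}$ equals 1 when site $i$ offers the shuttle in month $t$, and $X_{it}$ is a vector of controls (for illustration only, things like site size or calendar month).

```latex
Y_{it} = \alpha + \beta D_{it} + X_{it}'\gamma + \varepsilon_{it}
```

Under this sketch, $\beta$ is the average difference in participation rate between site-months with and without a shuttle, holding $X_{it}$ fixed. It has a causal reading only if adoption is as good as random conditional on the controls.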
- OLS limitations and DiD
  - What limitations does basic OLS have here?
  - How would a Difference-in-Differences (DiD) design address them? Provide a TWFE/event-study specification.
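For the TWFE/event-study question, a hedged sketch of the two specifications follows. The symbols are assumptions introduced here: $\alpha_i$ and $\lambda_t$ are site and month fixed effects, $D_{it}$ is the shuttle indicator, $G_i$ is the month in which site $i$ adopts, and $k = t - G_i$ is event time. Never-adopting sites have every event-time indicator equal to zero, and $k = -1$ is the omitted reference period.

```latex
% Static TWFE
Y_{it} = \alpha_i + \lambda_t + \beta D_{it} + X_{it}'\gamma + \varepsilon_{it}

% Event study (leads and lags around adoption)
Y_{it} = \alpha_i + \lambda_t
       + \sum_{k \neq -1} \beta_k \, \mathbf{1}[t - G_i = k]
       + X_{it}'\gamma + \varepsilon_{it}
```

Standard errors would typically be clustered by site. Lead coefficients ($\beta_k$ for $k < -1$) close to zero are consistent with parallel pre-trends. With staggered adoption and effects that vary across sites or over time, the static $\beta$ can be a misleading average, which is one reason to look at the event-study form.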
- Placebo test for DiD
  - Design a placebo test to assess the DiD identifying assumption (parallel trends / no anticipatory effects).
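One way to make the placebo idea concrete is sketched below on simulated data. Everything in it is an assumption for illustration: the panel size, the adoption months, the true effect, and the helper `twfe_coefficient`, which relies on the panel being balanced so that double-demeaning removes site and month fixed effects. The placebo keeps only months before the earliest real adoption and pretends that eventual adopters were treated from a fake earlier month.

```python
import random

def twfe_coefficient(y, d):
    """TWFE slope on a single regressor for a balanced panel.

    y and d are lists of rows (one row per site, one column per month).
    """
    def double_demean(m):
        n_sites, n_months = len(m), len(m[0])
        site_mean = [sum(row) / n_months for row in m]
        month_mean = [sum(m[i][t] for i in range(n_sites)) / n_sites
                      for t in range(n_months)]
        grand_mean = sum(site_mean) / n_sites
        return [[m[i][t] - site_mean[i] - month_mean[t] + grand_mean
                 for t in range(n_months)] for i in range(n_sites)]

    y_dm, d_dm = double_demean(y), double_demean(d)
    num = sum(a * b for y_row, d_row in zip(y_dm, d_dm)
              for a, b in zip(y_row, d_row))
    den = sum(b * b for d_row in d_dm for b in d_row)
    return num / den

# Simulated panel (hypothetical numbers): 20 sites adopt at month 12, 15,
# or 18; the other 20 never adopt. Trends are parallel by construction.
random.seed(0)
N_SITES, N_MONTHS, TRUE_EFFECT = 40, 24, 0.05
adopt = [12 + 3 * (i % 3) if i < 20 else None for i in range(N_SITES)]

def is_treated(i, t):
    return adopt[i] is not None and t >= adopt[i]

site_fe = [random.gauss(0.5, 0.1) for _ in range(N_SITES)]
y = [[site_fe[i] + 0.002 * t + TRUE_EFFECT * is_treated(i, t)
      + random.gauss(0, 0.01)
      for t in range(N_MONTHS)] for i in range(N_SITES)]
d = [[float(is_treated(i, t)) for t in range(N_MONTHS)]
     for i in range(N_SITES)]

actual = twfe_coefficient(y, d)

# Placebo: restrict to months before any real adoption and assign a fake
# adoption month to the sites that will eventually adopt.
first_adoption = min(a for a in adopt if a is not None)
PLACEBO_MONTH = first_adoption // 2
y_pre = [row[:first_adoption] for row in y]
d_placebo = [[1.0 if adopt[i] is not None and t >= PLACEBO_MONTH else 0.0
              for t in range(first_adoption)] for i in range(N_SITES)]
placebo = twfe_coefficient(y_pre, d_placebo)

print(f"actual estimate:  {actual:.4f}")
print(f"placebo estimate: {placebo:.4f}")
```

A placebo estimate close to zero is consistent with parallel pre-trends and no anticipation. A clearly nonzero value suggests that adopters were already diverging before the shuttle started.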
- If DiD assumptions fail: Propensity Score Matching (PSM)
  - Outline how you would apply PSM in this setting.
  - Which variables would you include and why?
  - Provide intuition for why Lasso helps with feature selection when estimating propensity scores.
  - Discuss trade-offs of matching with vs. without replacement and how to run balance checks.
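The replacement trade-off and the balance check can be illustrated with a small sketch. It assumes propensity scores have already been estimated. The scores, the covariate (a hypothetical baseline headcount), and the helper names are made up for illustration. Matching is greedy 1:1 nearest neighbor, and balance is summarized with the standardized mean difference (SMD).

```python
from statistics import mean, variance

def nearest_neighbor_match(ps_treated, ps_control, with_replacement):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Returns (treated_index, control_index) pairs in treated order.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        if not available:
            break  # no controls left when matching without replacement
        # Tie-break on the control index so results are deterministic.
        j = min(available, key=lambda c: (abs(ps_control[c] - p), c))
        pairs.append((i, j))
        if not with_replacement:
            available.remove(j)
    return pairs

def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    return (mean(x_treated) - mean(x_control)) / pooled_sd

# Hypothetical propensity scores and one baseline covariate per site.
ps_treated = [0.80, 0.75, 0.60]
ps_control = [0.78, 0.55, 0.30, 0.20]
x_treated = [500, 450, 400]
x_control = [470, 390, 200, 150]

pairs_with = nearest_neighbor_match(ps_treated, ps_control, True)
pairs_without = nearest_neighbor_match(ps_treated, ps_control, False)

smd_before = standardized_mean_difference(x_treated, x_control)
smd_with = standardized_mean_difference(
    x_treated, [x_control[j] for _, j in pairs_with])
smd_without = standardized_mean_difference(
    x_treated, [x_control[j] for _, j in pairs_without])

print("with replacement:   ", pairs_with, round(smd_with, 3))
print("without replacement:", pairs_without, round(smd_without, 3))
print("before matching SMD:", round(smd_before, 3))
```

In this toy example, matching with replacement reuses the closest control and achieves better balance, at the cost of fewer distinct controls (higher variance, and the reuse has to be reflected in weights). Matching without replacement keeps controls distinct but forces the last treated site onto a poorer match. A common rule of thumb treats an absolute SMD below roughly 0.1 as adequate balance.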
- After matching
  - What is the next analytical step (e.g., PSM + DiD), and how would you summarize and report results to stakeholders?