This question evaluates a candidate's competency in causal inference and experimental analytics. It covers staggered-adoption difference-in-differences design; estimand and outcome definition; parallel-trends diagnostics; selection and time-varying confounding; clustering and weighting choices; handling of varied adoption timing and site attrition; robustness checks; and communication of coefficient interpretation and uncertainty. It is commonly asked in the Analytics & Experimentation domain because it tests both conceptual understanding of identification assumptions and practical application of statistical design choices in real-world observational causal analysis.

You have individual-level data from 1,000+ sites, several hundred of which adopt a free employee shuttle at different times. Design a causal analysis to estimate the shuttle's impact on employee participation/engagement. Specify:

1) the primary estimand (e.g., ATT) and outcome definitions;
2) an identification strategy using staggered-adoption difference-in-differences with appropriate fixed effects;
3) how you will check parallel trends (event-study, placebo on pre-periods, leads/lags) and handle violations;
4) how you will address selection into treatment (site readiness, commuting patterns) and time-varying confounders;
5) clustering/weighting choices and why;
6) how you will handle sites that never adopt, late adopters, and site closures;
7) robustness checks (stacked DiD, alternative windows, alternative outcomes, leave-one-site-out);
8) how you will communicate coefficient interpretation and uncertainty to non-technical stakeholders.
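A minimal sketch of the core estimator a strong answer would describe: a two-way fixed-effects DiD fit on a simulated staggered-adoption panel, with site and time fixed effects absorbed via dummy variables and a never-treated group serving as the comparison. All names and parameters here (site counts, adoption window, effect size) are illustrative assumptions, not part of the question; with a homogeneous treatment effect TWFE recovers the ATT, and in practice one would extend this to event-study leads/lags and heterogeneity-robust estimators (e.g., stacked DiD) as the question's items 3 and 7 require.

```python
# Illustrative sketch: two-way fixed-effects DiD on a simulated
# staggered-adoption panel. Sizes and effect magnitudes are assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_sites, n_periods = 200, 12

site_fe = rng.normal(0.0, 1.0, n_sites)          # site-level baselines
time_fe = np.linspace(0.0, 0.5, n_periods)       # common time trend

# Staggered adoption: half the sites adopt at a random period in 4..8;
# the remaining sites never adopt (adopt == -1) and anchor the comparison.
adopt = np.full(n_sites, -1)
treated_sites = rng.choice(n_sites, n_sites // 2, replace=False)
adopt[treated_sites] = rng.integers(4, 9, treated_sites.size)

true_att = 0.8  # assumed homogeneous treatment effect on the outcome
rows = []
for i in range(n_sites):
    for t in range(n_periods):
        d = int(adopt[i] >= 0 and t >= adopt[i])  # post-adoption indicator
        y = site_fe[i] + time_fe[t] + true_att * d + rng.normal(0.0, 0.3)
        rows.append((i, t, d, y))
df = pd.DataFrame(rows, columns=["site", "t", "d", "y"])

# TWFE OLS: outcome on treatment dummy plus site and time dummies.
X = pd.get_dummies(df[["site", "t"]].astype(str), drop_first=True).astype(float)
X.insert(0, "d", df["d"].astype(float))
X.insert(0, "const", 1.0)
beta, *_ = np.linalg.lstsq(X.values, df["y"].values, rcond=None)
att_hat = beta[1]  # coefficient on the treatment dummy
print(f"true ATT = {true_att}, estimated ATT = {att_hat:.3f}")
```

In a real answer, standard errors would be clustered at the site level (the level of treatment assignment), and the single `d` dummy would be replaced with event-time leads and lags to diagnose pre-trends.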