This question evaluates a data scientist's skills in causal inference for observational studies, focusing on identification strategy, confounding control, handling staggered adoption, estimator selection (e.g., propensity-score approaches and event-study difference-in-differences), diagnostics, and sensitivity analysis.
A new, optional smart speaker is released. Some users purchase and link the speaker to their account; others do not. Users self-select into purchasing, and you cannot run a forced A/B test.
Assume you have: (a) user-level panel data (daily/weekly) on engagement (e.g., streaming hours, sessions), (b) a time-stamped indicator of speaker activation/ownership, (c) rich covariates (prior engagement levels and trends, demographics/geo, devices, marketing exposure), and (d) staggered adoption timing across users.
How would you measure the causal impact of owning the speaker on engagement? Describe your preferred design and why.
In your answer, specify:
You may discuss quasi-experimental approaches such as difference-in-differences (event study), propensity-score matching/weighting, instrumental variables, and pre-post checks.
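As one illustration of the event-study difference-in-differences idea, here is a minimal sketch under stated assumptions, not a definitive implementation. It assumes a balanced weekly panel, a group of never-adopting users to serve as controls, and parallel trends between adopters and non-adopters. The function names (`make_toy_panel`, `event_study_did`), the toy data, and the effect size of 2.0 are all hypothetical. For each adopter with activation week g, the sketch compares the change in engagement between week g-1 and week g+k against the average change among never-adopters over the same weeks, then averages across adopters by event time k.

```python
from statistics import mean


def make_toy_panel(n_weeks=8, effect=2.0):
    """Build a deterministic toy panel (hypothetical data, for illustration only).

    Adopters have higher baselines than non-adopters (selection on levels),
    all users share a common trend, and ownership adds `effect` from week g on.
    """
    baselines = {"u1": 10.0, "u2": 12.0, "u3": 5.0, "u4": 6.0}
    adopt_week = {"u1": 3, "u2": 5, "u3": None, "u4": None}  # None = never adopts
    panel = {}
    for user, base in baselines.items():
        g = adopt_week[user]
        panel[user] = [
            base + 0.5 * t + (effect if g is not None and t >= g else 0.0)
            for t in range(n_weeks)
        ]
    return panel, adopt_week


def event_study_did(panel, adopt_week, leads=2, lags=2):
    """Average DiD estimate per event time k, using week g-1 as the reference.

    Controls are never-adopters only, so already-treated users are never
    used as a comparison group for later adopters.
    """
    never = [u for u, g in adopt_week.items() if g is None]
    treated = {u: g for u, g in adopt_week.items() if g is not None}
    if not never:
        raise ValueError("this sketch needs at least one never-adopting user")

    estimates = {}
    for k in range(-leads, lags + 1):
        diffs = []
        for user, g in treated.items():
            t, ref = g + k, g - 1
            # Skip event times that fall outside the observed panel.
            if ref < 0 or not (0 <= t < len(panel[user])):
                continue
            treated_change = panel[user][t] - panel[user][ref]
            control_change = mean(panel[c][t] - panel[c][ref] for c in never)
            diffs.append(treated_change - control_change)
        if diffs:
            estimates[k] = mean(diffs)
    return estimates


if __name__ == "__main__":
    panel, adopt_week = make_toy_panel()
    for k, est in sorted(event_study_did(panel, adopt_week).items()):
        print(f"event time {k:+d}: estimated effect {est:.2f}")
```

In this toy setup the pre-adoption estimates (k < 0) sit at zero because trends are parallel by construction; with real data, nonzero leads would signal a violation of parallel trends or anticipation. A fuller analysis would also need standard errors clustered by user and covariate adjustment (for example, propensity-score weighting of the control group), neither of which this sketch attempts.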