Design causal study for reminder impact
Company: Amazon
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: HR Screen
You cannot randomize who receives medication-subscription reminders; the product launched on a staggered schedule by market and channel (push/email/SMS). The outcomes are user experience (CSAT, 1–5) and 4-week retention; adoption may spill over within households. Design an observational causal study. Answer precisely:
1) State your primary identification strategy and model (e.g., staggered-adoption DID with event study) and write the regression you would estimate, including fixed effects, time trends/seasonality, and how you cluster standard errors.
2) Define treatment and risk sets to avoid immortal-time bias, and explain how you handle not-yet-treated users.
3) Specify pre-trend diagnostics, how you would detect and mitigate treatment-effect heterogeneity bias in TWFE, and which modern DID estimator (e.g., Sun–Abraham or Callaway–Sant’Anna) you would use and why.
4) Lay out a matching or weighting backup plan (e.g., PSM or overlap weighting): covariates needed, caliper/ratio, and balance metrics with thresholds.
5) Propose two negative controls (one outcome and one exposure) and one falsification test.
6) Address household spillovers/interference, missing CSAT, and channel selection (users can opt out of push).
7) Provide a minimal power/MDE calculation outline with assumptions on baseline variance, intra-household correlation, and expected adoption rate.
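As a point of reference for part 1, one common way to write a staggered-adoption event-study regression is sketched below. The notation is an assumption made for illustration and is not given in the question: i indexes users, t indexes calendar weeks, g(i) is the user's market-by-channel cell, and G_{g(i)} is that cell's launch week.

```latex
Y_{it} = \alpha_i + \lambda_t
       + \sum_{\substack{k=-K \\ k \neq -1}}^{L} \beta_k \,\mathbf{1}\{\, t - G_{g(i)} = k \,\}
       + X_{it}'\gamma + \varepsilon_{it}
```

Here \alpha_i are user fixed effects, \lambda_t are calendar-week fixed effects (which absorb common seasonality), k = -1 is the omitted reference period, and the leads \beta_k for k < -1 serve as pre-trend checks. Standard errors would typically be clustered at the level where rollout was assigned, for example the market. With heterogeneous effects across launch cohorts, these TWFE coefficients can be contaminated by effects from other periods, which is the concern part 3 asks about.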
Quick Answer: This question evaluates a data scientist's competency in observational causal inference and program evaluation, covering concepts such as staggered rollouts, treatment definition, difference-in-differences diagnostics, matching/weighting, negative controls, interference handling, and power/MDE planning.
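For part 7, the power/MDE outline can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not a full DID power analysis: it treats the comparison as a two-group difference in means, inflates the variance with the standard design effect 1 + (m - 1) * ICC for households of average size m, and ignores gains from pre-period data or covariates. The function name `mde` and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mde(n_users, sigma, adoption_rate, household_size, icc,
        alpha=0.05, power=0.80):
    """Approximate minimum detectable effect for a treated-vs-untreated
    difference in means (hypothetical helper, simplified two-group setup)."""
    # Critical values for a two-sided test at level alpha and the target power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Design effect: users in the same household are correlated (ICC),
    # so the effective sample size is smaller than the user count.
    deff = 1 + (household_size - 1) * icc
    n_eff = n_users / deff
    # Standard error of a difference in means when a share `adoption_rate`
    # of the effective sample is treated and outcome SD is `sigma`.
    se = sigma * sqrt(1 / (n_eff * adoption_rate * (1 - adoption_rate)))
    return (z_alpha + z_beta) * se


# Illustrative inputs only (assumed, not from the question):
# 50,000 users, CSAT SD of 1.1, 30% adoption, 1.8 users per household, ICC 0.2.
print(round(mde(50_000, 1.1, 0.30, 1.8, 0.2), 4))
```

The same function shows the qualitative trade-offs the question asks about: the MDE grows with the intra-household correlation and with baseline variance, and is smallest when adoption is near 50%.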