This question evaluates proficiency in panel data construction and causal inference workflows, specifically difference-in-differences estimation, SQL aggregation for user-week metrics, and propensity-score matching implemented in Python.
Using the schema and toy data below, write SQL to construct a user-week panel and compute a clean pre/post DID dataset for first reminder exposure. Requirements: (i) Define treat_date as each user’s first reminder_dt; (ii) For users not yet treated, they contribute to controls until their treat_date; (iii) Keep weeks in [treat_date − 8 weeks, treat_date + 8 weeks]; (iv) Exclude users whose household_id has another member treated earlier (to reduce spillovers); (v) Aggregate weekly CSAT (mean) and sessions (sum minutes, sum purchase_flag) per user-week; (vi) Output one row per user-week with columns: user_id, week_start, treated_post (1 if week_start >= treat_week and user has a treat_date, else 0), ever_treated (0/1), household_id, device, csat_mean_wk, minutes_wk, purchases_wk. Then: 1) Write a query to produce group-level pre/post means and the 2x2 DID estimate. 2) In Python, outline code to compute propensity scores (logit) on baseline covariates (device, country, pre-period usage) and perform 1:1 nearest-neighbor matching with caliper=0.05, reporting SMDs before/after matching.
Schema:
Sample data (ASCII): users
| user_id | signup_date | country | household_id | device |
|---|---|---|---|---|
| 1 | 2025-01-05 | US | H1 | ios |
| 2 | 2025-01-07 | US | H1 | android |
| 3 | 2025-01-12 | US | H2 | web |
| 4 | 2025-01-20 | CN | H3 | ios |
subscriptions
| sub_id | user_id | drug_id | start_date | auto_refill |
|---|---|---|---|---|
| 10 | 1 | D1 | 2025-01-10 | 1 |
| 11 | 2 | D1 | 2025-01-15 | 0 |
| 12 | 3 | D2 | 2025-01-20 | 1 |
reminders
| user_id | reminder_dt | channel |
|---|---|---|
| 1 | 2025-02-01 | push |
| 1 | 2025-03-15 | |
| 2 | 2025-03-01 | push |
| 3 | 2025-04-01 | sms |
csat
| user_id | event_dt | csat_score |
|---|---|---|
| 1 | 2025-01-25 | 3 |
| 1 | 2025-02-08 | 5 |
| 2 | 2025-02-20 | 4 |
| 2 | 2025-03-10 | 2 |
| 3 | 2025-03-15 | 4 |
| 3 | 2025-04-10 | 3 |
sessions
| user_id | session_dt | minutes | purchase_flag |
|---|---|---|---|
| 1 | 2025-01-28 | 6 | 0 |
| 1 | 2025-02-06 | 12 | 1 |
| 2 | 2025-02-22 | 9 | 0 |
| 2 | 2025-03-05 | 3 | 0 |
| 3 | 2025-03-18 | 7 | 1 |
| 3 | 2025-04-09 | 5 | 0 |