PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Data Manipulation (SQL/Python)/Amazon

Build DID panel and compute effects in SQL

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in panel data construction and causal inference workflows, specifically difference-in-differences estimation, SQL aggregation for user-week metrics, and propensity-score matching implemented in Python.

  • Medium
  • Amazon
  • Data Manipulation (SQL/Python)
  • Data Scientist

Build DID panel and compute effects in SQL

Company: Amazon

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: HR Screen

Using the schema and toy data below, write SQL to construct a user-week panel and compute a clean pre/post DID dataset for first reminder exposure. Requirements: (i) Define treat_date as each user’s first reminder_dt; (ii) For users not yet treated, they contribute to controls until their treat_date; (iii) Keep weeks in [treat_date − 8 weeks, treat_date + 8 weeks]; (iv) Exclude users whose household_id has another member treated earlier (to reduce spillovers); (v) Aggregate weekly CSAT (mean) and sessions (sum minutes, sum purchase_flag) per user-week; (vi) Output one row per user-week with columns: user_id, week_start, treated_post (1 if week_start >= treat_week and user has a treat_date, else 0), ever_treated (0/1), household_id, device, csat_mean_wk, minutes_wk, purchases_wk. Then: 1) Write a query to produce group-level pre/post means and the 2x2 DID estimate. 2) In Python, outline code to compute propensity scores (logit) on baseline covariates (device, country, pre-period usage) and perform 1:1 nearest-neighbor matching with caliper=0.05, reporting SMDs before/after matching. Schema: - users(user_id INT, signup_date DATE, country VARCHAR, household_id VARCHAR, device VARCHAR) - subscriptions(sub_id INT, user_id INT, drug_id VARCHAR, start_date DATE, auto_refill TINYINT) - reminders(user_id INT, reminder_dt DATE, channel VARCHAR) - csat(user_id INT, event_dt DATE, csat_score INT) - sessions(user_id INT, session_dt DATE, minutes INT, purchase_flag TINYINT) Sample data (ASCII): users | user_id | signup_date | country | household_id | device | |--------:|-------------|---------|--------------|---------| | 1 | 2025-01-05 | US | H1 | ios | | 2 | 2025-01-07 | US | H1 | android | | 3 | 2025-01-12 | US | H2 | web | | 4 | 2025-01-20 | CN | H3 | ios | subscriptions | sub_id | user_id | drug_id | start_date | auto_refill | |-------:|--------:|---------|------------|-------------| | 10 | 1 | D1 | 2025-01-10 | 1 | | 11 | 2 | D1 | 2025-01-15 | 0 | | 12 | 3 | D2 | 2025-01-20 | 1 | reminders | user_id | reminder_dt | channel | |--------:|-------------|---------| | 1 | 2025-02-01 | push | | 1 | 2025-03-15 | email | | 2 | 2025-03-01 | push | | 3 | 2025-04-01 | sms | csat | user_id | event_dt | csat_score | |--------:|------------|------------| | 1 | 2025-01-25 | 3 | | 1 | 2025-02-08 | 5 | | 2 | 2025-02-20 | 4 | | 2 | 2025-03-10 | 2 | | 3 | 2025-03-15 | 4 | | 3 | 2025-04-10 | 3 | sessions | user_id | session_dt | minutes | purchase_flag | |--------:|------------|--------:|---------------| | 1 | 2025-01-28 | 6 | 0 | | 1 | 2025-02-06 | 12 | 1 | | 2 | 2025-02-22 | 9 | 0 | | 2 | 2025-03-05 | 3 | 0 | | 3 | 2025-03-18 | 7 | 1 | | 3 | 2025-04-09 | 5 | 0 |

Quick Answer: This question evaluates proficiency in panel data construction and causal inference workflows, specifically difference-in-differences estimation, SQL aggregation for user-week metrics, and propensity-score matching implemented in Python.

Related Interview Questions

  • Find recommended friend pairs by shared songs - Amazon (medium)
  • Find recommended friend pairs by shared listening - Amazon (easy)
  • Write SQL window functions for D7 retention - Amazon (medium)
  • Find daily first-order merchants with SQL - Amazon (Medium)
  • Design student–course data models and SQL - Amazon (Medium)
Amazon logo
Amazon
Oct 13, 2025, 9:49 PM
Data Scientist
HR Screen
Data Manipulation (SQL/Python)
2
0

Using the schema and toy data below, write SQL to construct a user-week panel and compute a clean pre/post DID dataset for first reminder exposure. Requirements: (i) Define treat_date as each user’s first reminder_dt; (ii) For users not yet treated, they contribute to controls until their treat_date; (iii) Keep weeks in [treat_date − 8 weeks, treat_date + 8 weeks]; (iv) Exclude users whose household_id has another member treated earlier (to reduce spillovers); (v) Aggregate weekly CSAT (mean) and sessions (sum minutes, sum purchase_flag) per user-week; (vi) Output one row per user-week with columns: user_id, week_start, treated_post (1 if week_start >= treat_week and user has a treat_date, else 0), ever_treated (0/1), household_id, device, csat_mean_wk, minutes_wk, purchases_wk. Then: 1) Write a query to produce group-level pre/post means and the 2x2 DID estimate. 2) In Python, outline code to compute propensity scores (logit) on baseline covariates (device, country, pre-period usage) and perform 1:1 nearest-neighbor matching with caliper=0.05, reporting SMDs before/after matching.

Schema:

  • users(user_id INT, signup_date DATE, country VARCHAR, household_id VARCHAR, device VARCHAR)
  • subscriptions(sub_id INT, user_id INT, drug_id VARCHAR, start_date DATE, auto_refill TINYINT)
  • reminders(user_id INT, reminder_dt DATE, channel VARCHAR)
  • csat(user_id INT, event_dt DATE, csat_score INT)
  • sessions(user_id INT, session_dt DATE, minutes INT, purchase_flag TINYINT)

Sample data (ASCII): users

user_idsignup_datecountryhousehold_iddevice
12025-01-05USH1ios
22025-01-07USH1android
32025-01-12USH2web
42025-01-20CNH3ios

subscriptions

sub_iduser_iddrug_idstart_dateauto_refill
101D12025-01-101
112D12025-01-150
123D22025-01-201

reminders

user_idreminder_dtchannel
12025-02-01push
12025-03-15email
22025-03-01push
32025-04-01sms

csat

user_idevent_dtcsat_score
12025-01-253
12025-02-085
22025-02-204
22025-03-102
32025-03-154
32025-04-103

sessions

user_idsession_dtminutespurchase_flag
12025-01-2860
12025-02-06121
22025-02-2290
22025-03-0530
32025-03-1871
32025-04-0950

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Amazon•More Data Scientist•Amazon Data Scientist•Amazon Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.