How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

What difficulty level is this interview question?

This is a Medium difficulty Data Manipulation (SQL/Python) question, commonly asked during HR Screen rounds at Amazon.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Amazon during technical interviews.

Build DID panel and compute effects in SQL

Q: Build DID panel and compute effects in SQL

This question evaluates proficiency in panel data construction and causal inference workflows, specifically difference-in-differences estimation, SQL aggregation for user-week metrics, and propensity-score matching implemented in Python.

Using the schema and toy data below, write SQL to construct a user-week panel and compute a clean pre/post DID dataset for first reminder exposure. Requirements: (i) Define treat_date as each user’s first reminder_dt; (ii) For users not yet treated, they contribute to controls until their treat_date; (iii) Keep weeks in [treat_date − 8 weeks, treat_date + 8 weeks]; (iv) Exclude users whose household_id has another member treated earlier (to reduce spillovers); (v) Aggregate weekly CSAT (mean) and sessions (sum minutes, sum purchase_flag) per user-week; (vi) Output one row per user-week with columns: user_id, week_start, treated_post (1 if week_start >= treat_week and user has a treat_date, else 0), ever_treated (0/1), household_id, device, csat_mean_wk, minutes_wk, purchases_wk. Then: 1) Write a query to produce group-level pre/post means and the 2x2 DID estimate. 2) In Python, outline code to compute propensity scores (logit) on baseline covariates (device, country, pre-period usage) and perform 1:1 nearest-neighbor matching with caliper=0.05, reporting SMDs before/after matching.

Schema:

users(user_id INT, signup_date DATE, country VARCHAR, household_id VARCHAR, device VARCHAR)
subscriptions(sub_id INT, user_id INT, drug_id VARCHAR, start_date DATE, auto_refill TINYINT)
reminders(user_id INT, reminder_dt DATE, channel VARCHAR)
csat(user_id INT, event_dt DATE, csat_score INT)
sessions(user_id INT, session_dt DATE, minutes INT, purchase_flag TINYINT)

Sample data (ASCII): users

user_id	signup_date	country	household_id	device
1	2025-01-05	US	H1	ios
2	2025-01-07	US	H1	android
3	2025-01-12	US	H2	web
4	2025-01-20	CN	H3	ios

subscriptions

sub_id	user_id	drug_id	start_date	auto_refill
10	1	D1	2025-01-10	1
11	2	D1	2025-01-15	0
12	3	D2	2025-01-20	1

reminders

user_id	reminder_dt	channel
1	2025-02-01	push
1	2025-03-15	email
2	2025-03-01	push
3	2025-04-01	sms

csat

user_id	event_dt	csat_score
1	2025-01-25	3
1	2025-02-08	5
2	2025-02-20	4
2	2025-03-10	2
3	2025-03-15	4
3	2025-04-10	3

sessions

user_id	session_dt	minutes	purchase_flag
1	2025-01-28	6	0
1	2025-02-06	12	1
2	2025-02-22	9	0
2	2025-03-05	3	0
3	2025-03-18	7	1
3	2025-04-09	5	0

Schema:

users(user_id INT, signup_date DATE, country VARCHAR, household_id VARCHAR, device VARCHAR)
subscriptions(sub_id INT, user_id INT, drug_id VARCHAR, start_date DATE, auto_refill TINYINT)
reminders(user_id INT, reminder_dt DATE, channel VARCHAR)
csat(user_id INT, event_dt DATE, csat_score INT)
sessions(user_id INT, session_dt DATE, minutes INT, purchase_flag TINYINT)

Sample data (ASCII): users

user_id	signup_date	country	household_id	device
1	2025-01-05	US	H1	ios
2	2025-01-07	US	H1	android
3	2025-01-12	US	H2	web
4	2025-01-20	CN	H3	ios

subscriptions

sub_id	user_id	drug_id	start_date	auto_refill
10	1	D1	2025-01-10	1
11	2	D1	2025-01-15	0
12	3	D2	2025-01-20	1

reminders

user_id	reminder_dt	channel
1	2025-02-01	push
1	2025-03-15	email
2	2025-03-01	push
3	2025-04-01	sms

csat

user_id	event_dt	csat_score
1	2025-01-25	3
1	2025-02-08	5
2	2025-02-20	4
2	2025-03-10	2
3	2025-03-15	4
3	2025-04-10	3

sessions

user_id	session_dt	minutes	purchase_flag
1	2025-01-28	6	0
1	2025-02-06	12	1
2	2025-02-22	9	0
2	2025-03-05	3	0
3	2025-03-18	7	1
3	2025-04-09	5	0

Build DID panel and compute effects in SQL

Quick Overview

Build DID panel and compute effects in SQL

Write your answer

Build DID panel and compute effects in SQL

Quick Overview

Build DID panel and compute effects in SQL

Write your answer