Build panel in SQL; run causal regression

Q: How do I practice SQL interview questions?

PracHub provides an interactive SQL console where you can write and test queries against real database schemas. Get instant feedback and compare your solution with the expected output.

Q: What difficulty level is this coding question?

This is a Medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Airbnb.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Airbnb during technical interviews.

Question

Assume today is 2025-09-01 (UTC). Schema and small samples:

users(user_id INT, country STRING, signup_date DATE, platform STRING)
Sample:
user_id | country | signup_date | platform
1 | US | 2025-08-25 | ios
2 | US | 2025-08-20 | android
3 | CA | 2025-07-10 | web
4 | US | 2025-08-31 | ios
5 | GB | 2025-06-15 | android

exposures(user_id INT, ts TIMESTAMP, treatment INT)
Sample:
user_id | ts | treatment
1 | 2025-08-30T10:00:00Z | 1
1 | 2025-09-01T09:00:00Z | 1
2 | 2025-08-29T12:00:00Z | 0
3 | 2025-08-28T08:00:00Z | 1
5 | 2025-08-26T14:00:00Z | 0

orders(order_id INT, user_id INT, ts TIMESTAMP, amount DECIMAL(10,2))
Sample:
order_id | user_id | ts | amount
10 | 1 | 2025-08-30T11:00:00Z | 25.00
11 | 2 | 2025-08-29T13:00:00Z | 9.99
12 | 3 | 2025-08-31T09:00:00Z | 12.00
13 | 1 | 2025-09-01T10:15:00Z | 5.00
14 | 4 | 2025-09-01T16:00:00Z | 20.00

geo_rollout(country STRING, launch_ts TIMESTAMP)
Sample:
country | launch_ts
US | 2025-08-29T00:00:00Z
CA | 2025-08-27T00:00:00Z
GB | 2025-08-31T00:00:00Z

Tasks:
SQL A: Build a user-day panel for dates d in [2025-08-25, 2025-09-01]. For each user_id and date d, output: date, user_id, country, treated (1 if EXISTS exposure with ts <= d 23:59:59Z and treatment=1), post (1 if date(d) >= date(launch_ts for user’s country)), revenue_d (sum of order amounts with ts on d), active_d (1 if revenue_d > 0 OR EXISTS exposure on d), and signup_age_days. Ensure no duplicate user-days and fill missing user-days with zeros where appropriate.
SQL B: Using the panel, compute for US and GB the 7-day rolling revenue per active user on each day d in [2025-08-26, 2025-09-01], where active user means active_d=1 within the 7-day window. Clarify how you handle users with zero activity and avoid lookahead bias.
Python: From the panel, estimate a two-way fixed-effects DiD: revenue_d ~ post * treated + user FE + day FE, clustering SEs at the user level. (1) Interpret the interaction coefficient economically; (2) test for pre-trends using leads; (3) address late adopters/immortal time bias; (4) re-estimate with a count model (Poisson with exposure offset) and compare; (5) report a 95% CI and a practical recommendation.

PracHub · Accepted Answer

This question evaluates proficiency in SQL-based panel data construction, time-series aggregation and rolling-window metrics together with applied causal inference in Python, including two-way fixed-effects difference-in-differences, clustered standard errors, lead tests, and count-model re-specification; it targets Data Manipulation (SQL/Python) skills relevant for Data Scientist roles. It is commonly asked because it combines practical implementation and conceptual understanding—testing temporal joins, treatment timing, avoidance of lookahead/immortal-time biases and late-adopter issues, interpretation of interaction coefficients and confidence intervals, and the ability to reason about identification and model choice.

Quick Overview

Quick Overview