PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches

Quick Overview

This question evaluates proficiency in SQL-based panel data construction, time-series aggregation and rolling-window metrics together with applied causal inference in Python, including two-way fixed-effects difference-in-differences, clustered standard errors, lead tests, and count-model re-specification; it targets Data Manipulation (SQL/Python) skills relevant for Data Scientist roles. It is commonly asked because it combines practical implementation and conceptual understanding—testing temporal joins, treatment timing, avoidance of lookahead/immortal-time biases and late-adopter issues, interpretation of interaction coefficients and confidence intervals, and the ability to reason about identification and model choice.

  • Medium
  • Airbnb
  • Data Manipulation (SQL/Python)
  • Data Scientist

Build panel in SQL; run causal regression

Company: Airbnb

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Assume today is 2025-09-01 (UTC). Schema and small samples: users(user_id INT, country STRING, signup_date DATE, platform STRING) Sample: user_id | country | signup_date | platform 1 | US | 2025-08-25 | ios 2 | US | 2025-08-20 | android 3 | CA | 2025-07-10 | web 4 | US | 2025-08-31 | ios 5 | GB | 2025-06-15 | android exposures(user_id INT, ts TIMESTAMP, treatment INT) Sample: user_id | ts | treatment 1 | 2025-08-30T10:00:00Z | 1 1 | 2025-09-01T09:00:00Z | 1 2 | 2025-08-29T12:00:00Z | 0 3 | 2025-08-28T08:00:00Z | 1 5 | 2025-08-26T14:00:00Z | 0 orders(order_id INT, user_id INT, ts TIMESTAMP, amount DECIMAL(10,2)) Sample: order_id | user_id | ts | amount 10 | 1 | 2025-08-30T11:00:00Z | 25.00 11 | 2 | 2025-08-29T13:00:00Z | 9.99 12 | 3 | 2025-08-31T09:00:00Z | 12.00 13 | 1 | 2025-09-01T10:15:00Z | 5.00 14 | 4 | 2025-09-01T16:00:00Z | 20.00 geo_rollout(country STRING, launch_ts TIMESTAMP) Sample: country | launch_ts US | 2025-08-29T00:00:00Z CA | 2025-08-27T00:00:00Z GB | 2025-08-31T00:00:00Z Tasks: SQL A: Build a user-day panel for dates d in [2025-08-25, 2025-09-01]. For each user_id and date d, output: date, user_id, country, treated (1 if EXISTS exposure with ts <= d 23:59:59Z and treatment=1), post (1 if date(d) >= date(launch_ts for user’s country)), revenue_d (sum of order amounts with ts on d), active_d (1 if revenue_d > 0 OR EXISTS exposure on d), and signup_age_days. Ensure no duplicate user-days and fill missing user-days with zeros where appropriate. SQL B: Using the panel, compute for US and GB the 7-day rolling revenue per active user on each day d in [2025-08-26, 2025-09-01], where active user means active_d=1 within the 7-day window. Clarify how you handle users with zero activity and avoid lookahead bias. Python: From the panel, estimate a two-way fixed-effects DiD: revenue_d ~ post * treated + user FE + day FE, clustering SEs at the user level. (1) Interpret the interaction coefficient economically; (2) test for pre-trends using leads; (3) address late adopters/immortal time bias; (4) re-estimate with a count model (Poisson with exposure offset) and compare; (5) report a 95% CI and a practical recommendation.

Quick Answer: This question evaluates proficiency in SQL-based panel data construction, time-series aggregation and rolling-window metrics together with applied causal inference in Python, including two-way fixed-effects difference-in-differences, clustered standard errors, lead tests, and count-model re-specification; it targets Data Manipulation (SQL/Python) skills relevant for Data Scientist roles. It is commonly asked because it combines practical implementation and conceptual understanding—testing temporal joins, treatment timing, avoidance of lookahead/immortal-time biases and late-adopter issues, interpretation of interaction coefficients and confidence intervals, and the ability to reason about identification and model choice.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Compute C/T metrics from bookings and visits - Airbnb (Medium)
  • Compute browsing metrics in Python from logs - Airbnb (Medium)
  • Review a geospatial Python module - Airbnb (Medium)
  • Aggregate User Activity, Fit Regression, Interpret Coefficients - Airbnb (Medium)