Build DiD dataset with SQL

Q: Build DiD dataset with SQL

This question evaluates proficiency in SQL-based data manipulation and panel construction for staggered-adoption difference-in-differences, testing skills in handling adoption dates, event-time calculations, boundary conditions, and aggregation in the Data Manipulation (SQL/Python) domain.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Question

Using the schema and sample data below, write SQL to build an individual-day panel suitable for staggered-adoption DiD of the shuttle’s effect on participation. Requirements: A) output columns: employee_id, site_id, date, participated, adoption_date (per site), treated_site (1 if date >= adoption_date at that site, else 0; never-treated have NULL adoption_date and 0), event_time_days = date - adoption_date (NULL for never-treated), and a binary post indicator; B) ensure no off-by-one errors on the adoption boundary; C) also produce a weekly site-level table with participation_rate = avg(participated) per site-week, correctly handling sites without shuttle; D) assume employees do not move sites. Provide SQL that works on a modern warehouse (e.g., BigQuery or PostgreSQL). Schema:

sites(site_id, city) employees(employee_id, site_id, hire_date) shuttle_service(site_id, start_date) -- present only for treated sites participation(employee_id, date, participated)

Sample tables:

sites +---------+------+ | site_id | city | +---------+------+ | 1 | SEA | | 2 | NYC | +---------+------+

employees +-------------+---------+------------+ | employee_id | site_id | hire_date | +-------------+---------+------------+ | 101 | 1 | 2024-10-01 | | 102 | 1 | 2025-01-01 | | 201 | 2 | 2024-11-15 | +-------------+---------+------------+

shuttle_service +---------+------------+ | site_id | start_date | +---------+------------+ | 1 | 2025-01-15 | +---------+------------+

participation +-------------+------------+--------------+ | employee_id | date | participated | +-------------+------------+--------------+ | 101 | 2025-01-10 | 1 | | 101 | 2025-01-20 | 1 | | 102 | 2025-01-20 | 0 | | 201 | 2025-01-10 | 1 | | 201 | 2025-01-20 | 0 | +-------------+------------+--------------+

Build DiD dataset with SQL

Overview

Comments (0)