PracHub

Build DiD dataset with SQL

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in SQL-based data manipulation and panel construction for staggered-adoption difference-in-differences, testing skills in handling adoption dates, event-time calculations, boundary conditions, and aggregation in the Data Manipulation (SQL/Python) domain.

  • Medium
  • Meta
  • Data Manipulation (SQL/Python)
  • Data Scientist
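The boundary and event-time logic this question tests comes down to a row-level rule, which follows from the treated_site and event_time_days definitions in the question statement. Below is a minimal Python sketch of that rule for a single employee-day that can serve as a reference when checking a SQL answer. The function name `did_indicators` is hypothetical. The sketch also assumes that `post` means date >= adoption_date (0 for never-treated sites), in which case it coincides with `treated_site` as the question defines it.

```python
from datetime import date
from typing import Optional, Tuple


def did_indicators(
    obs_date: date, adoption_date: Optional[date]
) -> Tuple[int, int, Optional[int]]:
    """Hypothetical helper: return (treated_site, post, event_time_days) for one employee-day."""
    if adoption_date is None:
        # Never-treated site: NULL adoption date, never treated, no event time.
        return 0, 0, None
    # Signed day offset: negative before adoption, 0 on the adoption day itself.
    event_time_days = (obs_date - adoption_date).days
    # Inclusive comparison (>=) so the adoption day already counts as treated;
    # using > here would shift treatment by one day (the off-by-one to avoid).
    treated_site = 1 if obs_date >= adoption_date else 0
    # Assumption: post = (date >= adoption_date), which equals treated_site here.
    post = treated_site
    return treated_site, post, event_time_days


if __name__ == "__main__":
    shuttle_start = date(2025, 1, 15)  # site 1's start_date in the sample data
    for d in (date(2025, 1, 14), date(2025, 1, 15), date(2025, 1, 16)):
        print(d, did_indicators(d, shuttle_start))
    print(did_indicators(date(2025, 1, 20), None))  # never-treated site 2
```

The day before adoption yields (0, 0, -1), the adoption day yields (1, 1, 0), and a never-treated row yields (0, 0, None).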


Company: Meta

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Using the schema and sample data below, write SQL to build an individual-day panel suitable for a staggered-adoption difference-in-differences (DiD) analysis of the shuttle’s effect on participation. Requirements:

  • A) Output columns: employee_id, site_id, date, participated, adoption_date (per site), treated_site (1 if date >= adoption_date at that site, else 0; never-treated sites have a NULL adoption_date and treated_site = 0), event_time_days = date - adoption_date (NULL for never-treated sites), and a binary post indicator.
  • B) Ensure there are no off-by-one errors at the adoption boundary.
  • C) Also produce a weekly site-level table with participation_rate = avg(participated) per site-week, correctly handling sites without a shuttle.
  • D) Assume employees do not move between sites.

Provide SQL that works on a modern warehouse (e.g., BigQuery or PostgreSQL).

Schema:

sites(site_id, city)
employees(employee_id, site_id, hire_date)
shuttle_service(site_id, start_date)  -- present only for treated sites
participation(employee_id, date, participated)

Sample tables:

sites
+---------+------+
| site_id | city |
+---------+------+
| 1       | SEA  |
| 2       | NYC  |
+---------+------+

employees
+-------------+---------+------------+
| employee_id | site_id | hire_date  |
+-------------+---------+------------+
| 101         | 1       | 2024-10-01 |
| 102         | 1       | 2025-01-01 |
| 201         | 2       | 2024-11-15 |
+-------------+---------+------------+

shuttle_service
+---------+------------+
| site_id | start_date |
+---------+------------+
| 1       | 2025-01-15 |
+---------+------------+

participation
+-------------+------------+--------------+
| employee_id | date       | participated |
+-------------+------------+--------------+
| 101         | 2025-01-10 | 1            |
| 101         | 2025-01-20 | 1            |
| 102         | 2025-01-20 | 0            |
| 201         | 2025-01-10 | 1            |
| 201         | 2025-01-20 | 0            |
+-------------+------------+--------------+
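One possible approach to the individual-day panel (requirements A, B and D) is sketched below, run against the sample data with Python's built-in sqlite3 so it executes without a warehouse. The query is written in the SQLite dialect. The main engine-specific piece is the day difference: in PostgreSQL it would be `p.date - a.adoption_date` (DATE minus DATE yields an integer number of days), and in BigQuery `DATE_DIFF(p.date, a.adoption_date, DAY)`. The `MIN(start_date)` step is an assumption: it guards against a site having more than one shuttle_service row by taking the earliest. `build_panel` and the setup script are illustrative helpers, not part of the question.

```python
import sqlite3

# Hypothetical in-memory copy of the question's sample tables.
# Dates are ISO-8601 TEXT, so string comparison orders them chronologically.
SETUP = """
CREATE TABLE sites (site_id INTEGER, city TEXT);
CREATE TABLE employees (employee_id INTEGER, site_id INTEGER, hire_date TEXT);
CREATE TABLE shuttle_service (site_id INTEGER, start_date TEXT);
CREATE TABLE participation (employee_id INTEGER, date TEXT, participated INTEGER);
INSERT INTO sites VALUES (1, 'SEA'), (2, 'NYC');
INSERT INTO employees VALUES (101, 1, '2024-10-01'), (102, 1, '2025-01-01'), (201, 2, '2024-11-15');
INSERT INTO shuttle_service VALUES (1, '2025-01-15');
INSERT INTO participation VALUES
  (101, '2025-01-10', 1), (101, '2025-01-20', 1), (102, '2025-01-20', 0),
  (201, '2025-01-10', 1), (201, '2025-01-20', 0);
"""

PANEL_SQL = """
WITH adoption AS (
  -- One adoption date per site; MIN() guards against duplicate shuttle rows.
  SELECT site_id, MIN(start_date) AS adoption_date
  FROM shuttle_service
  GROUP BY site_id
)
SELECT
  p.employee_id,
  e.site_id,                        -- fixed site per employee (assumption D)
  p.date,
  p.participated,
  a.adoption_date,                  -- NULL for never-treated sites via LEFT JOIN
  CASE WHEN a.adoption_date IS NOT NULL AND p.date >= a.adoption_date
       THEN 1 ELSE 0 END AS treated_site,  -- >= so the adoption day is treated
  CAST(julianday(p.date) - julianday(a.adoption_date) AS INTEGER)
    AS event_time_days,             -- NULL propagates when adoption_date is NULL
  CASE WHEN a.adoption_date IS NOT NULL AND p.date >= a.adoption_date
       THEN 1 ELSE 0 END AS post    -- assumption: post = date >= adoption_date
FROM participation AS p
JOIN employees AS e ON e.employee_id = p.employee_id
LEFT JOIN adoption AS a ON a.site_id = e.site_id
ORDER BY p.employee_id, p.date
"""


def build_panel():
    """Load the sample data into SQLite and return the panel rows as tuples."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(PANEL_SQL).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in build_panel():
        print(row)
```

On the sample data, employee 101's 2025-01-10 row gets event_time_days = -5 and treated_site = 0. The 2025-01-20 rows at site 1 get event_time_days = 5 and treated_site = 1. Site 2's rows keep a NULL adoption_date and a NULL event time. The LEFT JOIN is what keeps never-treated site 2 in the panel; an inner join would silently drop it. If start_date were stored as a TIMESTAMP rather than a DATE, it should be cast to DATE before the >= comparison. Otherwise an adoption-day row (midnight) would compare as earlier than a mid-day start time and be misclassified as pre-period.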

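For requirement C, the weekly site-level table can be built by recreating the panel in a CTE, bucketing each date into a Monday-start week, and averaging participated per site-week. This is again a SQLite-dialect sketch on the sample data (only the three tables the query reads are loaded), and `build_weekly` is a hypothetical helper. In PostgreSQL the week bucket would be `CAST(DATE_TRUNC('week', p.date) AS DATE)`, and in BigQuery `DATE_TRUNC(p.date, WEEK(MONDAY))`. The `treated_share` column goes beyond the question's spec. It is included to flag a week that straddles a site's adoption date, where a single treated/untreated label would be ambiguous.

```python
import sqlite3

# Hypothetical in-memory copy of the sample tables this query reads.
SETUP = """
CREATE TABLE employees (employee_id INTEGER, site_id INTEGER, hire_date TEXT);
CREATE TABLE shuttle_service (site_id INTEGER, start_date TEXT);
CREATE TABLE participation (employee_id INTEGER, date TEXT, participated INTEGER);
INSERT INTO employees VALUES (101, 1, '2024-10-01'), (102, 1, '2025-01-01'), (201, 2, '2024-11-15');
INSERT INTO shuttle_service VALUES (1, '2025-01-15');
INSERT INTO participation VALUES
  (101, '2025-01-10', 1), (101, '2025-01-20', 1), (102, '2025-01-20', 0),
  (201, '2025-01-10', 1), (201, '2025-01-20', 0);
"""

WEEKLY_SQL = """
WITH adoption AS (
  SELECT site_id, MIN(start_date) AS adoption_date
  FROM shuttle_service
  GROUP BY site_id
),
panel AS (
  SELECT
    e.site_id,
    p.participated,
    a.adoption_date,
    -- Monday starting the week that contains p.date:
    -- jump forward to Sunday ('weekday 0'), then back 6 days.
    date(p.date, 'weekday 0', '-6 days') AS week_start,
    CASE WHEN a.adoption_date IS NOT NULL AND p.date >= a.adoption_date
         THEN 1 ELSE 0 END AS treated_site
  FROM participation AS p
  JOIN employees AS e ON e.employee_id = p.employee_id
  LEFT JOIN adoption AS a ON a.site_id = e.site_id  -- keeps never-treated sites
)
SELECT
  site_id,
  week_start,
  MAX(adoption_date) AS adoption_date,            -- constant per site; NULL if never treated
  AVG(participated * 1.0) AS participation_rate,  -- * 1.0 guards against integer averaging in some engines
  COUNT(*) AS n_obs,
  AVG(treated_site * 1.0) AS treated_share        -- fractional only in a week straddling adoption
FROM panel
GROUP BY site_id, week_start
ORDER BY site_id, week_start
"""


def build_weekly():
    """Load the sample data into SQLite and return the site-week rows as tuples."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(WEEKLY_SQL).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in build_weekly():
        print(row)
```

On the sample data this returns site 1 with a participation_rate of 1.0 in the week of 2025-01-06 and 0.5 in the week of 2025-01-20. Site 2 shows 1.0 and 0.0 for the same weeks, with a NULL adoption_date and treated_share = 0.0.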

Related Interview Questions

  • Compute ad impression conversion rates - Meta (medium)
  • Count unconnected posts and reactions - Meta (medium)
  • Count heavy callers in 7 days - Meta (medium)
  • Write SQL for call metrics - Meta (medium)
  • Write SQL for multi-account metrics - Meta (medium)
Oct 13, 2025, 9:49 PM

