Write SQL for percent and window changes

Q: Write SQL for percent and window changes

This is a Data Manipulation (SQL/Python) interview question from DoorDash for Data Scientist roles. View the full question and solution on PracHub.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Question

Use PostgreSQL. Assume today = 2025-09-01. You must use CTEs and multiple window functions. Schema and tiny samples are below.

Schema:

exposures(unit_id INT, dt DATE, condition_label INT) -- 1=treatment, 0=control
orders(order_id INT, unit_id INT, dt DATE, courier_type TEXT, temp_category TEXT, subtotal_cents INT)

Sample rows: exposures +---------+------------+----------------+ | unit_id | dt | condition_label| +---------+------------+----------------+ | 101 | 2025-08-30 | 1 | | 101 | 2025-08-31 | 1 | | 101 | 2025-09-01 | 0 | | 102 | 2025-08-30 | 0 | | 102 | 2025-08-31 | 1 | | 102 | 2025-09-01 | 1 | | 103 | 2025-08-31 | 0 | | 103 | 2025-09-01 | 0 | +---------+------------+----------------+

orders +----------+---------+------------+--------------+---------------+----------------+ | order_id | unit_id | dt | courier_type | temp_category | subtotal_cents | +----------+---------+------------+--------------+---------------+----------------+ | 1 | 101 | 2025-08-30 | biker | cold | 1800 | | 2 | 101 | 2025-08-30 | biker | hot | 2400 | | 3 | 101 | 2025-08-31 | car | cold | 2200 | | 4 | 101 | 2025-09-01 | biker | cold | 2600 | | 5 | 102 | 2025-08-30 | biker | cold | 2100 | | 6 | 102 | 2025-08-31 | biker | hot | 1500 | | 7 | 102 | 2025-09-01 | car | cold | 2000 | | 8 | 103 | 2025-08-31 | biker | cold | 900 | | 9 | 103 | 2025-09-01 | car | hot | 3000 | | 10 | 103 | 2025-08-28 | biker | cold | 1700 | +----------+---------+------------+--------------+---------------+----------------+

Tasks:

Non-aggregated condition percent (simple condition for numerator): Define is_high_value := (subtotal_cents >= 2000). For the 7-day window [2025-08-26, 2025-09-01], compute the percentage of orders that satisfy (courier_type='biker' AND temp_category='cold' AND is_high_value). Produce two variants of the denominator: V1 includes only orders whose unit_id had condition_label=1 on that same dt; V2 includes orders whose unit_id had condition_label in {0,1} on that dt (treat missing exposure rows as condition_label=0). Output both percentages with clearly labeled columns.
Aggregation-required condition percent: For each unit_id×dt in the same 7-day window, define is_power_day := (count of orders in the last 7 days inclusive for that unit with courier_type='biker' AND temp_category='cold' >= 2) AND (average subtotal_cents on those qualifying orders in the same 7-day window >= 2000). Compute the percentage of unit-days where is_power_day=TRUE under the same two denominator conventions as in (1): DV1 counts only unit-days with condition_label=1; DV2 counts unit-days with condition_label in {0,1}. Return a single row with DV1_percent and DV2_percent.
Multiple window frames and change: Build a unit-day series from the orders table (days with 0 orders must appear). For each unit_id and dt in [2025-08-26, 2025-09-01], compute: (a) orders_day := total orders that day; (b) cum_orders := SUM(orders_day) OVER (PARTITION BY unit_id ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); (c) rolling7 := SUM(orders_day) OVER (PARTITION BY unit_id ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW); (d) pct_change_do_d := (orders_day - LAG(orders_day) OVER w1) / NULLIF(LAG(orders_day) OVER w1,0), where w1=(PARTITION BY unit_id ORDER BY dt); (e) non-overlapping_7day_change := (rolling7 - LAG(rolling7) OVER w7) / NULLIF(LAG(rolling7) OVER w7,0), where w7 uses ORDER BY dt with a frame ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING to represent the prior 7 days. Return one row per unit_id×dt with these columns.
Within-unit state changes vs partition: For each unit_id, define had_high_value_biker_cold_today := (exists an order on dt with courier_type='biker' AND temp_category='cold' AND subtotal_cents>=2000). For each change point where this boolean flips (0→1 or 1→0), output unit_id, dt_of_change, direction, delta_rolling7 := rolling7_after - rolling7_before (rolling7 defined on orders_day), and the percentile_rank of delta_rolling7 among all change deltas across units in the window using PERCENT_RANK() OVER (ORDER BY delta_rolling7).

Write SQL for percent and window changes

Comments (0)