PracHub

Refactor SQL into an aggregated report

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in SQL-based data transformation and analytical aggregation, covering timezone-aware local date bucketing, deduplication, percentile/ordered-set aggregates, safe rate calculations, and other PostgreSQL 14 features used for reporting.

  • Medium
  • DoorDash
  • Data Manipulation (SQL/Python)
  • Data Scientist


Company: DoorDash

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen



Related Interview Questions

  • Calculate Order Request Metrics - DoorDash (hard)
  • Analyze Restaurant Customer Metrics - DoorDash (medium)
  • Write SQL for monthly spend and ratios - DoorDash (medium)
  • Write SQL for late-delivery metrics by window - DoorDash (medium)
  • Compute rolling cold-delivery rates with windows - DoorDash (medium)
Posted: Oct 13, 2025

You are given the following Postgres schema and small sample data for a food-delivery platform.

Schema:

  • orders(order_id INT PRIMARY KEY, city TEXT, created_at TIMESTAMPTZ, completed_at TIMESTAMPTZ NULL, status TEXT CHECK (status IN ('created','accepted','picked_up','completed','cancelled')), courier_id INT, customer_id INT, subtotal NUMERIC(10,2), tip NUMERIC(10,2), cancel_reason TEXT NULL)
  • couriers(courier_id INT PRIMARY KEY, region TEXT, is_online BOOLEAN, active_start TIMESTAMPTZ, active_end TIMESTAMPTZ)
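
For readers who want to reproduce the exercise locally, the bulleted schema above translates directly into DDL. This is a convenience restatement of the definitions already given, not part of the original prompt:

```sql
-- Restated from the schema bullets above (PostgreSQL).
CREATE TABLE couriers (
    courier_id   INT PRIMARY KEY,
    region       TEXT,
    is_online    BOOLEAN,
    active_start TIMESTAMPTZ,
    active_end   TIMESTAMPTZ
);

CREATE TABLE orders (
    order_id      INT PRIMARY KEY,
    city          TEXT,
    created_at    TIMESTAMPTZ,
    completed_at  TIMESTAMPTZ NULL,
    status        TEXT CHECK (status IN ('created','accepted','picked_up','completed','cancelled')),
    courier_id    INT,
    customer_id   INT,
    subtotal      NUMERIC(10,2),
    tip           NUMERIC(10,2),
    cancel_reason TEXT NULL
);
```

Note the prompt's dedup requirement assumes "accidental duplicate rows by order_id" can occur, so in practice the raw feed may lack this PRIMARY KEY constraint.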

Sample rows (minimal):

orders

order_id | city | created_at             | completed_at           | status    | courier_id | subtotal | tip  | cancel_reason
1        | LA   | 2025-08-31 18:02:00+00 | 2025-08-31 18:35:00+00 | completed | 101        | 25.00    | 5.00 | NULL
2        | LA   | 2025-08-31 18:10:00+00 | NULL                   | cancelled | 102        | 12.00    | 0.00 | no_courier
3        | LA   | 2025-09-01 00:05:00+00 | 2025-09-01 00:25:00+00 | completed | 101        | 30.00    | 3.00 | NULL
4        | SF   | 2025-08-31 19:40:00+00 | 2025-08-31 20:05:00+00 | completed | 103        | 18.00    | 2.00 | NULL

couriers

courier_id | region | is_online | active_start           | active_end
101        | LA     | true      | 2025-08-31 17:45:00+00 | 2025-08-31 21:00:00+00
102        | LA     | true      | 2025-08-31 18:00:00+00 | 2025-08-31 19:00:00+00
103        | SF     | true      | 2025-08-31 19:00:00+00 | 2025-08-31 22:00:00+00

An analyst has this existing query that lists raw orders:

SELECT o.order_id, o.city, o.created_at, o.completed_at,
       o.status, o.subtotal, o.tip, o.cancel_reason
FROM orders o
WHERE o.created_at >= TIMESTAMPTZ '2025-08-30 00:00:00+00'
  AND o.city IN ('LA','SF');

Without rewriting from scratch, precisely describe how you would transform the above into an aggregated DAILY city-level report for LA and SF, using PostgreSQL 14 features (e.g., percentile_cont), with the following business rules and outputs. Then provide the final SQL:

  • Local day should be America/Los_Angeles; bucket by local date derived from created_at. Attribute completions that occur after midnight back to the order's creation local date.
  • Deduplicate any accidental duplicate rows by order_id, keeping the earliest created_at per order_id.
  • Metrics per city, per local day: created_orders, completed_orders, completion_rate, cancellations_no_courier_count and rate, median_fulfillment_minutes, p95_fulfillment_minutes, GMV (sum of subtotal+tip for completed orders).
  • Exclude city-days with created_orders < 20; order results by local_day, city.
  • Ensure safe divisions (no divide-by-zero) and round rates/percentiles to 2 decimals.

Explain each transformation step briefly (CTEs, timezone conversion, deduping, grouping, ordered-set aggregates), then show the final query.
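
As a hedged sketch (this page does not publish an official solution), the business rules above could compile into one PostgreSQL 14 query along these lines: a dedup CTE with DISTINCT ON, a localization CTE that derives the creation-time local day, then one grouped SELECT using FILTER clauses, ordered-set aggregates, and NULLIF-guarded division:

```sql
WITH deduped AS (
    -- Deduplicate by order_id, keeping the row with the earliest created_at.
    SELECT DISTINCT ON (o.order_id) o.*
    FROM orders o
    WHERE o.created_at >= TIMESTAMPTZ '2025-08-30 00:00:00+00'
      AND o.city IN ('LA','SF')
    ORDER BY o.order_id, o.created_at
),
localized AS (
    -- Bucket by the local date of created_at, so completions after midnight
    -- still attribute back to the order's creation day.
    SELECT d.*,
           (d.created_at AT TIME ZONE 'America/Los_Angeles')::date AS local_day,
           EXTRACT(EPOCH FROM (d.completed_at - d.created_at)) / 60.0 AS fulfillment_minutes
    FROM deduped d
)
SELECT
    local_day,
    city,
    COUNT(*) AS created_orders,
    COUNT(*) FILTER (WHERE status = 'completed') AS completed_orders,
    ROUND((COUNT(*) FILTER (WHERE status = 'completed'))::numeric
          / NULLIF(COUNT(*), 0), 2) AS completion_rate,
    COUNT(*) FILTER (WHERE status = 'cancelled'
                       AND cancel_reason = 'no_courier') AS cancellations_no_courier_count,
    ROUND((COUNT(*) FILTER (WHERE status = 'cancelled'
                              AND cancel_reason = 'no_courier'))::numeric
          / NULLIF(COUNT(*), 0), 2) AS cancellations_no_courier_rate,
    -- Ordered-set aggregates over completed orders only.
    ROUND((percentile_cont(0.5) WITHIN GROUP (ORDER BY fulfillment_minutes)
           FILTER (WHERE status = 'completed'))::numeric, 2) AS median_fulfillment_minutes,
    ROUND((percentile_cont(0.95) WITHIN GROUP (ORDER BY fulfillment_minutes)
           FILTER (WHERE status = 'completed'))::numeric, 2) AS p95_fulfillment_minutes,
    SUM(subtotal + tip) FILTER (WHERE status = 'completed') AS gmv
FROM localized
GROUP BY local_day, city
HAVING COUNT(*) >= 20
ORDER BY local_day, city;
```

The NULLIF guards are defensive (a GROUP BY group cannot have zero rows, but the prompt asks for explicitly safe division), and percentile_cont returns double precision, hence the ::numeric casts before ROUND. Column naming and CTE names here are illustrative choices, not required by the prompt.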
