You work at a ride-hailing company and are analyzing orders (trips) between drivers and riders.
Tables
drivers
-
driver_id
BIGINT (PK)
-
home_city
TEXT
-
created_at
TIMESTAMP
riders
-
rider_id
BIGINT (PK)
-
home_city
TEXT
-
created_at
TIMESTAMP
orders
-
order_id
BIGINT (PK)
-
driver_id
BIGINT (FK →
drivers.driver_id
, nullable if unassigned)
-
rider_id
BIGINT (FK →
riders.rider_id
)
-
requested_at
TIMESTAMP
-
accepted_at
TIMESTAMP (nullable)
-
completed_at
TIMESTAMP (nullable)
-
canceled_at
TIMESTAMP (nullable)
-
status
TEXT
-
one of:
'requested' | 'accepted' | 'completed' | 'canceled'
-
canceled_by
TEXT (nullable)
-
one of:
'rider' | 'driver' | 'system'
-
cancel_reason
TEXT (nullable)
-
pickup_type
TEXT
-
airport_code
TEXT (nullable)
-
fare_usd
NUMERIC (nullable)
Assumptions
-
Use the timestamps as stored (no timezone conversion needed).
-
An order is considered
canceled
if
status='canceled'
(equivalently
canceled_at IS NOT NULL
).
-
An order is considered
completed
if
status='completed'
.
Tasks (SQL)
Write SQL queries for the following. Each query should clearly state any additional assumptions you make.
-
Top drivers by completed orders
Return the
top 10 drivers
by number of
completed
orders in the
last 30 days
.
-
Output columns:
driver_id
,
completed_orders_30d
-
Cancellation rate with filters
Compute
cancellation rate
for
airport pickups only
in the last 30 days, broken out by who canceled.
-
Define cancellation rate as:
canceled_orders / total_requested_orders
within the segment.
-
Output columns:
canceled_by
(include an explicit bucket like
'not_canceled'
if you choose),
total_orders
,
canceled_orders
,
cancellation_rate
-
Top riders by cancellations
Return the top 10 riders with the highest number of canceled orders (airport pickups only, last 30 days).
-
Output columns:
rider_id
,
canceled_orders_30d
Note: Many candidates make mistakes around filtering joined tables (e.g., conditions placed in WHERE vs ON). Write queries that handle nulls/joins correctly.