Compute top orders and cancellation rate

Q: Compute top orders and cancellation rate

This question evaluates SQL and data-manipulation skills including aggregations, joins, time-based filtering, null-safe conditional logic, and event classification (completed vs. canceled) in a transactional dataset, with the domain specified as Data Manipulation (SQL/Python) for analytics on ride-hailing orders.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Question

You work at a ride-hailing company and are analyzing orders (trips) between drivers and riders.

Tables

`drivers`

driver_id BIGINT (PK)
home_city TEXT
created_at TIMESTAMP

`riders`

rider_id BIGINT (PK)
home_city TEXT
created_at TIMESTAMP

`orders`

order_id BIGINT (PK)
driver_id BIGINT (FK → drivers.driver_id , nullable if unassigned)
rider_id BIGINT (FK → riders.rider_id )
requested_at TIMESTAMP
accepted_at TIMESTAMP (nullable)
completed_at TIMESTAMP (nullable)
canceled_at TIMESTAMP (nullable)
status TEXT
- one of: 'requested' | 'accepted' | 'completed' | 'canceled'
canceled_by TEXT (nullable)
- one of: 'rider' | 'driver' | 'system'
cancel_reason TEXT (nullable)
pickup_type TEXT
- e.g. 'airport' | 'city'
airport_code TEXT (nullable)
fare_usd NUMERIC (nullable)

Assumptions

Use the timestamps as stored (no timezone conversion needed).
An order is considered canceled if status='canceled' (equivalently canceled_at IS NOT NULL ).
An order is considered completed if status='completed' .

Tasks (SQL)

Write SQL queries for the following. Each query should clearly state any additional assumptions you make.

Top drivers by completed orders
Return the top 10 drivers by number of completed orders in the last 30 days .
- Output columns: driver_id , completed_orders_30d
Cancellation rate with filters
Compute cancellation rate for airport pickups only in the last 30 days, broken out by who canceled.
- Define cancellation rate as:
  canceled_orders / total_requested_orders within the segment.
- Output columns: canceled_by (include an explicit bucket like 'not_canceled' if you choose), total_orders , canceled_orders , cancellation_rate
Top riders by cancellations
Return the top 10 riders with the highest number of canceled orders (airport pickups only, last 30 days).
- Output columns: rider_id , canceled_orders_30d

Note: Many candidates make mistakes around filtering joined tables (e.g., conditions placed in WHERE vs ON). Write queries that handle nulls/joins correctly.