Calculate Average Rate and Cumulative Jobs by State
Company: CloudTrucks
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
jobs
+--------+---------+-----------+------------+----------------+-------+----------------+---------------+
| job_id | load_id | driver_id | status | delivered_date | miles | payment_issued | rate_per_mile |
+--------+---------+-----------+------------+----------------+-------+----------------+---------------+
| 1 | 101 | 501 | completed | 2023-07-01 | 450 | true | 2.2 |
| 2 | 102 | 502 | cancelled | 2023-07-03 | 0 | false | 0 |
| 3 | 103 | 501 | completed | 2023-07-04 | 600 | true | 1.9 |
+--------+---------+-----------+------------+----------------+-------+----------------+---------------+
loads
+---------+--------------+-------------------+--------------+
| load_id | origin_state | destination_state | created_date |
+---------+--------------+-------------------+--------------+
| 101 | CA | TX | 2023-06-25 |
| 102 | NV | AZ | 2023-06-26 |
| 103 | OR | WA | 2023-06-27 |
+---------+--------------+-------------------+--------------+
drivers
+-----------+-----------+-----------+-------+
| driver_id | name | hire_date | state |
+-----------+-----------+-----------+-------+
| 501 | Alice | 2022-01-10| CA |
| 502 | Bob | 2021-11-05| NV |
+-----------+-----------+-----------+-------+
##### Scenario
SQL screen on logistics data – tables jobs, loads, drivers for a trucking marketplace.
##### Question
Among the five states that appear most often in the loads table, which state has the highest count of DISTINCT completed jobs? For completed jobs, calculate the average rate_per_mile for each delivered_date (exclude rows where miles =
0). Produce a running cumulative count of completed jobs by delivered_date using a window function (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
##### Hints
Join jobs ↔ loads, filter status = 'completed', be careful with DISTINCT and division by zero, and use window functions correctly.
Quick Answer: This question evaluates competency in relational data manipulation and analytical SQL concepts—including joins, aggregation with DISTINCT counts, per-date averaging, rolling cumulative counts, and handling numeric edge cases—within the Data Manipulation (SQL/Python) domain for a data scientist role.