Calculate Defect Rate and Identify Top Lanes for Carriers
Company: Amazon
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
shipment
+-------------+----------+-----------+---------+---------+-------------+-----------+
| shipment_id | order_id | ship_date | carrier | origin | destination | status |
+-------------+----------+-----------+---------+---------+-------------+-----------+
| 1001 | 5001 | 2023-09-01| UPS | SEA | NYC | Delivered |
| 1002 | 5002 | 2023-09-02| FedEx | LAX | CHI | In-Transit|
| 1003 | 5003 | 2023-09-03| USPS | DAL | ATL | Delivered |
+-------------+----------+-----------+---------+---------+-------------+-----------+
defect
+-----------+-------------+-------------+----------------------+
| defect_id | shipment_id | defect_type | defect_reported_date |
+-----------+-------------+-------------+----------------------+
| 1 | 1001 | Damaged | 2023-09-05 |
| 2 | 1002 | Lost | 2023-09-06 |
| 3 | 1002 | LateDelivery| 2023-09-07 |
+-----------+-------------+-------------+----------------------+
##### Scenario
Amazon Logistics team wants to monitor and reduce shipment defects across carriers and shipment lanes.
##### Question
Given tables shipment and defect, write SQL to calculate the defect rate (number of defects ÷ number of shipments) for each carrier in the last 30 days. Extend your query to show the week-over-week percentage change in defect rate for each carrier using window functions. Identify the top 3 origin-destination lanes with the highest count of 'Damaged' defects in the past quarter.
##### Hints
Join shipment and defect, filter by date, aggregate counts, compute rates, use LAG for WoW change.
Quick Answer: This question evaluates competency in SQL-based data manipulation for a data scientist role, covering joins, aggregations, date-based filtering, metric computation, and use of window functions to derive week-over-week changes.