Identify Key Metrics for Monitoring Shipment Defects
Company: Amazon
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
shipment
+-------------+----------+------------+---------+---------+
| shipment_id | order_id | ship_date | carrier | status |
+-------------+----------+------------+---------+---------+
| 1001 | 5001 | 2023-07-01 | UPS | shipped |
| 1002 | 5002 | 2023-07-02 | FedEx | shipped |
| 1003 | 5003 | 2023-07-03 | DHL | delayed |
+-------------+----------+------------+---------+---------+
defect
+-----------+-------------+--------------+--------------+
| defect_id | shipment_id | defect_type | detected_date|
+-----------+-------------+--------------+--------------+
| 9001 | 1001 | broken_item | 2023-07-05 |
| 9002 | 1003 | missing_part | 2023-07-06 |
| 9003 | 1003 | late_ship | 2023-07-07 |
+-----------+-------------+--------------+--------------+
##### Scenario
Amazon logistics team wants to monitor product quality by tracking defects related to customer shipments.
##### Question
Which metrics would you define to monitor shipment defects on an ongoing basis? Write the SQL that calculates, for the last 30 days, each carrier’s total shipments, total defective shipments, and defect_rate (= defective/total). Return the top-3 carriers with the highest defect_rate.
##### Hints
Join shipment and defect, filter on ship_date or detected_date, GROUP BY carrier, use COUNT(DISTINCT) or COUNT(*) as appropriate; compute defect_rate in a CTE or HAVING clause.
Quick Answer: This question evaluates a data scientist's skill in defining and computing operational monitoring metrics for shipment defects, emphasizing data manipulation, aggregation, and metric definition using SQL and Python.