Analyze Top 10 Items' Revenue Contribution by Category
Company: Amazon
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
sales
+----------+------------+---------+---------+------------+
| order_id | category | item_id | revenue | order_date |
+----------+------------+---------+---------+------------+
| 1001 | Books | B12 | 19.99 | 2023-07-01 |
| 1002 | Books | B45 | 9.99 | 2023-07-02 |
| 1003 | Toys | T88 | 29.99 | 2023-07-02 |
| 1004 | Toys | T12 | 15.99 | 2023-07-03 |
| 1005 | Electronics| E33 |199.99 | 2023-07-03 |
+----------+------------+---------+---------+------------+
##### Scenario
SQL live-coding interview on product-sales dataset.
##### Question
Write a SQL query to return the top 10 items by total revenue within each product category. For every category, compute the percentage of category revenue that those top-10 items contribute. Explain the difference between normalization and denormalization and give scenarios for each. Outline the key steps of an ETL pipeline you would build for this dataset.
##### Hints
Think CTEs, ROW_NUMBER(), SUM() OVER, two-level aggregation.
Quick Answer: This question evaluates proficiency in data manipulation and analytical querying—aggregation, window functions, ranking and revenue attribution—alongside conceptual knowledge of normalization versus denormalization and ETL pipeline design.