Create SQL Queries for Sales and Customer Analysis
Company: Amazon
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
SALES
+----------+----------+------------+------------+----------+---------+
| sale_id | store_id | product_id | sale_date | quantity | revenue |
+----------+----------+------------+------------+----------+---------+
| 1 | 101 | 501 | 2024-06-01 | 3 | 75.00 |
| 2 | 102 | 502 | 2024-06-02 | NULL | 40.00 |
| 3 | 101 | 501 | 2024-06-03 | 2 | 50.00 |
+----------+----------+------------+------------+----------+---------+
ORDERS
+----------+-------------+------------+--------------+
| order_id | customer_id | order_date | total_amount |
+----------+-------------+------------+--------------+
| 1001 | 2001 | 2024-06-01 | 120.00 |
| 1002 | 2002 | 2024-06-02 | 75.00 |
| 1003 | 2001 | 2024-06-03 | 50.00 |
+----------+-------------+------------+--------------+
ORDER_ITEMS
+----------+------------+----------+---------+
| order_id | product_id | quantity | revenue |
+----------+------------+----------+---------+
| 1001 | 501 | 2 | 50.00 |
| 1001 | 502 | 3 | 70.00 |
| 1002 | 503 | 1 | 75.00 |
+----------+------------+----------+---------+
CUSTOMERS
+-------------+--------+
| customer_id | region |
+-------------+--------+
| 2001 | West |
| 2002 | East |
+-------------+--------+
##### Scenario
Retail analytics team needs ad-hoc reports on sales performance and customer behavior.
##### Question
a) Given a single table SALES, write SQL to: • compute the average and standard deviation of revenue by product_id; • create a column that maps NULL quantity to 0 using CASE WHEN. b) Given three tables ORDERS, ORDER_ITEMS, CUSTOMERS, write SQL to: • join them to get total revenue per customer; • rank and dense-rank customers by total revenue within each region.
##### Hints
Focus on GROUP BY, JOIN conditions, COALESCE/CASE, aggregate functions, and window functions (RANK/DENSE_RANK).
Quick Answer: This question evaluates proficiency in SQL-based data manipulation, covering aggregation and dispersion metrics, null handling, relational joins, and window functions for ranking customers.