Design SQL/Pandas aggregations on retail schema

Q: How do I practice SQL interview questions?

PracHub provides an interactive SQL console where you can write and test queries against real database schemas. Get instant feedback and compare your solution with the expected output.

Q: What difficulty level is this coding question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Amazon.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Amazon during technical interviews.

Question

Using the schema and sample data below, answer both parts. Assume today is 2025-09-01. Use standard SQL (e.g., PostgreSQL) and idiomatic pandas without Python for-loops over rows.

Schema and sample rows
customers
+---------+------+------------+
| cust_id | name | signup_date|
+---------+------+------------+
| C1 | Ada | 2025-08-28 |
| C2 | Ben | 2025-08-29 |
| C3 | Cy | 2025-08-25 |
+---------+------+------------+

orders
+----------+---------+------------+-----------+
| order_id | cust_id | order_date | status |
+----------+---------+------------+-----------+
| 101 | C1 | 2025-08-29 | completed |
| 102 | C1 | 2025-08-30 | completed |
| 107 | C1 | 2025-08-30 | completed |
| 103 | C1 | 2025-09-01 | returned |
| 104 | C2 | 2025-08-31 | completed |
| 105 | C2 | 2025-09-01 | completed |
| 106 | C3 | 2025-08-25 | cancelled |
+----------+---------+------------+-----------+

order_items
+----------+------------+-----+--------+
| order_id | product_id | qty | price |
+----------+------------+-----+--------+
| 101 | P1 | 1 | 10.00 |
| 101 | P2 | 2 | 5.00 |
| 102 | P1 | 1 | 10.00 |
| 107 | P2 | 1 | 5.00 |
| 104 | P3 | 1 | 20.00 |
| 105 | P3 | 2 | 20.00 |
+----------+------------+-----+--------+

products
+------------+----------+
| product_id | category |
+------------+----------+
| P1 | A |
| P2 | B |
| P3 | A |
+------------+----------+

Definitions
- Order revenue = sum(qty * price) over items for that order.
- Only orders with status = completed count toward revenue; returned or cancelled do not.
- If a customer has multiple completed orders on the same calendar day, keep only one order for that day: pick the order with the greater order revenue; if there is a tie on revenue, keep the higher order_id.

Part A (SQL)
Write a single SQL query that returns, for each customer with at least two completed orders in the last 7 days (inclusive of today = 2025-09-01), one row per kept order (after the same-day de-duplication rule) with the following columns:
- cust_id
- order_id (after applying the same-day selection rule)
- order_date
- order_revenue
- rolling_3_day_rev: sum of the customer's order_revenue over the window [order_date - 2 days, order_date], considering only completed orders that survived the same-day selection rule
- rank_in_7d_by_revenue: dense rank of this kept order's revenue among the customer's kept orders in the last 7 days, highest revenue gets rank 1
Requirements: implement same-day selection without correlated subqueries; use window functions; do not use temporary tables.

Part B (pandas)
Using pandas DataFrames with the same content, produce a DataFrame with one row per customer for the last 7 days (inclusive, relative to today = 2025-09-01) containing:
- cust_id
- total_7d_revenue
- top_category_7d: the product category with the highest revenue for that customer in the 7-day window (break ties by alphabetical order of category)
- top_category_share_7d: the fraction (0–1) of the customer's 7-day revenue attributable to top_category_7d
Constraints: avoid Python loops; show how you enforce the same-day order selection rule prior to aggregation.

PracHub · Accepted Answer

This question evaluates ability to perform time-based aggregations and data manipulation using SQL and pandas, including per-order revenue calculation, same-day de-duplication, rolling-window sums, and intra-customer ranking.

Quick Overview

Quick Overview