Find top-spend categories per customer with ranking

Q: Find top-spend categories per customer with ranking

This question evaluates proficiency with SQL aggregation and JOINs, analytic/window functions (RANK()), tie and NULL/edge-case handling, and percent-of-total calculations for per-customer spend analysis.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Question

Using the schema and sample data below, write a single ANSI SQL query (CTEs allowed; no temp tables) that returns, for each customer, their top 2 product categories by total spend. Output columns: customer_id, category, total_spend, earliest_order_date, rank_in_customer. Requirements:

total_spend = SUM(oi.qty * oi.unit_price) per (customer_id, category).
earliest_order_date = MIN(o.order_date) within that (customer_id, category).
Use RANK() OVER (PARTITION BY customer_id ORDER BY total_spend DESC, earliest_order_date ASC) as rank_in_customer.
Return rows where rank_in_customer <= 2, including all ties at rank 2.
Include customers with no orders as a single row: category = NULL, total_spend = 0, earliest_order_date = NULL, rank_in_customer = NULL (do not assign rank for zero-spend customers). Hint: careful LEFT JOINs and conditional ranking.
Avoid vendor-specific extensions beyond standard window functions. Follow-ups (closely related; keep as part of the same query via additional columns/CTEs): A) Add percent_of_total = total_spend / SUM(total_spend) OVER (PARTITION BY customer_id), and ensure it is 0 for customers with no orders. B) Return only customer_ids whose top category’s percent_of_total < 0.5 (i.e., their spend is not dominated by a single category).

Schema: customers(id INT PK, name TEXT) orders(id INT PK, customer_id INT FK -> customers.id, order_date DATE) order_items(id INT PK, order_id INT FK -> orders.id, product_id INT FK -> products.id, qty INT, unit_price DECIMAL(10,2)) products(id INT PK, category TEXT, name TEXT)

orders id | customer_id | order_date 101 | 1 | 2025-07-02 102 | 1 | 2025-07-15 103 | 2 | 2025-07-20 104 | 2 | 2025-07-21 105 | 3 | 2025-07-22

order_items id | order_id | product_id | qty | unit_price 1001 | 101 | 201 | 2 | 10.00 1002 | 101 | 202 | 1 | 20.00 1003 | 102 | 201 | 1 | 10.00 1004 | 102 | 203 | 5 | 5.00 1005 | 103 | 202 | 2 | 20.00 1006 | 104 | 203 | 10 | 5.00 1007 | 105 | 204 | 1 | 100.00

Find top-spend categories per customer with ranking

Overview

Comments (0)