PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Data Manipulation (SQL/Python)/Amazon

Find top-spend categories per customer with ranking

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency with SQL aggregation and JOINs, analytic/window functions (RANK()), tie and NULL/edge-case handling, and percent-of-total calculations for per-customer spend analysis.

  • Medium
  • Amazon
  • Data Manipulation (SQL/Python)
  • Data Scientist

Find top-spend categories per customer with ranking

Company: Amazon

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

Using the schema and sample data below, write a single ANSI SQL query (CTEs allowed; no temp tables) that returns, for each customer, their top 2 product categories by total spend. Output columns: customer_id, category, total_spend, earliest_order_date, rank_in_customer. Requirements: - total_spend = SUM(oi.qty * oi.unit_price) per (customer_id, category). - earliest_order_date = MIN(o.order_date) within that (customer_id, category). - Use RANK() OVER (PARTITION BY customer_id ORDER BY total_spend DESC, earliest_order_date ASC) as rank_in_customer. - Return rows where rank_in_customer <= 2, including all ties at rank 2. - Include customers with no orders as a single row: category = NULL, total_spend = 0, earliest_order_date = NULL, rank_in_customer = NULL (do not assign rank for zero-spend customers). Hint: careful LEFT JOINs and conditional ranking. - Avoid vendor-specific extensions beyond standard window functions. Follow-ups (closely related; keep as part of the same query via additional columns/CTEs): A) Add percent_of_total = total_spend / SUM(total_spend) OVER (PARTITION BY customer_id), and ensure it is 0 for customers with no orders. B) Return only customer_ids whose top category’s percent_of_total < 0.5 (i.e., their spend is not dominated by a single category). Schema: customers(id INT PK, name TEXT) orders(id INT PK, customer_id INT FK -> customers.id, order_date DATE) order_items(id INT PK, order_id INT FK -> orders.id, product_id INT FK -> products.id, qty INT, unit_price DECIMAL(10,2)) products(id INT PK, category TEXT, name TEXT) Sample data: customers id | name 1 | Alice 2 | Bob 3 | Cara 4 | Dana orders id | customer_id | order_date 101 | 1 | 2025-07-02 102 | 1 | 2025-07-15 103 | 2 | 2025-07-20 104 | 2 | 2025-07-21 105 | 3 | 2025-07-22 order_items id | order_id | product_id | qty | unit_price 1001 | 101 | 201 | 2 | 10.00 1002 | 101 | 202 | 1 | 20.00 1003 | 102 | 201 | 1 | 10.00 1004 | 102 | 203 | 5 | 5.00 1005 | 103 | 202 | 2 | 20.00 1006 | 104 | 203 | 10 | 5.00 1007 | 105 | 204 | 1 | 100.00 products id | category | name 201 | CatA | A 202 | CatA | B 203 | CatB | C 204 | CatC | D

Quick Answer: This question evaluates proficiency with SQL aggregation and JOINs, analytic/window functions (RANK()), tie and NULL/edge-case handling, and percent-of-total calculations for per-customer spend analysis.

Related Interview Questions

  • Find recommended friend pairs by shared songs - Amazon (medium)
  • Find recommended friend pairs by shared listening - Amazon (easy)
  • Write SQL window functions for D7 retention - Amazon (medium)
  • Find daily first-order merchants with SQL - Amazon (Medium)
  • Design student–course data models and SQL - Amazon (Medium)
Amazon logo
Amazon
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Data Manipulation (SQL/Python)
1
0

Using the schema and sample data below, write a single ANSI SQL query (CTEs allowed; no temp tables) that returns, for each customer, their top 2 product categories by total spend. Output columns: customer_id, category, total_spend, earliest_order_date, rank_in_customer. Requirements:

  • total_spend = SUM(oi.qty * oi.unit_price) per (customer_id, category).
  • earliest_order_date = MIN(o.order_date) within that (customer_id, category).
  • Use RANK() OVER (PARTITION BY customer_id ORDER BY total_spend DESC, earliest_order_date ASC) as rank_in_customer.
  • Return rows where rank_in_customer <= 2, including all ties at rank 2.
  • Include customers with no orders as a single row: category = NULL, total_spend = 0, earliest_order_date = NULL, rank_in_customer = NULL (do not assign rank for zero-spend customers). Hint: careful LEFT JOINs and conditional ranking.
  • Avoid vendor-specific extensions beyond standard window functions. Follow-ups (closely related; keep as part of the same query via additional columns/CTEs): A) Add percent_of_total = total_spend / SUM(total_spend) OVER (PARTITION BY customer_id), and ensure it is 0 for customers with no orders. B) Return only customer_ids whose top category’s percent_of_total < 0.5 (i.e., their spend is not dominated by a single category).

Schema: customers(id INT PK, name TEXT) orders(id INT PK, customer_id INT FK -> customers.id, order_date DATE) order_items(id INT PK, order_id INT FK -> orders.id, product_id INT FK -> products.id, qty INT, unit_price DECIMAL(10,2)) products(id INT PK, category TEXT, name TEXT)

Sample data: customers id | name 1 | Alice 2 | Bob 3 | Cara 4 | Dana

orders id | customer_id | order_date 101 | 1 | 2025-07-02 102 | 1 | 2025-07-15 103 | 2 | 2025-07-20 104 | 2 | 2025-07-21 105 | 3 | 2025-07-22

order_items id | order_id | product_id | qty | unit_price 1001 | 101 | 201 | 2 | 10.00 1002 | 101 | 202 | 1 | 20.00 1003 | 102 | 201 | 1 | 10.00 1004 | 102 | 203 | 5 | 5.00 1005 | 103 | 202 | 2 | 20.00 1006 | 104 | 203 | 10 | 5.00 1007 | 105 | 204 | 1 | 100.00

products id | category | name 201 | CatA | A 202 | CatA | B 203 | CatB | C 204 | CatC | D

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Amazon•More Data Scientist•Amazon Data Scientist•Amazon Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.