PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Data Manipulation (SQL/Python)/Amazon

Verify subscriptions and analyze orders with SQL/Python

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in SQL and pandas data manipulation, covering data-quality validation, temporal sequence reasoning, deduplication, deterministic aggregation, and tie-breaking logic.

  • Medium
  • Amazon
  • Data Manipulation (SQL/Python)
  • Data Scientist

Verify subscriptions and analyze orders with SQL/Python

Company: Amazon

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

You are given two tables. Write SQL and Python (pandas) to answer the sub-questions precisely, handling edge cases, ties, and missing data. Schema - subscriptions(subscription_id INT, status VARCHAR(20), status_date DATE) - orders(order_id INT, cust_id INT, order_date DATE, product VARCHAR(30), order_amount DECIMAL(10,2), shipping_option_id INT) Sample data (small, intentionally messy) subscriptions subscription_id | status | status_date --------------: | -------- | ----------- 1001 | inactive | 2025-01-10 1001 | active | 2025-02-01 1001 | paused | 2025-03-15 1001 | inactive | 2025-04-01 1002 | active | 2025-01-05 1002 | active | 2025-01-05 1003 | active | 2025-02-10 1003 | inactive | 2025-02-08 orders order_id | cust_id | order_date | product | order_amount | shipping_option_id -------: | ------: | ---------- | ------- | -----------: | -----------------: 1 | 10 | 2025-01-05 | camera | 80.00 | 1 2 | 10 | 2025-01-20 | shoes | 30.00 | 2 3 | 11 | 2025-01-25 | laptop | 1200.00 | 1 4 | 12 | 2025-02-02 | clothes | 40.00 | 3 5 | 12 | 2025-02-10 | camera | 70.00 | 1 6 | 13 | 2025-02-15 | shoes | 20.00 | 2 7 | 13 | 2025-02-16 | shoes | 25.00 | 2 8 | 14 | 2025-03-01 | laptop | 999.00 | 1 9 | 10 | 2025-03-05 | clothes | 15.00 | 3 10 | 15 | 2025-03-08 | camera | 50.00 | 1 Tasks A) SQL data-quality checks on subscriptions - Write one or more queries that would rigorously confirm or refute all of the following assumptions about the subscriptions table, returning concrete violating rows if any exist: 1) Allowed statuses are only {'active','inactive'}; surface any unexpected statuses (e.g., 'paused'). 2) (subscription_id, status_date) is unique; list duplicates if present. 3) For each subscription_id, status_date values are strictly increasing over time; surface any non-monotonic back-dated rows. 4) For each subscription_id and calendar date, there is at most one status; surface overlapping same-day multi-status cases. - Additionally, as of reference_date = '2025-09-01', return for each subscription_id: its latest known status and the timestamp of that status. B) Pandas on subscriptions - Create a pandas DataFrame with columns [subscription_id, first_active_date, last_inactive_date], where: • first_active_date is the earliest status_date with status = 'active'. • last_inactive_date is the most recent status_date with status = 'inactive' on or before '2025-09-01'. - Requirements: handle ties/duplicates deterministically (pick MIN for first_active_date, MAX for last_inactive_date), ignore unexpected statuses, and allow nulls when a subscription never became active or inactive. C) Python (pandas) on orders 1) Return the list of cust_id who either (a) placed fewer than 2 total orders, or (b) have total order_amount across all time < 100.00. 2) For each calendar month (YYYY-MM based on order_date), compute two leaderboards: • Top 5 customers by order count. • Top 5 customers by total order_amount. Use tie-breakers: higher total_amount first, then lower cust_id; if fewer than 5 exist, return all available. 3) Compute each customer's total spend per product type and present a wide table with columns exactly: cust_id | camera | shoes | laptop | clothes. Missing combinations should be filled with 0.00.

Quick Answer: This question evaluates proficiency in SQL and pandas data manipulation, covering data-quality validation, temporal sequence reasoning, deduplication, deterministic aggregation, and tie-breaking logic.

Related Interview Questions

  • Find recommended friend pairs by shared songs - Amazon (medium)
  • Find recommended friend pairs by shared listening - Amazon (easy)
  • Write SQL window functions for D7 retention - Amazon (medium)
  • Find daily first-order merchants with SQL - Amazon (Medium)
  • Design student–course data models and SQL - Amazon (Medium)
Amazon logo
Amazon
Oct 13, 2025, 9:49 PM
Data Scientist
Technical Screen
Data Manipulation (SQL/Python)
5
0

You are given two tables. Write SQL and Python (pandas) to answer the sub-questions precisely, handling edge cases, ties, and missing data.

Schema

  • subscriptions(subscription_id INT, status VARCHAR(20), status_date DATE)
  • orders(order_id INT, cust_id INT, order_date DATE, product VARCHAR(30), order_amount DECIMAL(10,2), shipping_option_id INT)

Sample data (small, intentionally messy) subscriptions

subscription_idstatusstatus_date
1001inactive2025-01-10
1001active2025-02-01
1001paused2025-03-15
1001inactive2025-04-01
1002active2025-01-05
1002active2025-01-05
1003active2025-02-10
1003inactive2025-02-08

orders

order_idcust_idorder_dateproductorder_amountshipping_option_id
1102025-01-05camera80.001
2102025-01-20shoes30.002
3112025-01-25laptop1200.001
4122025-02-02clothes40.003
5122025-02-10camera70.001
6132025-02-15shoes20.002
7132025-02-16shoes25.002
8142025-03-01laptop999.001
9102025-03-05clothes15.003
10152025-03-08camera50.001

Tasks A) SQL data-quality checks on subscriptions

  • Write one or more queries that would rigorously confirm or refute all of the following assumptions about the subscriptions table, returning concrete violating rows if any exist:
  1. Allowed statuses are only {'active','inactive'}; surface any unexpected statuses (e.g., 'paused').
  2. (subscription_id, status_date) is unique; list duplicates if present.
  3. For each subscription_id, status_date values are strictly increasing over time; surface any non-monotonic back-dated rows.
  4. For each subscription_id and calendar date, there is at most one status; surface overlapping same-day multi-status cases.
  • Additionally, as of reference_date = '2025-09-01', return for each subscription_id: its latest known status and the timestamp of that status.

B) Pandas on subscriptions

  • Create a pandas DataFrame with columns [subscription_id, first_active_date, last_inactive_date], where: • first_active_date is the earliest status_date with status = 'active'. • last_inactive_date is the most recent status_date with status = 'inactive' on or before '2025-09-01'.
  • Requirements: handle ties/duplicates deterministically (pick MIN for first_active_date, MAX for last_inactive_date), ignore unexpected statuses, and allow nulls when a subscription never became active or inactive.

C) Python (pandas) on orders

  1. Return the list of cust_id who either (a) placed fewer than 2 total orders, or (b) have total order_amount across all time < 100.00.
  2. For each calendar month (YYYY-MM based on order_date), compute two leaderboards: • Top 5 customers by order count. • Top 5 customers by total order_amount. Use tie-breakers: higher total_amount first, then lower cust_id; if fewer than 5 exist, return all available.
  3. Compute each customer's total spend per product type and present a wide table with columns exactly: cust_id | camera | shoes | laptop | clothes. Missing combinations should be filled with 0.00.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Amazon•More Data Scientist•Amazon Data Scientist•Amazon Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.