PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Data Manipulation (SQL/Python)/Amazon

Compute join counts and window ranks

Last updated: Mar 29, 2026

Quick Overview

This question evaluates SQL data-manipulation and analytic competencies including join semantics, set operations, window functions, view versus materialized view trade-offs, key and partition design, and query-tuning techniques in the Data Manipulation (SQL/Python) domain.

  • Medium
  • Amazon
  • Data Manipulation (SQL/Python)
  • Data Scientist

Compute join counts and window ranks

Company: Amazon

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

Given the following small schema and data, answer all parts precisely and justify each count/output. Tables and rows: Customers(cust_id INT PRIMARY KEY, name TEXT) cust_id | name --------+------ 1 | Alice 2 | Bob 3 | Chen 4 | Dana Orders(order_id INT PRIMARY KEY, customer_id INT, amount INT) order_id | customer_id | amount ---------+-------------+------- 101 | 1 | 50 102 | 1 | 70 103 | 2 | 30 104 | 5 | 99 A(val INT) val ---- 1 2 2 3 B(val INT) val ---- 2 3 4 Scores(user_id INT, score INT, ts DATE) user_id | score | ts --------+-------+----------- 1 | 95 | 2025-08-30 1 | 88 | 2025-08-31 2 | 88 | 2025-08-31 3 | 75 | 2025-08-31 Tasks: 1) For SELECT * FROM Customers c JOIN Orders o ON c.cust_id = o.customer_id, provide the exact row counts for INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, and CROSS JOIN. Briefly explain each count (e.g., duplicate-preserving joins, unmatched rows, null-extended rows). 2) For SELECT val FROM A UNION SELECT val FROM B and SELECT val FROM A UNION ALL SELECT val FROM B: give (a) the row count and (b) the final sorted contents for each query. 3) On 2025-08-31 only, compute both RANK() and DENSE_RANK() over (ORDER BY score DESC) for Scores, and list the expected (user_id, score, rank, dense_rank) rows for that date. 4) Define a VIEW and contrast it with a MATERIALIZED VIEW. Give one pro and one con of each in analytic workloads. In this schema, propose a useful view and specify its definition. 5) Define PRIMARY KEY, FOREIGN KEY, and PARTITION KEY. If Orders grows to 1B rows with an added order_date DATE, choose appropriate keys and a partitioning strategy. Name at least three DB-agnostic query-tuning steps that would speed queries filtering on order_date and joining on customer_id (e.g., predicate pushdown, covering indexes, statistics, join reordering). Then rewrite this naive query for performance on large tables and explain why your rewrite should help: Naive: SELECT * FROM Orders o JOIN Customers c ON o.customer_id = c.cust_id WHERE o.amount > 40 ORDER BY c.name;

Quick Answer: This question evaluates SQL data-manipulation and analytic competencies including join semantics, set operations, window functions, view versus materialized view trade-offs, key and partition design, and query-tuning techniques in the Data Manipulation (SQL/Python) domain.

Related Interview Questions

  • Find recommended friend pairs by shared songs - Amazon (medium)
  • Find recommended friend pairs by shared listening - Amazon (easy)
  • Write SQL window functions for D7 retention - Amazon (medium)
  • Find daily first-order merchants with SQL - Amazon (Medium)
  • Design student–course data models and SQL - Amazon (Medium)
Amazon logo
Amazon
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Data Manipulation (SQL/Python)
2
0

Given the following small schema and data, answer all parts precisely and justify each count/output.

Tables and rows: Customers(cust_id INT PRIMARY KEY, name TEXT) cust_id | name --------+------ 1 | Alice 2 | Bob 3 | Chen 4 | Dana

Orders(order_id INT PRIMARY KEY, customer_id INT, amount INT) order_id | customer_id | amount ---------+-------------+------- 101 | 1 | 50 102 | 1 | 70 103 | 2 | 30 104 | 5 | 99

A(val INT) val

1 2 2 3

B(val INT) val

2 3 4

Scores(user_id INT, score INT, ts DATE) user_id | score | ts --------+-------+----------- 1 | 95 | 2025-08-30 1 | 88 | 2025-08-31 2 | 88 | 2025-08-31 3 | 75 | 2025-08-31

Tasks:

  1. For SELECT * FROM Customers c JOIN Orders o ON c.cust_id = o.customer_id, provide the exact row counts for INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, and CROSS JOIN. Briefly explain each count (e.g., duplicate-preserving joins, unmatched rows, null-extended rows).
  2. For SELECT val FROM A UNION SELECT val FROM B and SELECT val FROM A UNION ALL SELECT val FROM B: give (a) the row count and (b) the final sorted contents for each query.
  3. On 2025-08-31 only, compute both RANK() and DENSE_RANK() over (ORDER BY score DESC) for Scores, and list the expected (user_id, score, rank, dense_rank) rows for that date.
  4. Define a VIEW and contrast it with a MATERIALIZED VIEW. Give one pro and one con of each in analytic workloads. In this schema, propose a useful view and specify its definition.
  5. Define PRIMARY KEY, FOREIGN KEY, and PARTITION KEY. If Orders grows to 1B rows with an added order_date DATE, choose appropriate keys and a partitioning strategy. Name at least three DB-agnostic query-tuning steps that would speed queries filtering on order_date and joining on customer_id (e.g., predicate pushdown, covering indexes, statistics, join reordering). Then rewrite this naive query for performance on large tables and explain why your rewrite should help: Naive: SELECT * FROM Orders o JOIN Customers c ON o.customer_id = c.cust_id WHERE o.amount > 40 ORDER BY c.name;

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Amazon•More Data Scientist•Amazon Data Scientist•Amazon Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.