Given the following small schema and data, answer all parts precisely and justify each count/output.
Tables and rows:
Customers(cust_id INT PRIMARY KEY, name TEXT)
cust_id | name
--------+------
      1 | Alice
      2 | Bob
      3 | Chen
      4 | Dana
Orders(order_id INT PRIMARY KEY, customer_id INT, amount INT)
order_id | customer_id | amount
---------+-------------+-------
     101 |           1 |     50
     102 |           1 |     70
     103 |           2 |     30
     104 |           5 |     99
A(val INT)
val
---
1
2
2
3
B(val INT)
val
---
2
3
4
Scores(user_id INT, score INT, ts DATE)
user_id | score | ts
--------+-------+-----------
      1 |    95 | 2025-08-30
      1 |    88 | 2025-08-31
      2 |    88 | 2025-08-31
      3 |    75 | 2025-08-31
Tasks:
- For SELECT * FROM Customers c JOIN Orders o ON c.cust_id = o.customer_id, provide the exact row counts for INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, and CROSS JOIN. Briefly explain each count (e.g., duplicate-preserving joins, unmatched rows, null-extended rows). (See Sketch 1 after this list.)
- For SELECT val FROM A UNION SELECT val FROM B and SELECT val FROM A UNION ALL SELECT val FROM B: give (a) the row count and (b) the final sorted contents for each query. (See Sketch 2 after this list.)
- On 2025-08-31 only, compute both RANK() and DENSE_RANK() over (ORDER BY score DESC) for Scores, and list the expected (user_id, score, rank, dense_rank) rows for that date. (See Sketch 3 after this list.)
- Define a VIEW and contrast it with a MATERIALIZED VIEW. Give one pro and one con of each in analytic workloads. In this schema, propose a useful view and specify its definition. (See Sketch 4 after this list.)
- Define PRIMARY KEY, FOREIGN KEY, and PARTITION KEY. If Orders grows to 1B rows with an added order_date DATE column, choose appropriate keys and a partitioning strategy. Name at least three DB-agnostic query-tuning steps that would speed queries filtering on order_date and joining on customer_id (e.g., predicate pushdown, covering indexes, statistics, join reordering). Then rewrite this naive query for performance on large tables and explain why your rewrite should help (see Sketch 5 after this list):
  Naive: SELECT * FROM Orders o JOIN Customers c ON o.customer_id = c.cust_id WHERE o.amount > 40 ORDER BY c.name;
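Sketch 1 (join variants): a minimal set of queries for the first task, assuming a dialect that supports all five variants (e.g., PostgreSQL; MySQL lacks FULL OUTER JOIN and would need a UNION of the LEFT and RIGHT joins). The comments restate what each variant keeps:

  SELECT * FROM Customers c INNER JOIN Orders o ON c.cust_id = o.customer_id;      -- matched customer/order pairs only
  SELECT * FROM Customers c LEFT JOIN Orders o ON c.cust_id = o.customer_id;       -- also customers with no orders, null-extended on the Orders side
  SELECT * FROM Customers c RIGHT JOIN Orders o ON c.cust_id = o.customer_id;      -- also orders whose customer_id matches no customer
  SELECT * FROM Customers c FULL OUTER JOIN Orders o ON c.cust_id = o.customer_id; -- unmatched rows from both sides are kept
  SELECT * FROM Customers c CROSS JOIN Orders o;                                   -- every customer paired with every order; no ON clause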
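Sketch 2 (UNION vs. UNION ALL): the two set queries from the second task, with an explicit ORDER BY added so that "final sorted contents" is well defined; in standard SQL a trailing ORDER BY applies to the combined result, not to the second SELECT alone:

  SELECT val FROM A UNION     SELECT val FROM B ORDER BY val;  -- duplicates removed across both inputs
  SELECT val FROM A UNION ALL SELECT val FROM B ORDER BY val;  -- every input row kept, duplicates included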
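Sketch 3 (ranking): one way to write the window-function query for the third task; the DATE literal assumes a standard-SQL dialect (e.g., PostgreSQL), and no PARTITION BY is needed because the WHERE clause already restricts the result to a single date:

  SELECT user_id,
         score,
         RANK()       OVER (ORDER BY score DESC) AS rnk,
         DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
  FROM Scores
  WHERE ts = DATE '2025-08-31';

RANK() leaves a gap in the numbering after a tie, while DENSE_RANK() does not; that difference is exactly what the task asks you to exhibit.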
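Sketch 4 (a candidate view): one plausible view for this schema; the name customer_order_totals and the materialized variant are illustrative assumptions, and MATERIALIZED VIEW syntax is not available in every engine:

  CREATE VIEW customer_order_totals AS
  SELECT c.cust_id,
         c.name,
         COUNT(o.order_id)          AS order_count,   -- 0 for customers with no orders
         COALESCE(SUM(o.amount), 0) AS total_amount
  FROM Customers c
  LEFT JOIN Orders o ON o.customer_id = c.cust_id
  GROUP BY c.cust_id, c.name;

  -- Materialized variant (e.g., PostgreSQL): precomputed and refreshed explicitly.
  -- CREATE MATERIALIZED VIEW customer_order_totals_mv AS <same SELECT as above>;
  -- REFRESH MATERIALIZED VIEW customer_order_totals_mv;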
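Sketch 5 (keys, partitioning, and the rewrite): an illustrative, PostgreSQL-style redefinition of Orders for the 1B-row scenario, assuming monthly RANGE partitioning on order_date; the exact types, index set, and partition granularity are assumptions, not the only reasonable choice. Note that in PostgreSQL the primary key of a partitioned table must include the partition key:

  CREATE TABLE Orders (
      order_id    BIGINT,
      customer_id INT REFERENCES Customers(cust_id),   -- FOREIGN KEY back to Customers
      amount      INT,
      order_date  DATE NOT NULL,
      PRIMARY KEY (order_id, order_date)               -- must contain the partition key
  ) PARTITION BY RANGE (order_date);

  -- One partition per month, e.g.:
  CREATE TABLE orders_2025_08 PARTITION OF Orders
      FOR VALUES FROM ('2025-08-01') TO ('2025-09-01');

  CREATE INDEX ON Orders (customer_id);         -- supports the join to Customers
  CREATE INDEX ON Orders (order_date, amount);  -- supports date-range filters plus the amount predicate

One possible rewrite of the naive query from the last task: select only the columns the result actually needs and apply the amount filter before the join, so less data is carried through the join and the sort (many optimizers push the filter down on their own, but the narrower projection still avoids dragging every column of both tables through the ORDER BY):

  SELECT o.order_id, o.customer_id, o.amount, c.name
  FROM (SELECT order_id, customer_id, amount
        FROM Orders
        WHERE amount > 40) o
  JOIN Customers c ON c.cust_id = o.customer_id
  ORDER BY c.name;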