Given the following small schema and data, answer all parts precisely and justify each count/output.
Tables and rows:
Customers(cust_id INT PRIMARY KEY, name TEXT)
cust_id | name
--------+------
      1 | Alice
      2 | Bob
      3 | Chen
      4 | Dana
Orders(order_id INT PRIMARY KEY, customer_id INT, amount INT)
order_id | customer_id | amount
---------+-------------+-------
     101 |           1 |     50
     102 |           1 |     70
     103 |           2 |     30
     104 |           5 |     99
A(val INT)
val
---
1
2
2
3
B(val INT)
val
---
2
3
4
Scores(user_id INT, score INT, ts DATE)
user_id | score | ts
--------+-------+-----------
      1 |    95 | 2025-08-30
      1 |    88 | 2025-08-31
      2 |    88 | 2025-08-31
      3 |    75 | 2025-08-31
Tasks:
- For SELECT * FROM Customers c JOIN Orders o ON c.cust_id = o.customer_id, provide the exact row counts for INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, and CROSS JOIN. Briefly explain each count (e.g., duplicate-preserving joins, unmatched rows, null-extended rows). (See Sketch 1 after this list.)
- For SELECT val FROM A UNION SELECT val FROM B and SELECT val FROM A UNION ALL SELECT val FROM B: give (a) the row count and (b) the final sorted contents for each query. (See Sketch 2 after this list.)
- On 2025-08-31 only, compute both RANK() and DENSE_RANK() over (ORDER BY score DESC) for Scores, and list the expected (user_id, score, rank, dense_rank) rows for that date. (See Sketch 3 after this list.)
- Define a VIEW and contrast it with a MATERIALIZED VIEW. Give one pro and one con of each in analytic workloads. In this schema, propose a useful view and specify its definition. (See Sketch 4 after this list.)
- Define PRIMARY KEY, FOREIGN KEY, and PARTITION KEY. If Orders grows to 1B rows with an added order_date DATE column, choose appropriate keys and a partitioning strategy. Name at least three DB-agnostic query-tuning steps that would speed queries filtering on order_date and joining on customer_id (e.g., predicate pushdown, covering indexes, statistics, join reordering). Then rewrite this naive query for performance on large tables and explain why your rewrite should help (see Sketch 5 after this list):
  Naive: SELECT * FROM Orders o JOIN Customers c ON o.customer_id = c.cust_id WHERE o.amount > 40 ORDER BY c.name;
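Sketch 1 (join variants): a minimal set of queries for the first task, assuming a dialect that supports all five variants (e.g., PostgreSQL; MySQL lacks FULL OUTER JOIN and would need a UNION of the LEFT and RIGHT joins). The comments restate what each variant keeps:

  SELECT * FROM Customers c INNER JOIN Orders o ON c.cust_id = o.customer_id;      -- matched customer/order pairs only
  SELECT * FROM Customers c LEFT JOIN Orders o ON c.cust_id = o.customer_id;       -- also customers with no orders, null-extended on the Orders side
  SELECT * FROM Customers c RIGHT JOIN Orders o ON c.cust_id = o.customer_id;      -- also orders whose customer_id matches no customer
  SELECT * FROM Customers c FULL OUTER JOIN Orders o ON c.cust_id = o.customer_id; -- unmatched rows from both sides are kept
  SELECT * FROM Customers c CROSS JOIN Orders o;                                   -- every customer paired with every order; no ON clause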
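Sketch 2 (UNION vs. UNION ALL): the two set queries from the second task, with an explicit ORDER BY added so that "final sorted contents" is well defined; in standard SQL a trailing ORDER BY applies to the combined result, not to the second SELECT alone:

  SELECT val FROM A UNION     SELECT val FROM B ORDER BY val;  -- duplicates removed across both inputs
  SELECT val FROM A UNION ALL SELECT val FROM B ORDER BY val;  -- every input row kept, duplicates included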
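Sketch 3 (ranking): one way to write the window-function query for the third task; the DATE literal assumes a standard-SQL dialect (e.g., PostgreSQL), and no PARTITION BY is needed because the WHERE clause already restricts the result to a single date:

  SELECT user_id,
         score,
         RANK()       OVER (ORDER BY score DESC) AS rnk,
         DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
  FROM Scores
  WHERE ts = DATE '2025-08-31';

RANK() leaves a gap in the numbering after a tie, while DENSE_RANK() does not; that difference is exactly what the task asks you to exhibit.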
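Sketch 4 (a candidate view): one plausible view for this schema; the name customer_order_totals and the materialized variant are illustrative assumptions, and MATERIALIZED VIEW syntax is not available in every engine:

  CREATE VIEW customer_order_totals AS
  SELECT c.cust_id,
         c.name,
         COUNT(o.order_id)          AS order_count,   -- 0 for customers with no orders
         COALESCE(SUM(o.amount), 0) AS total_amount
  FROM Customers c
  LEFT JOIN Orders o ON o.customer_id = c.cust_id
  GROUP BY c.cust_id, c.name;

  -- Materialized variant (e.g., PostgreSQL): precomputed and refreshed explicitly.
  -- CREATE MATERIALIZED VIEW customer_order_totals_mv AS <same SELECT as above>;
  -- REFRESH MATERIALIZED VIEW customer_order_totals_mv;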
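Sketch 5 (keys, partitioning, and the rewrite): an illustrative, PostgreSQL-style redefinition of Orders for the 1B-row scenario, assuming monthly RANGE partitioning on order_date; the exact types, index set, and partition granularity are assumptions, not the only reasonable choice. Note that in PostgreSQL the primary key of a partitioned table must include the partition key:

  CREATE TABLE Orders (
      order_id    BIGINT,
      customer_id INT REFERENCES Customers(cust_id),   -- FOREIGN KEY back to Customers
      amount      INT,
      order_date  DATE NOT NULL,
      PRIMARY KEY (order_id, order_date)               -- must contain the partition key
  ) PARTITION BY RANGE (order_date);

  -- One partition per month, e.g.:
  CREATE TABLE orders_2025_08 PARTITION OF Orders
      FOR VALUES FROM ('2025-08-01') TO ('2025-09-01');

  CREATE INDEX ON Orders (customer_id);         -- supports the join to Customers
  CREATE INDEX ON Orders (order_date, amount);  -- supports date-range filters plus the amount predicate

One possible rewrite of the naive query from the last task: select only the columns the result actually needs and apply the amount filter before the join, so less data is carried through the join and the sort (many optimizers push the filter down on their own, but the narrower projection still avoids dragging every column of both tables through the ORDER BY):

  SELECT o.order_id, o.customer_id, o.amount, c.name
  FROM (SELECT order_id, customer_id, amount
        FROM Orders
        WHERE amount > 40) o
  JOIN Customers c ON c.cust_id = o.customer_id
  ORDER BY c.name;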