Write SQL using joins and window functions
Company: Capital One
Role: Data Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency in SQL joins, aggregation, and window functions for ranked group-level analytics, specifically testing the ability to compute partitioned ranks, handle ties, and apply time-based filters in a data engineering context.
Constraints
- 0 <= len(accounts) <= 100000
- 0 <= len(transactions) <= 100000
- account_id values in accounts are unique
- txn_ts and current_ts are ISO timestamp strings without time zones
- amount is numeric and may be positive, zero, or negative
- Only customers with at least one qualifying transaction in the last 30 days are returned
Examples
Input: ([{'account_id': 'a1', 'customer_id': 'c1', 'segment': 'prime'}, {'account_id': 'a2', 'customer_id': 'c1', 'segment': 'prime'}, {'account_id': 'a3', 'customer_id': 'c2', 'segment': 'prime'}, {'account_id': 'a4', 'customer_id': 'c3', 'segment': 'prime'}, {'account_id': 'a5', 'customer_id': 'c4', 'segment': 'prime'}, {'account_id': 'a6', 'customer_id': 'c5', 'segment': 'subprime'}], [{'transaction_id': 't1', 'account_id': 'a1', 'amount': 100, 'txn_ts': '2025-01-10 00:00:00'}, {'transaction_id': 't2', 'account_id': 'a2', 'amount': 50, 'txn_ts': '2025-01-20 00:00:00'}, {'transaction_id': 't3', 'account_id': 'a3', 'amount': 200, 'txn_ts': '2025-01-15 00:00:00'}, {'transaction_id': 't4', 'account_id': 'a4', 'amount': 75, 'txn_ts': '2025-01-30 00:00:00'}, {'transaction_id': 't5', 'account_id': 'a4', 'amount': 25, 'txn_ts': '2024-12-31 23:59:59'}, {'transaction_id': 't6', 'account_id': 'a5', 'amount': 75, 'txn_ts': '2025-01-05 00:00:00'}, {'transaction_id': 't7', 'account_id': 'a6', 'amount': 300, 'txn_ts': '2025-01-02 00:00:00'}], '2025-01-31 00:00:00')
Expected Output: [('prime', 'c2', 200, 1), ('prime', 'c1', 150, 2), ('prime', 'c3', 75, 3), ('prime', 'c4', 75, 3), ('subprime', 'c5', 300, 1)]
Explanation: Customer c1 has two prime accounts totaling 150. Customers c3 and c4 tie at rank 3 in prime, so both are included. The old transaction on 2024-12-31 is outside the 30-day window.
Input: ([{'account_id': 'a1', 'customer_id': 'c1', 'segment': 'prime'}, {'account_id': 'a2', 'customer_id': 'c2', 'segment': 'prime'}, {'account_id': 'a3', 'customer_id': 'c3', 'segment': 'prime'}, {'account_id': 'a4', 'customer_id': 'c4', 'segment': 'prime'}], [{'transaction_id': 't1', 'account_id': 'a1', 'amount': 10, 'txn_ts': '2025-01-02 12:00:00'}, {'transaction_id': 't2', 'account_id': 'a2', 'amount': 20, 'txn_ts': '2025-02-01 12:00:00'}, {'transaction_id': 't3', 'account_id': 'a3', 'amount': 100, 'txn_ts': '2025-01-02 11:59:59'}, {'transaction_id': 't4', 'account_id': 'a4', 'amount': 1000, 'txn_ts': '2025-02-01 12:00:01'}, {'transaction_id': 't5', 'account_id': 'missing', 'amount': 999, 'txn_ts': '2025-02-01 12:00:00'}], '2025-02-01 12:00:00')
Expected Output: [('prime', 'c2', 20, 1), ('prime', 'c1', 10, 2)]
Explanation: The transaction exactly 30 days before current_ts and the transaction exactly at current_ts are included. The older transaction, future transaction, and transaction with no matching account are ignored.
Input: ([{'account_id': 'a1', 'customer_id': 'c1', 'segment': 'prime'}], [{'transaction_id': 't1', 'account_id': 'a1', 'amount': 50, 'txn_ts': '2024-12-01 00:00:00'}], '2025-01-31 00:00:00')
Expected Output: []
Explanation: There are no transactions within the inclusive 30-day window, so no customers are returned.
Input: ([{'account_id': 'a1', 'customer_id': 'c1', 'segment': 'prime'}, {'account_id': 'a2', 'customer_id': 'c2', 'segment': 'prime'}, {'account_id': 'a3', 'customer_id': 'c3', 'segment': 'prime'}, {'account_id': 'a4', 'customer_id': 'c4', 'segment': 'prime'}], [{'transaction_id': 't1', 'account_id': 'a1', 'amount': 100, 'txn_ts': '2025-01-20 00:00:00'}, {'transaction_id': 't2', 'account_id': 'a2', 'amount': 100, 'txn_ts': '2025-01-20 00:00:00'}, {'transaction_id': 't3', 'account_id': 'a3', 'amount': 50, 'txn_ts': '2025-01-20 00:00:00'}, {'transaction_id': 't4', 'account_id': 'a4', 'amount': 40, 'txn_ts': '2025-01-20 00:00:00'}], '2025-01-31 00:00:00')
Expected Output: [('prime', 'c1', 100, 1), ('prime', 'c2', 100, 1), ('prime', 'c3', 50, 3)]
Explanation: Customers c1 and c2 tie for rank 1. SQL RANK skips rank 2, so c3 has rank 3 and c4 has rank 4. Only ranks at most 3 are returned.
Hints
- Build a mapping from account_id to its customer_id and segment before processing transactions.
- After aggregating totals by (segment, customer_id), sort each segment by total descending and assign SQL-style RANK values.