Write Transaction Analytics SQL Queries
Company: Zoox
Role: Data Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency in SQL for analytical queries, covering aggregation, time-based filtering, joins between transactions and vendor metadata, and the use of ranking/window functions for top-N calculations.
Part 1: Top 3 Vendors by Purchase Dollars in the Last 2 Years
Constraints
- 0 <= len(transactions) <= 200000
- Each transaction row has exactly 4 integers
- 0 <= transaction_time, current_day <= 10^9
- 0 <= transaction_dollars <= 10^9
- is_purchase is either 0 or 1
- Use the inclusive 730-day window: current_day - 730 <= transaction_time <= current_day
Examples
Input: ([[10, 900, 100, 1], [20, 850, 200, 1], [10, 880, 50, 1], [30, 100, 999, 1], [20, 920, 30, 0], [40, 950, 120, 1], [50, 960, 110, 1]], 1000)
Expected Output: [[20, 200], [10, 150], [40, 120]]
Explanation: Only purchase rows inside days 270..1000 count. Vendor 20 totals 200, vendor 10 totals 150, vendor 40 totals 120, vendor 50 totals 110.
Input: ([[1, 500, 50, 1], [2, 600, 50, 1], [3, 700, 40, 1], [4, 800, 60, 1]], 1000)
Expected Output: [[4, 60], [1, 50], [2, 50]]
Explanation: Vendors 1 and 2 tie at 50, so the smaller vendor_id comes first.
Input: ([], 1000)
Expected Output: []
Explanation: Edge case: no transactions.
Input: ([[7, 270, 30, 1], [7, 1000, 20, 1], [8, 269, 100, 1]], 1000)
Expected Output: [[7, 50]]
Explanation: Day 270 is included in the window, but day 269 is not.
Hints
- Use a hash map keyed by vendor_id to accumulate totals for qualifying purchase rows.
- After aggregation, sort vendors by descending total and ascending vendor_id, then take the first 3.
Part 2: Top 3 Vendors by Purchase Dollars for Each Location
Constraints
- 0 <= len(transactions) <= 200000
- 0 <= len(vendor_ids) == len(states) == len(countries) <= 100000
- vendor_ids are unique
- Each transaction row has exactly 4 integers
- 0 <= transaction_time, current_day <= 10^9
- 0 <= transaction_dollars <= 10^9
- is_purchase is either 0 or 1
- Use the inclusive 730-day window: current_day - 730 <= transaction_time <= current_day
Examples
Input: ([[10, 900, 100, 1], [10, 920, 50, 1], [11, 910, 200, 1], [12, 930, 200, 1], [13, 940, 80, 1], [14, 950, 60, 1], [15, 960, 70, 1], [11, 970, 20, 0]], [10, 11, 12, 13, 14, 15], ['CA', 'CA', 'CA', 'NY', 'NY', 'ON'], ['US', 'US', 'US', 'US', 'US', 'CA'], 1000)
Expected Output: [['CA', 'US', 11, 200, 1], ['CA', 'US', 12, 200, 1], ['CA', 'US', 10, 150, 2], ['NY', 'US', 13, 80, 1], ['NY', 'US', 14, 60, 2], ['ON', 'CA', 15, 70, 1]]
Explanation: In CA, US, vendors 11 and 12 tie for rank 1 with 200 each, and vendor 10 is rank 2 with 150.
Input: ([[1, 900, 100, 1], [2, 905, 90, 1], [3, 910, 90, 1], [4, 915, 80, 1], [5, 920, 70, 1]], [1, 2, 3, 4, 5], ['TX', 'TX', 'TX', 'TX', 'TX'], ['US', 'US', 'US', 'US', 'US'], 1000)
Expected Output: [['TX', 'US', 1, 100, 1], ['TX', 'US', 2, 90, 2], ['TX', 'US', 3, 90, 2], ['TX', 'US', 4, 80, 3]]
Explanation: Dense ranks are 1, 2, 2, 3, 4, so the vendor with rank 4 is excluded.
Input: ([], [1, 2], ['CA', 'NY'], ['US', 'US'], 1000)
Expected Output: []
Explanation: Edge case: no transactions.
Input: ([[21, 270, 50, 1], [22, 1000, 60, 1], [24, 900, 100, 1]], [21, 22], ['BC', 'BC'], ['CA', 'US'], 1000)
Expected Output: [['BC', 'CA', 21, 50, 1], ['BC', 'US', 22, 60, 1]]
Explanation: Day 270 is included. Vendor 24 is ignored because its location is unknown.
Hints
- First build a lookup from vendor_id to its (state_province, country).
- Aggregate totals per vendor, then group vendors by location and assign dense ranks after sorting each group by total descending.