How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Zoox.

What role is this question designed for?

This question is commonly asked for Data Engineer candidates at Zoox during technical interviews.

Write Transaction Analytics SQL Queries

Company: Zoox

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

You are given a credit-card transaction dataset with the following tables. `transactions` | column | type | description | |---|---|---| | `transaction_id` | integer | Unique ID for a transaction. | | `user_id` | integer | Unique ID for a customer. | | `vendor_id` | integer | Unique ID for a vendor. | | `transaction_time` | timestamp | Time when the transaction was recorded. | | `transaction_dollars` | numeric | Dollar amount of the transaction. | | `transaction_type` | text | Transaction type, such as `PURCHASE` or `REFUND`. | | `refund_transaction_id` | integer | For a `REFUND`, the `transaction_id` of the original `PURCHASE`. | `vendors` | column | type | description | |---|---|---| | `vendor_id` | integer | Unique ID for a vendor. | | `city` | text | City where the vendor is located. | | `state_province` | text | State or province where the vendor is located. | | `country` | text | Two-letter country code. | Assume that only `PURCHASE` rows count as charged dollars. Write SQL queries for the following tasks: 1. Find the top 3 vendors by total purchase dollars charged in the last 2 years. Return `vendor_id` and `total_dollars_charged`, sorted from highest to lowest total. 2. For each `(state_province, country)` location, find the top 3 vendors by total purchase dollars charged in the last 2 years. Return `state_province`, `country`, `vendor_id`, `total_dollars`, and `vendor_rank`. Use dense ranking so ties receive the same rank.

Quick Answer: This question evaluates proficiency in SQL for analytical queries, covering aggregation, time-based filtering, joins between transactions and vendor metadata, and the use of ranking/window functions for top-N calculations.

Part 1: Top 3 Vendors by Purchase Dollars in the Last 2 Years

You are given transaction rows in a simplified in-memory format. Each transaction is a 4-element list: [vendor_id, transaction_time, transaction_dollars, is_purchase]. - vendor_id: integer vendor identifier - transaction_time: integer day number - transaction_dollars: non-negative integer dollar amount - is_purchase: 1 if this row is a PURCHASE, 0 otherwise (for example, a REFUND) Only PURCHASE rows count as charged dollars. A transaction is considered to be in the last 2 years if it falls in the inclusive window [current_day - 730, current_day]. Return the top 3 vendors by total purchase dollars charged in that window. Each output row should be [vendor_id, total_dollars_charged]. Sort by total_dollars_charged descending. If two vendors have the same total, sort by vendor_id ascending.

Constraints

0 <= len(transactions) <= 200000
Each transaction row has exactly 4 integers
0 <= transaction_time, current_day <= 10^9
0 <= transaction_dollars <= 10^9
is_purchase is either 0 or 1
Use the inclusive 730-day window: current_day - 730 <= transaction_time <= current_day

Examples

Input: ([[10, 900, 100, 1], [20, 850, 200, 1], [10, 880, 50, 1], [30, 100, 999, 1], [20, 920, 30, 0], [40, 950, 120, 1], [50, 960, 110, 1]], 1000)

Expected Output: [[20, 200], [10, 150], [40, 120]]

Explanation: Only purchase rows inside days 270..1000 count. Vendor 20 totals 200, vendor 10 totals 150, vendor 40 totals 120, vendor 50 totals 110.

Input: ([[1, 500, 50, 1], [2, 600, 50, 1], [3, 700, 40, 1], [4, 800, 60, 1]], 1000)

Expected Output: [[4, 60], [1, 50], [2, 50]]

Explanation: Vendors 1 and 2 tie at 50, so the smaller vendor_id comes first.

Hints

Use a hash map keyed by vendor_id to accumulate totals for qualifying purchase rows.
After aggregation, sort vendors by descending total and ascending vendor_id, then take the first 3.

Part 2: Top 3 Vendors by Purchase Dollars for Each Location

You are given transaction rows and vendor locations. Each transaction is a 4-element list: [vendor_id, transaction_time, transaction_dollars, is_purchase]. Vendor locations are given by three parallel arrays: - vendor_ids[i] - states[i] - countries[i] Together, these mean vendor_ids[i] is located in (states[i], countries[i]). Only PURCHASE rows count as charged dollars. A transaction is in the last 2 years if it falls in the inclusive window [current_day - 730, current_day]. For each (state_province, country) location, compute each vendor's total purchase dollars in that window, then assign a dense rank within that location by total dollars descending. Return every vendor whose dense rank is 1, 2, or 3 in its location. Each output row should be: [state_province, country, vendor_id, total_dollars, vendor_rank] Rules: - Dense ranking means equal totals receive the same rank. - If rank 3 is tied, include all vendors with rank 3. - Ignore transactions whose vendor_id does not appear in vendor_ids. - Sort the final result by state_province ascending, then country ascending, then vendor_rank ascending, then vendor_id ascending.

Constraints

0 <= len(transactions) <= 200000
0 <= len(vendor_ids) == len(states) == len(countries) <= 100000
vendor_ids are unique
Each transaction row has exactly 4 integers
0 <= transaction_time, current_day <= 10^9
0 <= transaction_dollars <= 10^9
is_purchase is either 0 or 1
Use the inclusive 730-day window: current_day - 730 <= transaction_time <= current_day

Examples

Input: ([[10, 900, 100, 1], [10, 920, 50, 1], [11, 910, 200, 1], [12, 930, 200, 1], [13, 940, 80, 1], [14, 950, 60, 1], [15, 960, 70, 1], [11, 970, 20, 0]], [10, 11, 12, 13, 14, 15], ['CA', 'CA', 'CA', 'NY', 'NY', 'ON'], ['US', 'US', 'US', 'US', 'US', 'CA'], 1000)

Expected Output: [['CA', 'US', 11, 200, 1], ['CA', 'US', 12, 200, 1], ['CA', 'US', 10, 150, 2], ['NY', 'US', 13, 80, 1], ['NY', 'US', 14, 60, 2], ['ON', 'CA', 15, 70, 1]]

Explanation: In CA, US, vendors 11 and 12 tie for rank 1 with 200 each, and vendor 10 is rank 2 with 150.

Input: ([[1, 900, 100, 1], [2, 905, 90, 1], [3, 910, 90, 1], [4, 915, 80, 1], [5, 920, 70, 1]], [1, 2, 3, 4, 5], ['TX', 'TX', 'TX', 'TX', 'TX'], ['US', 'US', 'US', 'US', 'US'], 1000)

Expected Output: [['TX', 'US', 1, 100, 1], ['TX', 'US', 2, 90, 2], ['TX', 'US', 3, 90, 2], ['TX', 'US', 4, 80, 3]]

Explanation: Dense ranks are 1, 2, 2, 3, 4, so the vendor with rank 4 is excluded.

Hints

First build a lookup from vendor_id to its (state_province, country).
Aggregate totals per vendor, then group vendors by location and assign dense ranks after sorting each group by total descending.

Quick Overview

This question evaluates proficiency in SQL for analytical queries, covering aggregation, time-based filtering, joins between transactions and vendor metadata, and the use of ranking/window functions for top-N calculations.