Find k customers with least revenue
Company: Databricks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
## Problem
You are given a list of purchase events. Each event contains:
- `customer_id` (string or int)
- `amount` (integer, may be 0; assume non-negative unless stated otherwise)
Define a customer's **revenue** as the sum of `amount` across all events for that customer.
### Task
Return the **k customers with the smallest total revenue**.
### Requirements
- If fewer than `k` customers exist, return all customers.
- If multiple customers tie on total revenue, you may return them in any order (or specify a deterministic tie-breaker such as `customer_id` ascending).
### Input/Output (suggested)
- **Input:** `events: List[(customer_id, amount)]`, `k: int`
- **Output:** `List[customer_id]` (the `k` least-revenue customers)
### Constraints (assumptions)
- `n = len(events)` can be large.
- Number of unique customers `m` can be up to `n`.
Quick Answer: This question evaluates a candidate's ability to aggregate per-customer revenue from event streams and perform selection on aggregated totals, testing knowledge of data aggregation, selection algorithms, and efficient use of data structures.
You are given a list of purchase events. Each event is a tuple `(customer_id, amount)`, where `amount` is a non-negative integer. A customer's revenue is the sum of all amounts for that customer across all events. Return the `k` customer IDs with the smallest total revenue, ordered by revenue ascending. If two customers have the same total revenue, return the smaller `customer_id` first. If fewer than `k` unique customers exist, return all of them. If `k` is 0, return an empty list. Within a single input, all `customer_id` values are of the same type, either all integers or all strings.
Constraints
- 0 <= len(events) <= 2 * 10^5
- 0 <= amount <= 10^9
- 0 <= k <= 2 * 10^5
- Within one test case, all `customer_id` values are comparable because they are all ints or all strings
Examples
Input: ([('A', 30), ('B', 10), ('A', 20), ('C', 5)], 2)
Expected Output: ['C', 'B']
Explanation: Total revenues are A=50, B=10, C=5. The two smallest are C and B.
Input: ([('alice', 0), ('bob', 0), ('alice', 3), ('bob', 3), ('cara', 2)], 2)
Expected Output: ['cara', 'alice']
Explanation: Total revenues are alice=3, bob=3, cara=2. Cara is smallest, and between alice and bob the tie is broken by customer_id ascending.