PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates a candidate's ability to aggregate per-customer revenue from event streams and perform selection on aggregated totals, testing knowledge of data aggregation, selection algorithms, and efficient use of data structures.

  • medium
  • Databricks
  • Coding & Algorithms
  • Software Engineer

Find k customers with least revenue

Company: Databricks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

## Problem You are given a list of purchase events. Each event contains: - `customer_id` (string or int) - `amount` (integer, may be 0; assume non-negative unless stated otherwise) Define a customer's **revenue** as the sum of `amount` across all events for that customer. ### Task Return the **k customers with the smallest total revenue**. ### Requirements - If fewer than `k` customers exist, return all customers. - If multiple customers tie on total revenue, you may return them in any order (or specify a deterministic tie-breaker such as `customer_id` ascending). ### Input/Output (suggested) - **Input:** `events: List[(customer_id, amount)]`, `k: int` - **Output:** `List[customer_id]` (the `k` least-revenue customers) ### Constraints (assumptions) - `n = len(events)` can be large. - Number of unique customers `m` can be up to `n`.

Quick Answer: This question evaluates a candidate's ability to aggregate per-customer revenue from event streams and perform selection on aggregated totals, testing knowledge of data aggregation, selection algorithms, and efficient use of data structures.

You are given a list of purchase events. Each event is a tuple `(customer_id, amount)`, where `amount` is a non-negative integer. A customer's revenue is the sum of all amounts for that customer across all events. Return the `k` customer IDs with the smallest total revenue, ordered by revenue ascending. If two customers have the same total revenue, return the smaller `customer_id` first. If fewer than `k` unique customers exist, return all of them. If `k` is 0, return an empty list. Within a single input, all `customer_id` values are of the same type, either all integers or all strings.

Constraints

  • 0 <= len(events) <= 2 * 10^5
  • 0 <= amount <= 10^9
  • 0 <= k <= 2 * 10^5
  • Within one test case, all `customer_id` values are comparable because they are all ints or all strings

Examples

Input: ([('A', 30), ('B', 10), ('A', 20), ('C', 5)], 2)

Expected Output: ['C', 'B']

Explanation: Total revenues are A=50, B=10, C=5. The two smallest are C and B.

Input: ([('alice', 0), ('bob', 0), ('alice', 3), ('bob', 3), ('cara', 2)], 2)

Expected Output: ['cara', 'alice']

Explanation: Total revenues are alice=3, bob=3, cara=2. Cara is smallest, and between alice and bob the tie is broken by customer_id ascending.

Input: ([(1, 4), (2, 1), (1, 1), (3, 0)], 5)

Expected Output: [3, 2, 1]

Explanation: Total revenues are 1=5, 2=1, 3=0. There are only 3 unique customers, so return all of them in sorted order by revenue.

Input: ([], 3)

Expected Output: []

Explanation: There are no events, so there are no customers to return.

Input: ([('solo', 7)], 0)

Expected Output: []

Explanation: When k is 0, the result must be an empty list.

Solution

def solution(events, k):
    if k <= 0 or not events:
        return []

    revenue = {}
    for customer_id, amount in events:
        revenue[customer_id] = revenue.get(customer_id, 0) + amount

    ordered = sorted(revenue.items(), key=lambda item: (item[1], item[0]))
    return [customer_id for customer_id, _ in ordered[:k]]

Time complexity: O(n + m log m), where n is the number of events and m is the number of unique customers. Space complexity: O(m).

Hints

  1. Use a hash map/dictionary to accumulate total revenue for each customer.
  2. After aggregation, sort the unique customers by `(revenue, customer_id)` and take the first `k`.
Last updated: Apr 19, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Find the Best Commute Mode - Databricks (medium)
  • Partition a Target String by Source Substrings - Databricks (medium)
  • Design In-Memory QPS Counter - Databricks (medium)
  • Implement Random Connectivity and Grid Routing - Databricks
  • Implement a Snapshot Set Iterator - Databricks (hard)