PracHub

Optimize least-k revenue queries for read/write load

Last updated: May 23, 2026

Quick Overview

This question evaluates a candidate's ability to design and maintain dynamic aggregate metrics from nested or incremental inputs, emphasizing data modeling, streaming updates, and the use of appropriate data structures and algorithms to compute least-k aggregates.


Company: Databricks

Role: Software Engineer

Category: Software Engineering Fundamentals

Difficulty: medium

Interview Round: Technical Screen



Feb 12, 2026, 12:00 AM

Follow-up Scenario

Now assume revenue is not provided as a flat list of events, but may be nested, for example:

  • Each customer has many orders, and each order has many line items.
  • Or a stream of updates arrives as (customer_id, delta_amount) events.
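Both input shapes can be reduced to the same per-customer running total. The following is a minimal sketch under stated assumptions, not a reference solution: the function names `totals_from_nested` and `apply_delta`, the dictionary layout of the nested input, and the use of integer amounts (for example cents) are all hypothetical choices made for illustration.

```python
def totals_from_nested(customers):
    """Collapse nested input into {customer_id: total}.

    Assumed (hypothetical) shape: {customer_id: [[item_amount, ...], ...]},
    i.e. each customer maps to a list of orders, each order to a list of
    line-item amounts. Runs in O(total number of line items).
    """
    totals = {}
    for customer_id, orders in customers.items():
        totals[customer_id] = sum(amount for order in orders for amount in order)
    return totals


def apply_delta(totals, customer_id, delta_amount):
    """Apply one (customer_id, delta_amount) event in O(1) average time."""
    totals[customer_id] = totals.get(customer_id, 0) + delta_amount


nested = {"a": [[5, 5], [10]], "b": [[3]], "c": []}
totals = totals_from_nested(nested)
apply_delta(totals, "b", 4)    # existing customer
apply_delta(totals, "d", -2)   # unseen customer, e.g. a refund
```

Space is O(n) for n distinct customers, regardless of how many orders, items, or delta events were consumed.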

You need to support the query:

“Return the k customers with the smallest total revenue.”
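One straightforward way to answer this query from a map of totals is a bounded heap selection. The sketch below assumes a hypothetical `totals` dictionary of `{customer_id: total}` and breaks ties by customer id, which the question does not specify; it uses the standard-library `heapq.nsmallest`.

```python
import heapq


def least_k(totals, k):
    """Return up to k (customer_id, total) pairs with the smallest totals.

    Ties are broken by customer_id (an assumption; ids must be mutually
    comparable). Cost is roughly O(n log k) for n customers.
    """
    return heapq.nsmallest(k, totals.items(), key=lambda kv: (kv[1], kv[0]))


example_totals = {"a": 20, "b": 3, "c": 0, "d": 3}
smallest_two = least_k(example_totals, 2)
```

This recomputes the answer on every call, which is acceptable when queries are rare but becomes the bottleneck when they dominate.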

Questions

  1. How would you compute/maintain customer revenue totals when the input is nested (orders → items) or incremental (delta_amount updates)?
  2. What are the time and space complexities of your approach?
  3. How would you change the design for:
    • Read-heavy workload (many leastK() queries, fewer updates)
    • Write-heavy workload (many updates, fewer queries)

Assume you do not need to write full code, but must clearly describe data structures, operations, and complexity.
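Although full code is not required, a short sketch can make the read-heavy versus write-heavy trade-off concrete. The two classes below are hypothetical illustrations, not a canonical answer. `ReadHeavyLeastK` keeps totals in sorted order so a query is a slice; `WriteHeavyLeastK` keeps only a hash map and selects on demand. The sorted Python list is a stand-in for a balanced tree or skip list, which would bring updates down to O(log n).

```python
import bisect
import heapq


class ReadHeavyLeastK:
    """Pay on write: keep (total, customer_id) pairs sorted at all times."""

    def __init__(self):
        self._totals = {}   # customer_id -> total
        self._sorted = []   # sorted list of (total, customer_id)

    def update(self, customer_id, delta_amount):
        old = self._totals.get(customer_id)
        if old is not None:
            # Binary search finds the stale pair; deleting shifts the list, O(n).
            idx = bisect.bisect_left(self._sorted, (old, customer_id))
            del self._sorted[idx]
        new = (old or 0) + delta_amount
        self._totals[customer_id] = new
        bisect.insort(self._sorted, (new, customer_id))

    def least_k(self, k):
        # O(k): the answer is the prefix of the sorted list.
        if k <= 0:
            return []
        return [(cid, total) for total, cid in self._sorted[:k]]


class WriteHeavyLeastK:
    """Pay on read: O(1) average updates, O(n log k) selection per query."""

    def __init__(self):
        self._totals = {}   # customer_id -> total

    def update(self, customer_id, delta_amount):
        self._totals[customer_id] = self._totals.get(customer_id, 0) + delta_amount

    def least_k(self, k):
        return heapq.nsmallest(k, self._totals.items(), key=lambda kv: (kv[1], kv[0]))


events = [("a", 20), ("b", 3), ("c", 0), ("d", 3), ("a", -25)]
read_heavy, write_heavy = ReadHeavyLeastK(), WriteHeavyLeastK()
for customer_id, delta in events:
    read_heavy.update(customer_id, delta)
    write_heavy.update(customer_id, delta)
```

Both variants use O(n) space and return the same results for the same event stream; they differ only in where the sorting cost is paid. A write-heavy design could additionally cache the last result and invalidate it on update, if repeated queries between writes are expected.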

