PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/Software Engineering Fundamentals/Uber

Design a Real-Time Top-K Ranking System

Last updated: May 4, 2026

Quick Overview

This question evaluates a candidate's ability to design efficient in-memory data structures and APIs for a real-time Top-K ranking service, address concurrency and fault-tolerant batch processing, and outline tests for correctness and edge cases.

  • hard
  • Uber
  • Software Engineering Fundamentals
  • Software Engineer

Design a Real-Time Top-K Ranking System

Company: Uber

Role: Software Engineer

Category: Software Engineering Fundamentals

Difficulty: hard

Interview Round: Onsite

Design an object-oriented, real-time **Top-K ranking system**. The system continuously receives score updates for a large set of entities — for example users, drivers, restaurants, or products. Each entity has a unique ID and a single numeric score. Your design should expose a clean object-oriented API and support efficient insertion, update, removal, and top-K retrieval as scores change over time. The system must support the following operations: 1. `update(entity_id, new_score)` — Insert a new entity, or update the score of an existing entity. 2. `top_k(k)` — Return the `k` entities with the highest scores, ordered from highest to lowest. Ties must be broken deterministically — for example, by `entity_id` ascending after sorting by score descending. 3. `remove(entity_id)` — Remove an entity from the ranking. ### Constraints & Assumptions - Entity IDs are unique; each entity has exactly one current score. - Scores are numeric and may be updated arbitrarily often; assume an entity's score can go up or down. - `top_k` is expected to be called frequently and should be fast relative to the total number of entities. - The number of entities can be large (assume it does not all fit conveniently in a single linear scan per query), but it fits in memory on a single node for the core design. - Tie-breaking must be deterministic and stable across calls. - Treat the core design as in-memory and single-node first; concurrency and batch ingestion are addressed in later parts. ### Clarifying Questions to Ask - What is the approximate scale — number of entities, update rate (writes/sec), and `top_k` query rate (reads/sec)? Is this read-heavy or write-heavy? - What is the typical and maximum value of `k`? Is `k` small (e.g. a leaderboard top 10) or can it approach the total entity count? - For `remove` and `top_k(k)` with `k` larger than the population, what is the API contract — error, no-op, or return whatever exists? - Are scores integers or floating point, can they be negative, and is there a defined behavior for duplicate scores beyond the stated tie-break? - Must reads reflect the very latest write (strong consistency), or is slightly stale top-K acceptable? - Is this strictly in-memory for one process, or must rankings survive a restart / scale beyond one machine? ### What a Strong Answer Covers This is a design and coding signal, not a single-algorithm puzzle. A strong answer demonstrates: - A clear object-oriented API with the three operations and well-defined contracts for edge cases. - A justified choice of **two cooperating data structures** — one for direct lookup, one for ordered retrieval — and why neither alone suffices. - Correct, deterministic tie-breaking baked into the ordering key. - Time and space complexity for each operation, and the trade-offs between candidate structures (balanced tree vs. heap vs. sorted array). - How the update path keeps both structures consistent when an entity's score changes. - A concurrency story that starts simple and scales (see the parts below). - A batch-ingestion story that isolates bad records and stays observable. - A concrete, organized test plan covering correctness, edge cases, and concurrency. --- ### Part 1 — Core data structures and operations Propose the in-memory data structures and define the three operations (in code or precise pseudocode). Discuss the time and space complexity of `update`, `top_k`, and `remove`, and explain how you keep the structures consistent when an existing entity's score changes. ```hint Where to start No single structure does both jobs well. You need **fast lookup by `entity_id`** ("does it exist, what's its current score?") *and* **fast retrieval in score order**. Think about pairing two complementary structures. ``` ```hint Data structures Consider a **hash map** (`entity_id -> score`) for O(1) lookup alongside an **ordered structure** (balanced BST / `TreeSet` / `std::set`, or a heap) keyed by score. For top-K specifically, weigh a fully ordered set against a size-bounded structure. ``` ```hint Tie-breaking & the update path The ordering key needs to encode **both** the score *and* the tie-breaker so that no two entries compare equal — what does sorting on score alone leave ambiguous? Then walk the update path carefully: when an existing entity's score changes, what does the ordered structure need to know *before* it can locate the stale entry, and where does that information live? ``` ### Part 2 — Concurrency Explain how the design changes if multiple threads call `update`, `remove`, and `top_k` concurrently. Describe how you keep the two structures mutually consistent under concurrent access, and how your approach scales as write throughput grows. ```hint Where to start The two structures must be updated **atomically together** — a reader must never see the hash map and the ordered set disagree. Start by reasoning about the simplest correct synchronization. ``` ```hint Scaling beyond one lock A single read/write lock is simple and fine when reads dominate, but it serializes writes. To raise write throughput, think about **partitioning** entities across independent units and how you'd reassemble a global top-K from per-partition results. ``` ### Part 3 — Large input batches with errors Explain how to handle very large input batches in which some records may be malformed or fail during processing. The goal is to ingest the good records without letting a few bad ones fail the whole batch. ```hint Don't make one bad record abort the batch. Think about **per-record validation**, routing failures somewhere (e.g. a dead-letter queue / error log with a reason), **idempotency** for safe retries, and per-batch **metrics** (processed / succeeded / failed). What anchors ordering if stale updates arrive out of order? ``` ### Part 4 — Testing Describe the tests you would write for correctness, edge cases, and concurrency. ```hint Enumerate edge cases explicitly: empty ranking, `top_k(0)`, `top_k(k)` with `k` larger than the population, raising vs. lowering an existing score, duplicate scores (verify the deterministic tie-break), removing a non-existent entity, and negative/zero/large scores. For concurrency, think about how you'd assert correctness under interleaved readers and writers (e.g. invariant checks after a randomized concurrent workload). ``` --- ### Follow-up Questions - How does the design change if `k` can be as large as the entire entity set, versus a fixed small leaderboard (e.g. top 10)? Which data structure choice wins in each regime? - What breaks first as write throughput climbs by 100x, and how does sharding change the cost and accuracy of a global `top_k`? - If rankings must survive a process restart or scale across multiple machines, what would you add (persistence, partitioning, a coordinator), and what new consistency problems appear? - How would you support a "time-windowed" top-K (e.g. highest scores in the last hour) where old contributions must expire?

Quick Answer: This question evaluates a candidate's ability to design efficient in-memory data structures and APIs for a real-time Top-K ranking service, address concurrency and fault-tolerant batch processing, and outline tests for correctness and edge cases.

Related Interview Questions

  • Build a React Parking Lot Manager - Uber (medium)
  • Design a Parking Lot - Uber (medium)
  • Design a Parking Garage Object Model - Uber (medium)
  • Design follow/follower classes - Uber (medium)
  • Design a meeting room reservation API - Uber (medium)
Uber logo
Uber
May 2, 2026, 12:00 AM
Software Engineer
Onsite
Software Engineering Fundamentals
376
0

Design an object-oriented, real-time Top-K ranking system.

The system continuously receives score updates for a large set of entities — for example users, drivers, restaurants, or products. Each entity has a unique ID and a single numeric score. Your design should expose a clean object-oriented API and support efficient insertion, update, removal, and top-K retrieval as scores change over time.

The system must support the following operations:

  1. update(entity_id, new_score) — Insert a new entity, or update the score of an existing entity.
  2. top_k(k) — Return the k entities with the highest scores, ordered from highest to lowest. Ties must be broken deterministically — for example, by entity_id ascending after sorting by score descending.
  3. remove(entity_id) — Remove an entity from the ranking.

Constraints & Assumptions

  • Entity IDs are unique; each entity has exactly one current score.
  • Scores are numeric and may be updated arbitrarily often; assume an entity's score can go up or down.
  • top_k is expected to be called frequently and should be fast relative to the total number of entities.
  • The number of entities can be large (assume it does not all fit conveniently in a single linear scan per query), but it fits in memory on a single node for the core design.
  • Tie-breaking must be deterministic and stable across calls.
  • Treat the core design as in-memory and single-node first; concurrency and batch ingestion are addressed in later parts.

Clarifying Questions to Ask

  • What is the approximate scale — number of entities, update rate (writes/sec), and top_k query rate (reads/sec)? Is this read-heavy or write-heavy?
  • What is the typical and maximum value of k ? Is k small (e.g. a leaderboard top 10) or can it approach the total entity count?
  • For remove and top_k(k) with k larger than the population, what is the API contract — error, no-op, or return whatever exists?
  • Are scores integers or floating point, can they be negative, and is there a defined behavior for duplicate scores beyond the stated tie-break?
  • Must reads reflect the very latest write (strong consistency), or is slightly stale top-K acceptable?
  • Is this strictly in-memory for one process, or must rankings survive a restart / scale beyond one machine?

What a Strong Answer Covers

This is a design and coding signal, not a single-algorithm puzzle. A strong answer demonstrates:

  • A clear object-oriented API with the three operations and well-defined contracts for edge cases.
  • A justified choice of two cooperating data structures — one for direct lookup, one for ordered retrieval — and why neither alone suffices.
  • Correct, deterministic tie-breaking baked into the ordering key.
  • Time and space complexity for each operation, and the trade-offs between candidate structures (balanced tree vs. heap vs. sorted array).
  • How the update path keeps both structures consistent when an entity's score changes.
  • A concurrency story that starts simple and scales (see the parts below).
  • A batch-ingestion story that isolates bad records and stays observable.
  • A concrete, organized test plan covering correctness, edge cases, and concurrency.

Part 1 — Core data structures and operations

Propose the in-memory data structures and define the three operations (in code or precise pseudocode). Discuss the time and space complexity of update, top_k, and remove, and explain how you keep the structures consistent when an existing entity's score changes.

Part 2 — Concurrency

Explain how the design changes if multiple threads call update, remove, and top_k concurrently. Describe how you keep the two structures mutually consistent under concurrent access, and how your approach scales as write throughput grows.

Part 3 — Large input batches with errors

Explain how to handle very large input batches in which some records may be malformed or fail during processing. The goal is to ingest the good records without letting a few bad ones fail the whole batch.

Part 4 — Testing

Describe the tests you would write for correctness, edge cases, and concurrency.

Follow-up Questions

  • How does the design change if k can be as large as the entire entity set, versus a fixed small leaderboard (e.g. top 10)? Which data structure choice wins in each regime?
  • What breaks first as write throughput climbs by 100x, and how does sharding change the cost and accuracy of a global top_k ?
  • If rankings must survive a process restart or scale across multiple machines, what would you add (persistence, partitioning, a coordinator), and what new consistency problems appear?
  • How would you support a "time-windowed" top-K (e.g. highest scores in the last hour) where old contributions must expire?

Solution

Show

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More Software Engineering Fundamentals•More Uber•More Software Engineer•Uber Software Engineer•Uber Software Engineering Fundamentals•Software Engineer Software Engineering Fundamentals
PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.