How do I approach Software Engineering Fundamentals interview questions?

Software Engineering Fundamentals questions require both an understanding of core concepts and practice. PracHub provides solutions with explanations to help you master Software Engineering Fundamentals interviews.

What difficulty level is this interview question?

This is a hard difficulty Software Engineering Fundamentals question, commonly asked during Onsite rounds at Uber.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Uber during technical interviews.

Design a Real-Time Top-K Ranking System

This question evaluates a candidate's ability to design efficient in-memory data structures and APIs for a real-time Top-K ranking service, address concurrency and fault-tolerant batch processing, and outline tests for correctness and edge cases.

Design an object-oriented, real-time Top-K ranking system.

The system continuously receives score updates for a large set of entities — for example users, drivers, restaurants, or products. Each entity has a unique ID and a single numeric score. Your design should expose a clean object-oriented API and support efficient insertion, update, removal, and top-K retrieval as scores change over time.

The system must support the following operations:

update(entity_id, new_score) — Insert a new entity, or update the score of an existing entity.
top_k(k) — Return the k entities with the highest scores, ordered from highest to lowest. Ties must be broken deterministically — for example, by entity_id ascending after sorting by score descending.
remove(entity_id) — Remove an entity from the ranking.
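
To make the tie-break rule concrete, here is a minimal Python sketch of the ordering that top_k is expected to produce, computed by brute force. The function name `brute_force_top_k` and the sample data are illustrative assumptions, not part of the question. A full sort on every query is exactly what the real design has to avoid, so treat this only as a reference for the expected output.

```python
def brute_force_top_k(scores, k):
    """Reference ordering: score descending, then entity_id ascending."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max(k, 0)]


# Hypothetical sample data: two entities tie on score 4.9.
scores = {"driver_b": 4.9, "driver_a": 4.9, "driver_c": 4.7}
print(brute_force_top_k(scores, 2))
# [('driver_a', 4.9), ('driver_b', 4.9)]
```

Because the sort key is the pair (negated score, entity_id), two entities with the same score are always returned in the same order, which is the "deterministic and stable across calls" property.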

Constraints & Assumptions

Entity IDs are unique; each entity has exactly one current score.
Scores are numeric and may be updated arbitrarily often; assume an entity's score can go up or down.
top_k is expected to be called frequently and should be fast relative to the total number of entities.
The number of entities can be large (assume a full linear scan on every query is too expensive), but the data fits in memory on a single node for the core design.
Tie-breaking must be deterministic and stable across calls.
Treat the core design as in-memory and single-node first; concurrency and batch ingestion are addressed in later parts.

Clarifying Questions to Ask

What is the approximate scale — number of entities, update rate (writes/sec), and top_k query rate (reads/sec)? Is this read-heavy or write-heavy?
What is the typical and maximum value of k? Is k small (e.g. a leaderboard top 10) or can it approach the total entity count?
For remove, and for top_k(k) with k larger than the population, what is the API contract: error, no-op, or return whatever exists?
Are scores integers or floating point, can they be negative, and is there a defined behavior for duplicate scores beyond the stated tie-break?
Must reads reflect the very latest write (strong consistency), or is slightly stale top-K acceptable?
Is this strictly in-memory for one process, or must rankings survive a restart / scale beyond one machine?

Part 1 — Core data structures and operations

Propose the in-memory data structures and define the three operations (in code or precise pseudocode). Discuss the time and space complexity of update, top_k, and remove, and explain how you keep the structures consistent when an existing entity's score changes.

What This Part Should Cover

A clear object-oriented API with the three operations and well-defined contracts for the edge cases named in Constraints (e.g. top_k(0), k > population, remove of an absent ID).
A justified pairing of two cooperating structures — one for $O(1)$ lookup by ID, one for ordered retrieval — and a concrete reason neither alone suffices.
A deterministic composite ordering key that encodes both score and a tie-breaker, so no two entries ever compare equal, and a clear statement of what is ambiguous without it.
Per-operation time and space complexity, plus the trade-offs across candidate ordered structures (balanced tree vs. size-bounded heap vs. sorted array).
A correct update path that keeps both structures mutually consistent when a score changes, including what information must be retrieved before the ordered structure can be modified.
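
The points above can be sketched as follows. This is one possible shape under stated assumptions, not a reference solution: the class name `TopKRanking`, the no-op contract for removing an absent ID, and the "return whatever exists" contract for large k are all choices made for illustration. Python's standard library has no balanced tree, so the ordered structure here is a sorted list maintained with `bisect`: lookups are $O(\log n)$, but inserting into or deleting from the list shifts elements and costs $O(n)$. A balanced tree or skip list would bring update and remove down to $O(\log n)$. The sketch also assumes scores are never NaN and that entity IDs are mutually comparable.

```python
import bisect


class TopKRanking:
    """A dict for O(1) lookup by ID plus a sorted list of composite keys."""

    def __init__(self):
        self._score_by_id = {}   # entity_id -> current score
        self._ordered = []       # sorted list of (-score, entity_id)

    def _discard_key(self, entity_id):
        # The old score has to be read from the map first: without it the
        # old composite key cannot be located in the ordered structure.
        old_score = self._score_by_id.get(entity_id)
        if old_score is None:
            return False
        idx = bisect.bisect_left(self._ordered, (-old_score, entity_id))
        del self._ordered[idx]
        return True

    def update(self, entity_id, new_score):
        self._discard_key(entity_id)                       # no-op for new IDs
        bisect.insort(self._ordered, (-new_score, entity_id))
        self._score_by_id[entity_id] = new_score

    def remove(self, entity_id):
        """Assumed contract: removing an absent ID returns False."""
        if not self._discard_key(entity_id):
            return False
        del self._score_by_id[entity_id]
        return True

    def top_k(self, k):
        """Assumed contract: k <= 0 gives [], k > population gives everything."""
        if k <= 0:
            return []
        return [(eid, self._score_by_id[eid]) for _, eid in self._ordered[:k]]


board = TopKRanking()
board.update("a", 10)
board.update("b", 30)
board.update("c", 30)
board.update("a", 50)      # a score change moves "a" to the front
board.remove("b")
print(board.top_k(2))      # [('a', 50), ('c', 30)]
```

Negating the score lets a single ascending order express "score descending, then entity_id ascending", and including the ID in the key means no two entries ever compare equal. Space is $O(n)$ for each of the two structures.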

Part 2 — Concurrency

Explain how the design changes if multiple threads call update, remove, and top_k concurrently. Describe how you keep the two structures mutually consistent under concurrent access, and how your approach scales as write throughput grows.

What This Part Should Cover

The atomicity invariant: a top_k reader must never observe the map and the ordered structure disagreeing in the middle of an update.
A simple, correct baseline synchronization approach and an honest statement of its throughput limit.
A higher-throughput design that partitions entities so independent writes can proceed in parallel, and a description of how a global top_k is reassembled from per-partition results.
The exactness condition for the merge: what minimum number of candidates each partition must contribute, and why that bound is necessary.
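
A hedged sketch of the partitioned design and its merge step follows. The class name `ShardedTopK`, the shard count, and hash-based routing are assumptions for illustration. To keep the example short, each shard holds only a plain dict and computes its local top k with `heapq.nsmallest`; in a fuller design each shard would hold the map plus ordered structure pair. Two caveats: the global result is not a point-in-time snapshot, because shard locks are taken one at a time, and in CPython the global interpreter lock limits how much real parallelism threads achieve, so this shows the locking structure rather than a measured speedup.

```python
import heapq
import threading


def rank_key(item):
    """Composite key: score descending, then entity_id ascending."""
    entity_id, score = item
    return (-score, entity_id)


class ShardedTopK:
    def __init__(self, num_shards=4):
        self._shards = [{} for _ in range(num_shards)]   # entity_id -> score
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def _index(self, entity_id):
        # An entity always maps to the same shard within one process.
        return hash(entity_id) % len(self._shards)

    def update(self, entity_id, new_score):
        i = self._index(entity_id)
        with self._locks[i]:             # writers to other shards are not blocked
            self._shards[i][entity_id] = new_score

    def remove(self, entity_id):
        i = self._index(entity_id)
        with self._locks[i]:
            return self._shards[i].pop(entity_id, None) is not None

    def top_k(self, k):
        if k <= 0:
            return []
        candidates = []
        for shard, lock in zip(self._shards, self._locks):
            with lock:
                # Each shard contributes its own top k, not k / num_shards:
                # all k global winners could live in a single shard.
                candidates.extend(heapq.nsmallest(k, shard.items(), key=rank_key))
        candidates.sort(key=rank_key)
        return candidates[:k]


ranking = ShardedTopK(num_shards=3)
for entity_id, score in [(1, 50), (2, 70), (3, 70), (4, 10)]:
    ranking.update(entity_id, score)
print(ranking.top_k(2))   # [(2, 70), (3, 70)]
```

The merge examines at most k times the number of shards candidates, so its cost grows with the shard count even when k is small.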

Part 3 — Large input batches with errors

Explain how to handle very large input batches in which some records may be malformed or fail during processing. The goal is to ingest the good records without letting a few bad ones fail the whole batch.

What This Part Should Cover

Per-record isolation: validate and process each record independently so one bad record can't fail the batch.
A destination for failures — a dead-letter queue / error log with the reason — so nothing is silently dropped.
Idempotency (event IDs / sequence numbers) so a failed slice can be retried safely.
Observability: per-batch processed / succeeded / failed counts and latency.
An ordering anchor (per-entity version or timestamp) so a stale update can't overwrite a newer score when records arrive out of order.
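
The ingestion concerns listed above can be sketched together in one loop. Everything named here is hypothetical: the record fields (`event_id`, `entity_id`, `version`, `score`), the `BatchReport` type, and the in-memory dead-letter list stand in for whatever schema and queue a real system would use. A plain dict plays the role of the ranking so the example stays self-contained, and IDs are assumed to be hashable.

```python
import time
from dataclasses import dataclass, field


@dataclass
class BatchReport:
    processed: int = 0
    succeeded: int = 0
    skipped: int = 0            # duplicates and stale versions
    failed: int = 0
    elapsed_seconds: float = 0.0
    dead_letters: list = field(default_factory=list)   # (record, reason) pairs


def ingest_batch(records, scores, versions, seen_event_ids):
    """Apply each record independently; bad records go to dead_letters."""
    report = BatchReport()
    start = time.perf_counter()
    for record in records:
        report.processed += 1
        try:
            event_id = record["event_id"]
            entity_id = record["entity_id"]
            version = int(record["version"])
            score = float(record["score"])
            if score != score:                      # NaN never equals itself
                raise ValueError("score is NaN")
        except (KeyError, TypeError, ValueError) as exc:
            report.failed += 1                      # isolate the bad record
            report.dead_letters.append((record, f"{type(exc).__name__}: {exc}"))
            continue
        last_version = versions.get(entity_id)
        is_stale = last_version is not None and version <= last_version
        if event_id in seen_event_ids or is_stale:
            report.skipped += 1                     # retry or out-of-order arrival
            continue
        scores[entity_id] = score                   # stand-in for ranking.update(...)
        versions[entity_id] = version
        seen_event_ids.add(event_id)
        report.succeeded += 1
    report.elapsed_seconds = time.perf_counter() - start
    return report
```

Because a replayed event ID or an older version is skipped rather than applied, re-running a failed slice of the batch leaves the scores unchanged, and processed always equals succeeded + skipped + failed.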

Part 4 — Testing

Describe the tests you would write for correctness, edge cases, and concurrency.

What This Part Should Cover

Correctness & edge cases enumerated explicitly: empty ranking, top_k(0), k > population, raising vs. lowering a score, duplicate scores (assert the tie-break), removing an absent ID, and negative/zero/large scores.
A batch test mixing valid and malformed records: valid ones land, malformed ones hit the dead-letter path, and the metric counts reconcile.
A concurrency test: a randomized, seeded interleaving of readers and writers, with invariant checks afterward (map and ordered structure agree on every score; exactly one ordered key per entity; a serialized replay yields the same final ranking).
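
One way the concurrency test could look is sketched below. The `LockedRanking` class is a compact coarse-locked stand-in written only so the test has something to exercise; its name and the helper names are assumptions. The seed fixes each writer's operation sequence, but the actual thread interleaving is still chosen by the scheduler. The serialized replay is comparable to the concurrent run only because each writer is given a disjoint range of entity IDs, which is a simplification made for this sketch.

```python
import bisect
import random
import threading


class LockedRanking:
    """Coarse-locked map + sorted key list, used as the system under test."""

    def __init__(self):
        self._lock = threading.Lock()
        self.score_by_id = {}
        self.ordered = []                      # sorted (-score, entity_id) keys

    def _drop(self, entity_id):
        old = self.score_by_id.pop(entity_id, None)
        if old is not None:
            del self.ordered[bisect.bisect_left(self.ordered, (-old, entity_id))]

    def update(self, entity_id, score):
        with self._lock:
            self._drop(entity_id)
            bisect.insort(self.ordered, (-score, entity_id))
            self.score_by_id[entity_id] = score

    def remove(self, entity_id):
        with self._lock:
            self._drop(entity_id)

    def top_k(self, k):
        with self._lock:
            return [(eid, -neg) for neg, eid in self.ordered[:max(k, 0)]]


def check_invariants(ranking):
    keys = ranking.ordered
    assert keys == sorted(keys)                              # still ordered
    assert len(keys) == len(ranking.score_by_id)             # one key per entity
    assert all(ranking.score_by_id[eid] == -neg for neg, eid in keys)


def writer_ops(seed, worker, count):
    rng = random.Random(seed * 1000 + worker)                # reproducible ops
    for _ in range(count):
        entity_id = worker * 1000 + rng.randrange(20)        # disjoint ID ranges
        yield entity_id, (None if rng.random() < 0.2 else rng.randrange(100))


def apply_ops(ranking, ops):
    for entity_id, score in ops:
        if score is None:
            ranking.remove(entity_id)
        else:
            ranking.update(entity_id, score)


def run_concurrency_test(seed=1234, writers=4, count=500):
    ranking, errors = LockedRanking(), []

    def reader():
        for _ in range(200):
            top = ranking.top_k(5)
            if top != sorted(top, key=lambda item: (-item[1], item[0])):
                errors.append(top)                           # misordered read

    threads = [threading.Thread(target=apply_ops,
                                args=(ranking, writer_ops(seed, w, count)))
               for w in range(writers)]
    threads.append(threading.Thread(target=reader))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    check_invariants(ranking)
    replay = LockedRanking()
    for w in range(writers):                                 # serialized replay
        apply_ops(replay, writer_ops(seed, w, count))
    assert not errors
    assert ranking.top_k(10**9) == replay.top_k(10**9)
    return ranking
```

Reader failures are collected in a list instead of being asserted inside the thread, since an exception raised in a worker thread would not fail the calling test on its own.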

What a Strong Answer Covers

These dimensions span all four parts:

This is a design and coding signal, not a single-algorithm puzzle: the candidate justifies choices and states trade-offs rather than reaching for one structure.
Consistency of the two structures is preserved end-to-end: through the single-threaded update path (Part 1), under concurrency (Part 2), and under partial/out-of-order batch failures (Part 3), and is what the tests assert (Part 4).
Reasoning is scale-aware: the right answer changes with k vs. n and with read- vs. write-heavy load, and the candidate names what breaks first as throughput grows.

Follow-up Questions

How does the design change if k can be as large as the entire entity set, versus a fixed small leaderboard (e.g. top 10)? Which data structure choice wins in each regime?
What breaks first as write throughput climbs by 100x, and how does sharding change the cost and accuracy of a global top_k?
If rankings must survive a process restart or scale across multiple machines, what would you add (persistence, partitioning, a coordinator), and what new consistency problems appear?
How would you support a "time-windowed" top-K (e.g. highest scores in the last hour) where old contributions must expire?
