Design real-time driver heatmap system

Q: Design real-time driver heatmap system

This question evaluates a candidate's ability to design real-time distributed systems for geospatial aggregation, covering skills in spatial partitioning, high-throughput streaming ingestion, low-latency cell counting, WebSocket-based client updates, and top-K ranking within a fault-tolerant multi-region architecture.

Q: How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

Q: What difficulty level is this interview question?

This is a hard difficulty System Design question, commonly asked during Onsite rounds at Uber.

Q: What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Uber during technical interviews.

Question

System Design: Real-Time Driver Heatmap and Top-K Hottest Cells

Context

You are building a real-time driver heatmap for a ride-hailing platform that visualizes driver density across a city. The city is partitioned into spatial grid cells (e.g., geohash / H3 / S2). Driver apps send periodic GPS location pings, and the backend must aggregate these into live per-cell driver counts and stream updates to rider and ops clients over WebSockets.

In addition to the heatmap itself, the system must compute and publish the top-K hottest cells per region with low latency.

Constraints & Assumptions

Design for city-scale load and anchor your design to these numbers:

Active drivers: hundreds of thousands (low-single-digit hundreds of thousands; e.g., ~300k).
Ping cadence: 1–5 location pings per driver per minute.
End-to-end latency: sub-second to low-seconds (ping → visible update on clients).
Deployment: multi-region .
Workload shape: write-heavy ingest from drivers; read-heavy, fan-out-heavy consumption from many rider/ops clients.
Accuracy class: this is a visualization surface, not billing — small transient inaccuracy is acceptable, but persistent drift is not.

State any additional assumptions you make explicitly before you start designing.

Clarifying Questions to Ask

Before diving in, confirm the scope with the interviewer:

What does "active driver" mean precisely — pinged within some TTL, currently online, or available-for-dispatch? Does each driver count in exactly one cell?
What is the read:write ratio and the expected number of concurrent viewers (rider apps + ops dashboards) per region?
How fresh must the heatmap be — is a sub-second tail required, or is low-seconds acceptable, and what is the latency budget broken down across ingest vs. fan-out?
What zoom levels / cell resolutions must the client support, and does the viewport change frequently (panning/zooming)?
How large is K , and how is a "region" defined (administrative boundary vs. a coarser grid resolution)?
What are the consistency requirements — is convergent/eventually-correct counting acceptable, or must counts be exact at every instant?

What a Strong Answer Covers

The interviewer is looking for these signals (these are dimensions to address, not the answers):

A capacity estimate : pings/s, count-changing events/s, in-memory state size, and connection/fan-out volume — and noticing which of these is the real bottleneck.
A clear data model for pings, per-driver presence, per-cell counts, regional top-K, and subscriptions.
A principled choice of spatial index with justified tradeoffs.
A separation of the write path (ingest → aggregate) from the read/fan-out path (stream → clients).
A coherent story for where aggregation happens (edge vs. server) and why .
Scalability reasoning: partitioning keys, hot-cell/hot-partition handling, and WebSocket fan-out that scales with connections.
Fault tolerance & correctness invariants that survive retries, reconnects, and node failures (idempotency, ordering, replay).
Sensible tradeoffs stated explicitly, with the candidate able to defend each choice.

Part 1 — Spatial partitioning & data model

Partition the city into fixed-size grid cells (e.g., geohash / H3 / S2) and define the data model: the entities and state you maintain for pings, per-driver presence, per-cell counts, regional top-K, and subscriptions.

Part 2 — Ingest & aggregation (edge vs. server)

Ingest driver pings and compute the count of active drivers per cell in near real time. Address where the work happens (edge vs. server) and how you turn raw pings into per-cell counts. Define what "active" means (e.g., a TTL).

Part 3 — Streaming to clients & subscription management

Stream cell updates to clients over WebSockets, including subscribe / unsubscribe flows. Cover update frequency (ping cadence and how/when updates are pushed) and client subscription management — how clients subscribe, unsubscribe, and handle viewport changes and reconnection.

Part 4 — Top-K hottest cells per region

Compute and publish the top-K hottest cells per region with low latency, updating as counts change.

Follow-up Questions

Be ready for the interviewer to push deeper:

A downtown rush-hour cell becomes a hot partition. How do you shard a single hot cell across multiple sub-partitions so a driver's $+1$ (enter) and later $-1$ (move-out/expiry) still reconcile correctly? What property must the sub-key have, and what goes wrong if you pick it naively (e.g., round-robin)?
What is your correctness story under at-least-once delivery and reconnects? State the invariants that prevent double-counting a move, and explain whether counts are atomic at every instant or merely convergent.
How does this change at 10–100x scale (a 1M-driver mega-region, or 1M concurrent viewers)? What is the first thing that breaks, and is it on the ingest path or the fan-out path?
GPS jitter at a cell boundary makes a stationary driver flip-flop $\pm 1$ . How do you stop the resulting churn without hurting freshness?

Design real-time driver heatmap system

Quick Overview

Design real-time driver heatmap system

System Design: Real-Time Driver Heatmap and Top-K Hottest Cells

Context

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Part 1 — Spatial partitioning & data model

Part 2 — Ingest & aggregation (edge vs. server)

Part 3 — Streaming to clients & subscription management

Part 4 — Top-K hottest cells per region

Follow-up Questions

Submit Your Answer to Earn 20XP

Design real-time driver heatmap system

Quick Overview

Design real-time driver heatmap system

System Design: Real-Time Driver Heatmap and Top-K Hottest Cells

Context

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Part 1 — Spatial partitioning & data model

Part 2 — Ingest & aggregation (edge vs. server)

Part 3 — Streaming to clients & subscription management

Part 4 — Top-K hottest cells per region

Follow-up Questions

Submit Your Answer to Earn 20XP