PracHub
QuestionsCoachesLearningGuidesInterview Prep
|Home/System Design/Uber

Design real-time driver heatmap system

Last updated: Apr 8, 2026

Quick Overview

This question evaluates a candidate's ability to design real-time distributed systems for geospatial aggregation, covering skills in spatial partitioning, high-throughput streaming ingestion, low-latency cell counting, WebSocket-based client updates, and top-K ranking within a fault-tolerant multi-region architecture.

  • hard
  • Uber
  • System Design
  • Software Engineer

Design real-time driver heatmap system

Company: Uber

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Onsite

Design a real-time driver heatmap for a ride-hailing platform. Partition the city into grid cells (e.g., geohash), ingest driver pings, and stream updates to clients over WebSockets. Compute and publish the top-K hottest cells per region with low latency. Detail the data model, aggregation strategy (edge vs. server), update frequency, scalability, fault tolerance, and how clients subscribe/unsubscribe.

Quick Answer: This question evaluates a candidate's ability to design real-time distributed systems for geospatial aggregation, covering skills in spatial partitioning, high-throughput streaming ingestion, low-latency cell counting, WebSocket-based client updates, and top-K ranking within a fault-tolerant multi-region architecture.

Related Interview Questions

  • Design a Ride-Sharing System (Uber-style Core Platform) - Uber
  • Design a Food-Delivery Backend (Uber Eats-style) - Uber (medium)
  • Design a Real-Time Chat System - Uber (medium)
  • Design a Distributed Logging System - Uber (medium)
  • Design a Stock Trading Platform - Uber (medium)
|Home/System Design/Uber

Design real-time driver heatmap system

Uber logo
Uber
Aug 1, 2025, 12:00 AM
hardSoftware EngineerOnsiteSystem Design
82
0

System Design: Real-Time Driver Heatmap and Top-K Hottest Cells

Context

You are building a real-time driver heatmap for a ride-hailing platform that visualizes driver density across a city. The city is partitioned into spatial grid cells (e.g., geohash / H3 / S2). Driver apps send periodic GPS location pings, and the backend must aggregate these into live per-cell driver counts and stream updates to rider and ops clients over WebSockets.

In addition to the heatmap itself, the system must compute and publish the top-K hottest cells per region with low latency.

Constraints & Assumptions

Design for city-scale load and anchor your design to these numbers:

  • Active drivers: hundreds of thousands (low-single-digit hundreds of thousands; e.g., ~300k).
  • Ping cadence: 1–5 location pings per driver per minute.
  • End-to-end latency: sub-second to low-seconds (ping → visible update on clients).
  • Deployment: multi-region .
  • Workload shape: write-heavy ingest from drivers; read-heavy, fan-out-heavy consumption from many rider/ops clients.
  • Accuracy class: this is a visualization surface, not billing — small transient inaccuracy is acceptable, but persistent drift is not.

State any additional assumptions you make explicitly before you start designing.

Clarifying Questions to Ask

Before diving in, confirm the scope with the interviewer:

  1. What does "active driver" mean precisely — pinged within some TTL, currently online, or available-for-dispatch? Does each driver count in exactly one cell?
  2. What is the read:write ratio and the expected number of concurrent viewers (rider apps + ops dashboards) per region?
  3. How fresh must the heatmap be — is a sub-second tail required, or is low-seconds acceptable, and what is the latency budget broken down across ingest vs. fan-out?
  4. What zoom levels / cell resolutions must the client support, and does the viewport change frequently (panning/zooming)?
  5. How large is K , and how is a "region" defined (administrative boundary vs. a coarser grid resolution)?
  6. What are the consistency requirements — is convergent/eventually-correct counting acceptable, or must counts be exact at every instant?

What a Strong Answer Covers

The interviewer is looking for these signals (these are dimensions to address, not the answers):

  • A capacity estimate : pings/s, count-changing events/s, in-memory state size, and connection/fan-out volume — and noticing which of these is the real bottleneck.
  • A clear data model for pings, per-driver presence, per-cell counts, regional top-K, and subscriptions.
  • A principled choice of spatial index with justified tradeoffs.
  • A separation of the write path (ingest → aggregate) from the read/fan-out path (stream → clients).
  • A coherent story for where aggregation happens (edge vs. server) and why .
  • Scalability reasoning: partitioning keys, hot-cell/hot-partition handling, and WebSocket fan-out that scales with connections.
  • Fault tolerance & correctness invariants that survive retries, reconnects, and node failures (idempotency, ordering, replay).
  • Sensible tradeoffs stated explicitly, with the candidate able to defend each choice.

Part 1 — Spatial partitioning & data model

Partition the city into fixed-size grid cells (e.g., geohash / H3 / S2) and define the data model: the entities and state you maintain for pings, per-driver presence, per-cell counts, regional top-K, and subscriptions.

Part 2 — Ingest & aggregation (edge vs. server)

Ingest driver pings and compute the count of active drivers per cell in near real time. Address where the work happens (edge vs. server) and how you turn raw pings into per-cell counts. Define what "active" means (e.g., a TTL).

Part 3 — Streaming to clients & subscription management

Stream cell updates to clients over WebSockets, including subscribe / unsubscribe flows. Cover update frequency (ping cadence and how/when updates are pushed) and client subscription management — how clients subscribe, unsubscribe, and handle viewport changes and reconnection.

Part 4 — Top-K hottest cells per region

Compute and publish the top-K hottest cells per region with low latency, updating as counts change.

Follow-up Questions

Be ready for the interviewer to push deeper:

  1. A downtown rush-hour cell becomes a hot partition. How do you shard a single hot cell across multiple sub-partitions so a driver's +1+1+1 (enter) and later −1-1−1 (move-out/expiry) still reconcile correctly? What property must the sub-key have, and what goes wrong if you pick it naively (e.g., round-robin)?
  2. What is your correctness story under at-least-once delivery and reconnects? State the invariants that prevent double-counting a move, and explain whether counts are atomic at every instant or merely convergent.
  3. How does this change at 10–100x scale (a 1M-driver mega-region, or 1M concurrent viewers)? What is the first thing that breaks, and is it on the ingest path or the fan-out path?
  4. GPS jitter at a cell boundary makes a stationary driver flip-flop ±1\pm 1±1 . How do you stop the resulting churn without hurting freshness?

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More System Design•More Uber•More Software Engineer•Uber Software Engineer•Uber System Design•Software Engineer System Design

Your design canvas — auto-saved

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.