You are building a backend service for a ride-hailing app.
Taxis can come online (available) or go offline (unavailable) at any time. Clients need to query, for a given location, the distance (or ETA proxy) to the nearest available taxi.
Assume locations are either:
-
a discrete grid (cells), or
-
real-world latitude/longitude coordinates (more realistic).
Requirements
-
API
-
UpdateTaxiStatus(taxi_id, location, status)
where
status ∈ {online, offline}
-
GetNearestTaxi(location) -> (taxi_id, distance)
(or return just
distance
)
-
Performance
-
Low latency for queries (e.g., p99 < 200 ms)
-
High update rate (taxis frequently move and toggle availability)
-
Correctness
-
Prefer the nearest
currently online
taxi; allow small staleness if needed.
-
Scale
(example)
-
1M taxis globally, 100K updates/sec, 50K queries/sec
Design questions
-
What data structures and storage would you use to support fast nearest-neighbor queries under frequent updates?
-
How would you partition/shard the data?
-
How would you handle online/offline churn, stale data, and deletion (e.g., lazy deletion)?
-
If the world were a small fixed grid and the query were “distance for every cell,” when (if ever) would you precompute distances with BFS?