Design nearest-available-taxi lookup with updates
Company: Google
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Technical Screen
You are building a backend service for a ride-hailing app.
Taxis can **come online** (available) or **go offline** (unavailable) at any time. Clients need to query, for a given location, the **distance (or ETA proxy)** to the nearest available taxi.
Assume locations are either:
- a discrete grid (cells), or
- real-world latitude/longitude coordinates (more realistic).
### Requirements
- **API**
- `UpdateTaxiStatus(taxi_id, location, status)` where `status ∈ {online, offline}`
- `GetNearestTaxi(location) -> (taxi_id, distance)` (or return just `distance`)
- **Performance**
- Low latency for queries (e.g., p99 < 200 ms)
- High update rate (taxis frequently move and toggle availability)
- **Correctness**
- Prefer the nearest *currently online* taxi; allow small staleness if needed.
- **Scale** (example)
- 1M taxis globally, 100K updates/sec, 50K queries/sec
### Design questions
- What data structures and storage would you use to support fast nearest-neighbor queries under frequent updates?
- How would you partition/shard the data?
- How would you handle online/offline churn, stale data, and deletion (e.g., lazy deletion)?
- If the world were a small fixed grid and the query were “distance for every cell,” when (if ever) would you precompute distances with BFS?
Quick Answer: This question evaluates the ability to design scalable, low-latency geospatial lookup services under high update and query rates, assessing skills in spatial indexing, real-time state management, partitioning/sharding, and availability-versus-consistency trade-offs.