Design Nearby Restaurant Search
Company: Uber
Role: Software Engineer
Category: System Design
Interview Round: Onsite
Design the backend for a **nearby-restaurant search** feature in a high-traffic consumer app (think the "restaurants near me" screen of a food-delivery or maps app).
A user opens the app, the client sends their current location, and the system returns a ranked list of nearby restaurants. The feature must support:
- **Proximity search** — find restaurants within a radius (e.g. 1 km or 5 km) of the user's coordinates.
- **Ranking** — order results by a blend of distance, relevance, rating, popularity, and availability.
- **Filtering** — by cuisine type, price range, opening hours (open-now), delivery/pickup availability, and minimum rating.
- **Low latency** — the result list backs an interactive mobile screen, so it must return quickly under heavy load.
Your design should cover the **public API, data model, geospatial indexing strategy, ranking approach, caching strategy, scalability/partitioning plan, and the update path** for restaurant changes — including location moves, opening-hour edits, and temporary closures (a restaurant pausing orders for the evening).
```hint Geospatial index — the core decision
A naive distance scan over 10M rows is hopeless. What spatial data structure lets you map a `(lat, lng)` to a bounded region and ask "what's near here" without touching the whole catalog? Consider a hierarchical cell grid (geohash / S2 / H3), how the circular query relates to those cells, and what happens under wildly variable density.
```
```hint The shape mismatch
Whatever cell structure you pick, its units won't be circles. Think about how you'd turn a radius query into a *cover set* of those cells, and what extra step you owe the user once you've gathered the candidates so the result is exactly the radius they asked for. Boundary items are the classic correctness trap.
```
```hint Ranking — where does the cost go?
Separate a cheap, broad first pass (candidate generation) from a more expensive second pass (scoring), and decide which runs online per request versus offline ahead of time. Which signals (popularity, conversion, prep-time) are too expensive to compute inside a ~200 ms budget?
```
```hint Scaling, freshness & hotspots
How would you shard the index so one query touches only a small slice of the fleet? Then confront the skew — one downtown area is a hotspot while most of the map is empty. What knobs let dense and sparse regions behave differently, and how do you keep fast-changing status (a closure) fresh without rebuilding everything? Think event stream + ordered cell-membership updates.
```
### Constraints & Assumptions
State your own assumptions, but design against roughly this scale:
- **Catalog**: ~10M restaurants worldwide, heavily skewed toward dense metros.
- **Traffic**: ~50K search QPS at peak; a single user issues several searches per session (panning the map, changing filters).
- **Latency target**: p95 end-to-end under ~200 ms for a typical search.
- **Freshness**: temporary closures / "paused" status must propagate within seconds; static profile edits (cuisine, hours) within minutes is acceptable.
- **Read-heavy**: searches vastly outnumber restaurant writes (writes are merchant edits + operational status changes, on the order of thousands/sec at most).
- **Geo-skew**: candidate density per unit area ranges from near-zero (rural) to thousands within a few hundred meters (downtown).
### Clarifying Questions to Ask
- Is this **delivery-radius** search (restaurant delivers to me) or **proximity** search (restaurant is physically near me)? They produce different geo semantics.
- How **personalized** must ranking be — global signals only, or per-user history? Personalization largely kills query-result caching.
- What's the acceptable **staleness** for each field — is showing a just-closed restaurant for a few seconds tolerable, or must closures be hard-real-time?
- Do we need **map-viewport** ("search this area") queries in addition to "near me", and pagination beyond the first screen?
- What is the geographic footprint — single country or global (affecting time zones, DST, sharding)?
- Are there **marketplace constraints** (sponsored placement, supply balancing) that the ranker must respect?
### What a Strong Answer Covers
- A defensible **geospatial indexing** approach with reasoned trade-offs, and an explanation of why the index alone isn't enough for an exact-radius answer.
- A **well-specified API** (validation, sane caps, robust pagination, deterministic ordering) and a data model that keeps the online read path cheap.
- A **ranking** approach that stays within the latency budget and degrades gracefully, with honest treatment of which signals are affordable online vs. precomputed.
- A **caching** strategy that confronts the read-heavy, dense-area load and the tension between caching and personalization, plus how stale data is kept out.
- A **scalability/partitioning** plan that bounds the work per query and addresses geographic skew and hotspots.
- An **update path** that keeps fast-changing status fresh and reasons carefully about the trickiest write — and **open-now** correctness across time zones and DST.
- **Observability**: the latency, freshness, coverage, cache, and ranking-quality signals you'd watch.
### Follow-up Questions
- A restaurant's coordinates are corrected and it jumps to a neighboring cell. Walk through exactly what happens in the index and caches so it neither disappears nor appears twice.
- How do you keep **open-now** filtering correct when the user, the restaurant, and your servers are in different time zones, and across daylight-saving transitions?
- The product wants "search this map area" (an arbitrary rectangle the user panned to) instead of a radius. What changes in the index query and pagination?
- Downtown San Francisco has 5,000 restaurants within 800 m. How do you keep that query fast and the result set useful rather than returning thousands of near-ties?
Quick Answer: This question evaluates competency in large-scale system design for low-latency geospatial search, covering concepts such as geospatial indexing, candidate generation and ranking pipelines, filtering, caching, and update/freshness handling.