Design Uber Eats-style search function

Q: Design Uber Eats-style search function

An Uber Software Engineer system design onsite question: design the search function for an Uber Eats-style food delivery platform, covering APIs, data ingestion, geo-aware indexing and retrieval, multi-stage ranking, real-time availability updates, scaling, caching, fault tolerance, monitoring, and freshness-vs-latency trade-offs. It tests the ability to build a low-latency, location-aware, highly available search system with strong observability.

Q: What difficulty level is this interview question?

This is a hard difficulty System Design question, commonly asked during Onsite rounds at Uber.

Q: What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Uber during technical interviews.

Question

Design the search function for a large-scale food delivery platform similar to Uber Eats.

A user opens the app and types a query into a single search box. The query may be a restaurant name (e.g., "Joe's Pizza"), a cuisine (e.g., "Thai", "Sushi"), or a dish name (e.g., "cheeseburger", "pad thai", "vegan burger"). The system must return a ranked list of relevant restaurants (and possibly individual dishes) that can actually deliver to the user's current location, taking into account relevance, restaurant availability, delivery distance, estimated delivery time (ETA), ratings, promotions, and business constraints.

Your task is to design the end-to-end system that powers this search box: the public APIs, the data sources that feed the index, the indexing and retrieval architecture, the live query flow, the ranking pipeline, how freshness is maintained, how the system scales and degrades, and how it is monitored. The question is broken into Parts below.

Constraints & Assumptions

Treat the following as the target operating point (state any number you assume; these are reasonable defaults, not exact figures):

Scale: tens of millions of monthly active users (assume ~50M MAU / ~5M DAU), on the order of hundreds of thousands of restaurants (assume ~500K) with tens of menu items each (~15M dishes).
Traffic: search is overwhelmingly read-heavy. Plan for a dinner-rush peak of roughly 1–2K full-search QPS; as-you-type autocomplete fires per keystroke, so its tier sees several times that.
Latency: target P95 under ~200 ms server-side for full search (gateway → response, excluding client network); autocomplete is tighter still.
Availability: search is a primary acquisition surface, so target high availability (~99.9%+) and degrade gracefully rather than return an error.
Freshness: there are two distinct freshness classes. Slow data (menu edits, new restaurants) tolerates a minute or two of lag. Fast data (a restaurant closing, an item selling out, an ETA spike) must take effect within seconds.
Locality: a query only ever concerns restaurants that can reach one point. The problem is intensely local, so geography is a natural partition key.
Out of scope (consume as upstream sources, do not design): order checkout/payments, the courier-dispatch system (we read its ETA), and the restaurant-facing menu editor.

Clarifying Questions to Ask

Questions a candidate should raise up front to scope the whole problem:

Is as-you-type autocomplete in scope in addition to full search-results queries, and what is its latency target relative to full search?
Are we ranking restaurants only, or must we also surface individual matched dishes as first-class results (e.g., a dish query like "pad thai")?
Is the strict deliverability test "the restaurant's delivery polygon covers the user's exact point," or is a coarse radius acceptable for the MVP?
What is the freshness SLA separately for slow menu/profile edits versus fast open/closed and sold-out changes?
Do we need personalization (order history, dietary preferences) in v1, or is location + relevance + popularity sufficient to launch?
Are paid placements / ads / promotions part of ranking, and are there fairness or supply-side exposure constraints we must respect?
What pagination guarantees are required — is occasional skip/duplication across pages acceptable, or must paging be strictly consistent against a moving result set?

Part 1 — Requirements & Scope

State the functional requirements (search by restaurant name / cuisine / dish; location-aware results; only restaurants that can deliver to the user; filters such as open-now / price / rating / delivery-time / dietary tags; ranking by relevance + distance/ETA + popularity/rating + optional personalization) and the non-functional requirements (latency, availability, freshness classes, scalability). Then produce a back-of-the-envelope sizing (QPS, corpus size, write/availability-churn rate, per-query candidate-set size) and explain which architectural decision each number drives.

What This Part Should Cover

A clean split of functional vs. non-functional requirements, with explicit numbers for latency (P95), availability, and the two freshness classes.
A defensible back-of-the-envelope: search QPS at peak, autocomplete QPS, corpus/index size (and that it fits in RAM), and the availability-churn write rate (tens of thousands of state changes/min) as distinct from the low menu-edit write rate.
Bounding the per-query candidate set via locality, and naming the consequence (geo-partitioned, read-replicated, in-memory index).
Stating what is explicitly out of scope.
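The sizing arithmetic above can be sketched in a few lines. This is a rough illustration, not a measured profile: the restaurant count, dishes per restaurant, and peak QPS come from the stated operating point, while the autocomplete multiplier, bytes per document, churn rate, and per-query candidate count are assumed values picked only to make the calculation concrete.

```python
# Back-of-the-envelope sizing. Inputs marked "assumed" are illustrative
# defaults chosen for this sketch, not figures from a real system.
RESTAURANTS = 500_000
DISHES_PER_RESTAURANT = 30          # "tens of menu items each"
PEAK_SEARCH_QPS = 2_000             # upper end of the 1-2K peak
AUTOCOMPLETE_MULTIPLIER = 5         # assumed value for "several times"
BYTES_PER_RESTAURANT_DOC = 4_000    # assumed, incl. denormalized menu text
CHURN_EVENTS_PER_MIN = 30_000       # assumed value for "tens of thousands"
CANDIDATES_PER_QUERY = 500          # assumed deliverable restaurants per point


def sizing() -> dict:
    """Derive the numbers that drive the architectural decisions."""
    index_bytes = RESTAURANTS * BYTES_PER_RESTAURANT_DOC
    return {
        "dishes": RESTAURANTS * DISHES_PER_RESTAURANT,
        "autocomplete_qps": PEAK_SEARCH_QPS * AUTOCOMPLETE_MULTIPLIER,
        "index_gb": index_bytes / 1e9,
        "churn_writes_per_sec": CHURN_EVENTS_PER_MIN / 60,
        "candidate_fraction": CANDIDATES_PER_QUERY / RESTAURANTS,
    }


if __name__ == "__main__":
    for name, value in sizing().items():
        print(f"{name}: {value}")
```

Under these assumptions the whole restaurant index is a couple of gigabytes (so it can live in RAM on every replica), availability churn is hundreds of writes per second (so it should not go through re-indexing), and each query touches a tiny fraction of the corpus (so geo-partitioning pays off).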

Part 2 — User-Facing Search APIs

Define the search endpoint(s), the request parameters (query, location, filters, sort, pagination), and the response shape. Cover both the autocomplete/typeahead endpoint and the full ranked-search endpoint if you include autocomplete in scope.

Clarifying Questions for this Part

Is location passed as resolved lat/lng from the client, or as a saved address_id the gateway resolves — and is location mandatory (no global "all restaurants" mode)?
Should the response carry per-restaurant matched dishes inline, and/or a separate block of directly-matched dishes for dish queries?

What This Part Should Cover

A clear REST (or equivalent) contract with query, location, filters, sort, pagination, and limit — for both autocomplete and full search.
Cursor-based (not offset-based) pagination, with a stated reason and an awareness of the snapshot-vs-stateless trade-off.
A response shape that includes the fields the UI needs (name, rating, ETA, fee, promo, open status, matched items) plus a request_id for log-join / tracing.
A nod to the internal APIs the system also needs (document upsert, availability upsert, feature lookup, event sink).
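As one possible illustration of stateless cursor-based pagination, the sketch below encodes the sort key of the last returned item into an opaque cursor and resumes strictly after it. The function names (encode_cursor, decode_cursor, paginate) and the response field names are hypothetical, and the ranked list is assumed to be sorted by descending score with restaurant id as the tie-breaker.

```python
import base64
import json
import uuid
from typing import List, Optional, Tuple


def encode_cursor(score: float, restaurant_id: str) -> str:
    """Pack the last item's sort key into an opaque, URL-safe token."""
    raw = json.dumps({"s": score, "id": restaurant_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> Tuple[float, str]:
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["s"], data["id"]


def paginate(ranked: List[dict], limit: int, cursor: Optional[str] = None) -> dict:
    """Return one page from a list sorted by (-score, id).

    The cursor is a position in sort order, not an offset, so items
    inserted or removed earlier in the list do not shift the page.
    """
    if limit < 1:
        raise ValueError("limit must be >= 1")
    if cursor:
        last_score, last_id = decode_cursor(cursor)
        ranked = [r for r in ranked
                  if (-r["score"], r["id"]) > (-last_score, last_id)]
    items = ranked[:limit]
    has_more = len(ranked) > limit
    next_cursor = (encode_cursor(items[-1]["score"], items[-1]["id"])
                   if has_more else None)
    return {
        "request_id": str(uuid.uuid4()),  # for log-join / tracing
        "results": items,
        "next_cursor": next_cursor,
    }
```

This is the stateless side of the trade-off: nothing is stored per user, but because scores can change between requests, a restaurant whose score moved across the cursor boundary may still be skipped or repeated. A snapshot approach avoids that at the cost of server-side state.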

Part 3 — Data Ingestion & Sources

Identify the source-of-truth systems the index draws from — restaurant profiles, menus, availability/inventory, pricing, logistics/ETA, ratings/reviews, promotions/ads, and user context. Classify each by change rate, and explain how that classification decides where the data lives.

What This Part Should Cover

An enumeration of the sources with concrete example fields for each.
A slow vs. fast classification per source, ideally as a table.
The conclusion that static fields are baked into the index document while fast fields are read at query time from an overlay / feature store — and why that boundary is drawn there.
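The slow/fast boundary can be expressed as a simple rule keyed on each source's freshness target. The sketch below is illustrative only: the source names, example fields, per-source SLA values, and the 10-second threshold are all assumptions for the example, not values taken from a real system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Source:
    name: str
    example_fields: Tuple[str, ...]
    freshness_sla_s: int  # assumed target: how soon a change must be visible


# Assumed cut-off between "within seconds" and "a minute or two".
FAST_SLA_THRESHOLD_S = 10


def placement(src: Source) -> str:
    """Fast-changing data goes to the overlay; slow data is baked into the index."""
    if src.freshness_sla_s <= FAST_SLA_THRESHOLD_S:
        return "overlay_store"
    return "index_document"


# Hypothetical source list with example fields.
SOURCES = [
    Source("restaurant_profile", ("name", "cuisines", "address"), 120),
    Source("menu", ("item_name", "description", "dietary_tags"), 120),
    Source("ratings", ("avg_rating", "review_count"), 120),
    Source("availability", ("is_open", "sold_out_item_ids"), 5),
    Source("logistics_eta", ("eta_minutes", "delivery_fee"), 5),
]

if __name__ == "__main__":
    for src in SOURCES:
        print(f"{src.name:20s} -> {placement(src)}")
```

The point of writing it as a rule rather than a hand-maintained list is that the boundary follows from the freshness requirement: anything that must change within seconds cannot wait for an index refresh.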

Part 4 — Indexing & Retrieval Architecture

Describe the overall components, the search document model, the indexing pipeline, and how text, geo, and structured filters are supported in one query.

What This Part Should Cover

The component inventory: gateway, stateless search service, search-index cluster (inverted + geo), a separate autocomplete tier, an overlay/availability store, the indexing pipeline, a ranking/feature service, and caching.
A concrete search document model (restaurant-level with denormalized menu_text, cuisines, geo, tiled service_area, rating, popularity, version), and optionally parallel dish documents.
How text (analyzers, synonyms, fuzzy/edge-ngrams), geo (tiled service area + distance as a separate signal), and structured filters (price, rating, dietary, hours) coexist.
An idempotent, ordered, self-healing indexing pipeline (CDC → stream → versioned upsert; periodic bulk rebuild with alias swap).
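A version-gated upsert is what makes the pipeline idempotent and tolerant of out-of-order delivery. The following is a minimal in-memory sketch, assuming every document carries a monotonically increasing version assigned by its source of truth; VersionedIndex is a hypothetical stand-in for the real index client.

```python
from typing import Dict, Optional


class VersionedIndex:
    """In-memory stand-in for a search index with version-gated writes."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def upsert(self, doc: dict) -> bool:
        """Apply the write only if it is strictly newer than what is stored.

        Returns True if the document was written, False if it was a
        duplicate or a stale (out-of-order) event.
        """
        current = self._docs.get(doc["id"])
        if current is not None and current["version"] >= doc["version"]:
            return False
        self._docs[doc["id"]] = doc
        return True

    def get(self, doc_id: str) -> Optional[dict]:
        return self._docs.get(doc_id)
```

Because replays and stale events are no-ops, the stream consumer can use at-least-once delivery, and a periodic bulk rebuild can run alongside live updates without clobbering newer data.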

Part 5 — Query Flow

Trace a single request GET /v1/search?q=pad thai&lat=..&lng=.. end-to-end: client → gateway → search service → candidate retrieval → hard filtering → dynamic-feature enrichment → ranking → paginated response. Identify what work happens at each hop and give an illustrative latency budget that fits under the P95 target.

What This Part Should Cover

An ordered trace with a clear job at each stage: gateway (auth, location resolve, rate-limit, request_id); parse/normalize/synonym-expand; candidate retrieval tuned for recall; hard availability filtering against the overlay; dynamic-feature enrichment (batched, parallel, time-boxed); ranking; pagination + logging + response.
Correct ordering of the deliverability gate, availability filter, and enrichment (hard filters before expensive ranking).
An illustrative per-stage latency budget summing to under ~200 ms, with parallelism and timeouts on slow dependencies.
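The batched, parallel, time-boxed enrichment step might look like the sketch below, using asyncio from the Python standard library. The per-stage budget numbers are assumed for illustration, and fake_eta and fake_promos are hypothetical stubs standing in for the real ETA and promotions services.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Assumed illustrative budget (ms); the only requirement is that it sums under 200.
STAGE_BUDGET_MS = {
    "gateway": 10, "parse": 5, "retrieval": 40, "availability_filter": 10,
    "enrichment": 30, "ranking": 50, "paginate_and_respond": 10,
}


async def fetch_with_budget(call: Awaitable[dict], timeout_s: float,
                            fallback: dict) -> dict:
    """Await a dependency, but give up and use the fallback after timeout_s."""
    try:
        return await asyncio.wait_for(call, timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback


async def enrich(candidate_ids: List[str],
                 fetch_eta: Callable[[List[str]], Awaitable[dict]],
                 fetch_promos: Callable[[List[str]], Awaitable[dict]],
                 budget_s: float = 0.03) -> Dict[str, dict]:
    """One batched call per dependency, issued in parallel, each time-boxed."""
    eta, promos = await asyncio.gather(
        fetch_with_budget(fetch_eta(candidate_ids), budget_s, {}),
        fetch_with_budget(fetch_promos(candidate_ids), budget_s, {}),
    )
    return {cid: {"eta_min": eta.get(cid), "promo": promos.get(cid)}
            for cid in candidate_ids}


async def fake_eta(ids: List[str]) -> dict:      # hypothetical fast dependency
    await asyncio.sleep(0.001)
    return {i: 25 for i in ids}


async def fake_promos(ids: List[str]) -> dict:   # hypothetical slow dependency
    await asyncio.sleep(0.5)
    return {i: "10% off" for i in ids}
```

Calling asyncio.run(enrich(["r1"], fake_eta, fake_promos, budget_s=0.05)) returns the ETA but a missing promo, because the slow dependency is cut off at the budget instead of dragging the whole request past its latency target.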

Part 6 — Ranking Strategy

Explain how the multiple signals are combined into a final ordering, and how this evolves from a heuristic weighted sum to a multi-stage machine-learned ranking pipeline. Address personalization and how business constraints (ads/promos, supply-side fairness) fit in without destroying relevance.

What This Part Should Cover

The Stage 0 heuristic weighted sum with sensible signed weights (penalties for ETA/distance), and why it's a good launch + fallback.
A multi-stage retrieval → first-pass → heavy re-ranker pipeline, with the candidate-set sizes shrinking and model cost rising per stage, optimized to a business objective (conversion/GMV).
Personalization treated as features (cuisine affinity, reorder propensity, dietary match, time-of-day) that modify score but never override hard deliverability/availability filters.
Training-data provenance (impression→click→order via request_id), an LTR objective, and awareness of position-bias correction.
Ads/promos and supply-side fairness (e.g., new-restaurant exposure floors) layered on as constrained boosts.
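A Stage 0 heuristic ranker can be as small as the sketch below. The feature names and weight values are assumptions chosen for illustration; in practice they would be tuned against conversion data, and features on different scales (for example a 0–5 rating versus a 0–1 text score) would usually be normalized first.

```python
from typing import Dict, List

# Assumed weights: positive for desirable signals, negative (penalties)
# for ETA and distance.
WEIGHTS: Dict[str, float] = {
    "text_relevance": 1.0,
    "rating": 0.3,
    "popularity": 0.2,
    "promo": 0.1,
    "eta_minutes": -0.02,
    "distance_km": -0.05,
}


def heuristic_score(features: Dict[str, float]) -> float:
    """Weighted sum; a missing feature contributes zero."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())


def rank(candidates: List[dict]) -> List[dict]:
    """Sort by descending score, with id as a deterministic tie-breaker.

    Candidates are assumed to have already passed the hard
    deliverability and availability filters.
    """
    return sorted(
        candidates,
        key=lambda c: (-heuristic_score(c["features"]), c["id"]),
    )
```

Because it needs no model server and no feature store beyond what the request already carries, the same function can serve as the fallback when the learned ranker times out.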

Part 7 — Handling Real-Time Updates

Show how the system keeps results fresh when a restaurant closes, an item goes out of stock, a price changes, a delivery area changes, or ETA spikes due to demand. Map each event class to the right mechanism and target freshness.

What This Part Should Cover

A per-event mapping (close/pause, sold-out, ETA/surge/fee, price, delivery-area change, new restaurant/menu/rating) to index upsert vs. overlay-store write vs. query-time fetch, with a target freshness per class.
The justification for two mechanisms (static event-driven index + live overlay), and that closing a restaurant becomes a single overlay write taking effect on the next query — no re-index.
A drift-repair story (periodic bulk rebuild + version-gated upserts) so a missed event self-heals.
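To make the overlay idea concrete, here is a minimal in-memory sketch in which a close or sold-out event is a single write and takes effect on the very next query, with no re-index. The OverlayStore class and the event type names are hypothetical; a production overlay would be a replicated low-latency key-value store.

```python
from typing import Dict, List, Set


class OverlayStore:
    """Live availability state, consulted at query time."""

    def __init__(self) -> None:
        self._closed: Set[str] = set()
        self._sold_out: Dict[str, Set[str]] = {}

    def apply(self, event: dict) -> None:
        """Apply one availability event (event type names are assumed)."""
        kind, rid = event["type"], event["restaurant_id"]
        if kind == "restaurant_closed":
            self._closed.add(rid)
        elif kind == "restaurant_opened":
            self._closed.discard(rid)
        elif kind == "item_sold_out":
            self._sold_out.setdefault(rid, set()).add(event["item_id"])
        elif kind == "item_restocked":
            self._sold_out.get(rid, set()).discard(event["item_id"])

    def is_open(self, restaurant_id: str) -> bool:
        return restaurant_id not in self._closed

    def available_items(self, restaurant_id: str, item_ids: List[str]) -> List[str]:
        sold_out = self._sold_out.get(restaurant_id, set())
        return [i for i in item_ids if i not in sold_out]


def filter_available(candidates: List[dict], overlay: OverlayStore) -> List[dict]:
    """Hard filter applied to index candidates before ranking."""
    return [c for c in candidates if overlay.is_open(c["id"])]
```

The index itself never changes when a restaurant closes; only the overlay entry does, which is why the static index can be refreshed on a slower cadence without serving closed restaurants.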

Part 8 — Scaling, Caching & Fault Tolerance

Cover sharding/replication, geo-partitioning, caching of availability-sensitive results, graceful degradation, and concrete failure scenarios.

What This Part Should Cover

Geo-partitioning by region/city (one query → few partitions; blast-radius isolation), sub-sharding dense metros, and replication for read throughput + HA; a stateless, horizontally-scaled search service.
A caching scheme with an explicit invariant: static/lexical results may be cached, but the overlay availability check is never cached past the staleness budget.
Graceful degradation for each failure: shard down → replica; overlay down → fall back to schedule-based open/closed (conservative) + alert; ranking/feature timeout → cached features or fall back to the heuristic ranker; circuit breakers to prevent cascades; a conservative "filter out when uncertain" default.
A DR story (index snapshots; rebuild from sources of truth).
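The overlay-down fallback combined with a circuit breaker could be sketched as follows. This is a simplified illustration: the thresholds are assumed, schedule_says_open is a hypothetical helper that uses whole-hour opening times, and a production breaker would also need thread safety and metrics.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CircuitBreaker:
    """Stop calling a failing dependency for cooldown_s after max_failures."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], object], fallback: Callable[[], object]):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown_s:
                return fallback()          # open: skip the dependency entirely
            self._opened_at = None         # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0                 # success closes the breaker
        return result


def schedule_says_open(restaurant_id: str,
                       schedule: Dict[str, Tuple[int, int]],
                       hour: int) -> bool:
    """Conservative fallback: unknown restaurants are treated as closed."""
    hours = schedule.get(restaurant_id)
    if hours is None:
        return False                       # filter out when uncertain
    open_hour, close_hour = hours
    return open_hour <= hour < close_hour
```

A caller would wrap the live overlay lookup as fn and the schedule check as fallback, so an overlay outage degrades to schedule-based open/closed status instead of an error or a cascade of slow calls.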

Part 9 — Metric / Monitoring System

Design the observability layer that tracks search quality, reliability, latency, freshness, and business impact, including the instrumentation flow, the dashboards, and the alerts.

What This Part Should Cover

The instrumentation flow (a structured event plus per-stage metrics for each request, sent to a time-series store for alerting and to a warehouse for offline analysis), segmented by city / platform / query class / dependency.
Concrete metrics in each category: reliability (error/timeout/zero-result rates, consumer lag, stale-doc count); latency (P50–P99 end-to-end and per stage); quality (CTR, conversion, reformulation rate, mean clicked position, offline NDCG/MRR); business (orders/GMV per search, supply-side exposure fairness); freshness (update-to-searchable lag, stock-change-to-filtered lag, % stale results).
Alerts tied to SLO breaches (P95 latency, error rate, zero-result spike, consumer/NRT lag, availability-stale rate, conversion drop), and shipping ranking changes behind A/B experiments measured on conversion/GMV.
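For the offline quality metrics, NDCG and MRR can be computed as in the sketch below. It uses the linear-gain form of DCG (relevance divided by log2 of rank + 1); some teams prefer the exponential-gain variant, so treat the choice as an assumption. Inputs are relevance labels listed in the order the system ranked the results.

```python
import math
from typing import List, Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top k positions (linear gain)."""
    return sum(rel / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def mrr(queries: List[Sequence[int]]) -> float:
    """Mean reciprocal rank of the first relevant result per query.

    Each inner sequence holds binary relevance in ranked order; a query
    with no relevant result contributes 0.
    """
    if not queries:
        return 0.0
    total = 0.0
    for ranked in queries:
        for pos, rel in enumerate(ranked):
            if rel:
                total += 1.0 / (pos + 1)
                break
    return total / len(queries)
```

These are computed offline from logged impressions joined to clicks and orders via request_id; online metrics such as CTR and conversion remain the deciding signal in A/B experiments.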

What a Strong Answer Covers

Across all parts, a strong answer is held together by a few cross-cutting decisions rather than ten independent sub-answers:

One spine, stated explicitly: geo-partition everything (local problem) + split a static index from a live overlay store (freshness without index thrash) + rank in stages (rich models within the latency budget). The best answers return to this spine when justifying each Part.
Correctness of the deliverability gate: deliverability is a hard filter (point-in-polygon, made cheap via tiling), never a soft distance signal, and the user must never be served an undeliverable or unavailable restaurant.
Numbers drive decisions: the sizing in Part 1 is actually used to justify in-RAM indexing, small ranking fan-out, and the overlay split, not left as decoration.
Latency budget is respected end-to-end: every expensive step (overlay reads, feature fetch, heavy ranker) is bounded, batched, parallelized, and has a fallback, so P95 stays under target.
Failure is the default, not the exception: the design prefers slightly-stale-but-available over an error, with circuit breakers, conservative filtering, and a heuristic-ranker fallback.
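The deliverability gate mentioned above (point-in-polygon made cheap via tiling) can be illustrated with a coarse tile pre-check followed by an exact ray-casting test. This sketch makes several simplifying assumptions: a fixed-degree square grid stands in for a real tiling scheme, the tile cover is taken from the polygon's bounding box (an over-approximation), and latitude/longitude are treated as planar coordinates, which is only reasonable for small delivery areas.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]   # (lat, lng)
TILE_DEG = 0.01               # assumed tile size, roughly 1 km


def tile_of(lat: float, lng: float) -> Tuple[int, int]:
    return (math.floor(lat / TILE_DEG), math.floor(lng / TILE_DEG))


def cover_tiles(polygon: List[Point]) -> Set[Tuple[int, int]]:
    """Tiles covering the polygon's bounding box (computed at index time)."""
    lats = [p[0] for p in polygon]
    lngs = [p[1] for p in polygon]
    lo = tile_of(min(lats), min(lngs))
    hi = tile_of(max(lats), max(lngs))
    return {(i, j)
            for i in range(lo[0], hi[0] + 1)
            for j in range(lo[1], hi[1] + 1)}


def point_in_polygon(lat: float, lng: float, polygon: List[Point]) -> bool:
    """Ray casting: count edge crossings of a ray from the point."""
    inside = False
    n = len(polygon)
    for k in range(n):
        lat1, lng1 = polygon[k]
        lat2, lng2 = polygon[(k + 1) % n]
        if (lng1 > lng) != (lng2 > lng):
            cross_lat = lat1 + (lng - lng1) * (lat2 - lat1) / (lng2 - lng1)
            if lat < cross_lat:
                inside = not inside
    return inside


def deliverable(lat: float, lng: float, polygon: List[Point],
                tiles: Set[Tuple[int, int]]) -> bool:
    """Cheap tile lookup first; exact polygon test only for tile hits."""
    if tile_of(lat, lng) not in tiles:
        return False
    return point_in_polygon(lat, lng, polygon)
```

The tile set is what gets indexed, so most restaurants are rejected by a set lookup; the exact polygon test runs only on the few whose tiles match, which keeps the hard filter both correct and cheap.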

Follow-up Questions

Freshness vs. latency. Synchronously re-checking every dynamic feature maximizes freshness but adds tail latency. How does your static-index + live-overlay split resolve this, and where exactly do you draw the line on what is fetched live?
Lexical vs. semantic matching. Lexical (inverted index) is fast, cheap, and great for exact name/dish hits but misses synonyms and vague intent ("something spicy and cheap"). Where would you add semantic/vector retrieval, and would it replace or augment the lexical path?
Online vs. precomputed ranking. Precomputed scores are cheap but can't react to live ETA/surge or per-user context; online ranking is fresh but costly. How does multi-stage ranking let you have both, and which signals belong in which stage?
Dish-level search. If dish queries ("vegan burger") become the dominant traffic, how do the document model and ranking change versus a restaurant-centric index?
