Search, Autocomplete And Restaurant Discovery

What's being tested

Uber-style restaurant search tests whether you can design a low-latency retrieval system that combines text search, prefix autocomplete, geospatial filtering, freshness, ranking, and operational safety. For a Software Engineer, the interviewer is probing architecture judgment: what should be precomputed, what should be evaluated at query time, how to shard and cache, and how to keep results correct when restaurants open, close, move, or change menus. Uber cares because search is on the critical path of marketplace conversion: stale or slow results directly hurt order volume, courier efficiency, and user trust. A strong answer balances p99 latency, index freshness, ranking quality, and system simplicity without drifting into deep ML modeling or product strategy.

Core knowledge

Autocomplete is usually served from a trie, compressed trie, DAWG, or finite-state transducer rather than scanning store names. Store top- $k$ candidate IDs per prefix node for fast lookup: query time is roughly $O(|prefix| + k)$ , while memory grows with total characters and stored suggestions.
Full-text search is built around an inverted index mapping tokens to posting lists of document IDs. Systems like Lucene, Elasticsearch, and Solr handle tokenization, stemming, typo tolerance, field boosts, and scoring. Classic relevance often starts with BM25, where term frequency, inverse document frequency, and field length normalization determine text relevance.
Geospatial search should avoid computing distance to every restaurant. Use geohash, S2, or H3 cells to map latitude/longitude into spatial buckets, retrieve nearby cells covering a radius, then apply exact Haversine distance filtering:
$d = 2r \arcsin \sqrt{\sin^2(\Delta\phi/2)+\cos\phi_1\cos\phi_2\sin^2(\Delta\lambda/2)}$
Candidate generation and ranking are separate stages. Candidate generation retrieves a few hundred or thousand plausible restaurants using text, geo, cuisine, and availability filters. Ranking then scores them using fast features such as text match, distance, ETA, rating, delivery fee, open status, and historical popularity.
Hard filters must be applied before ranking when correctness matters. A restaurant outside delivery range, closed, unavailable, or not serving the searched item should not appear just because it has a high relevance score. A common pattern is: retrieve broad candidates, apply eligibility filters, rank eligible candidates, then hydrate response details.
Index freshness is a first-class design concern. Static fields like restaurant name, cuisine, address, and menu descriptions can live in the search index. Volatile fields like open/closed status, inventory, courier-constrained availability, surge fees, and delivery ETA are often fetched from authoritative services at query time or stored in a fast side cache.
Sharding strategy depends on the access pattern. Text-heavy systems often shard by document ID or term; local marketplace search benefits from geo-aware sharding so most queries hit a small number of nearby partitions. Replication supports read throughput and availability, while scatter-gather fanout must be bounded to protect p99.
Caching is extremely valuable for autocomplete and popular local queries. Use Redis or in-process caches for normalized prefixes like "sus" in a city cell, and cache result IDs separately from hydrated restaurant details. Short TTLs reduce stale closed-store results; event-driven invalidation is useful for high-impact changes.
Query normalization improves recall: lowercase, trim whitespace, remove punctuation, handle accents, synonyms, common misspellings, and cuisine aliases such as "bbq" versus "barbecue". For autocomplete, preserve prefix semantics; for full search, tokenization and synonym expansion can be more aggressive.
Latency budgets should be explicit. For example: API gateway 5 ms, query parsing 5 ms, index lookup 30 ms, availability/ETA fanout 40 ms, ranking 10 ms, hydration 20 ms, leaving buffer for network variance. Designing for p99 < 200 ms requires bounded fanout, timeouts, fallbacks, and partial degradation.
Monitoring should cover both system health and result correctness. Track p50, p95, p99, timeout rate, cache hit rate, index lag, zero-result rate, stale-result reports, shard hot spots, and dependency failures. Search-specific alerts catch failures that generic CPU or memory metrics miss.
Failure handling should degrade gracefully. If personalization or ETA service times out, return text-and-geo-ranked restaurants with approximate delivery time rather than failing the request. If a shard is unavailable, return partial results with enough redundancy from replicas, but avoid silently showing ineligible restaurants.

Worked example

For Design Uber Eats-style search function, a strong candidate starts by clarifying scope: “Are we searching restaurants only, menu items too, or both? Do we need autocomplete? What are the expected QPS, latency target, geographic scale, and freshness requirements for open status and menus?” Then they declare assumptions, for example: 50M restaurants/items globally, local queries dominate, p99 under 200 ms, and open/closed status must be accurate within seconds.

The answer can be organized around four pillars: data model, indexing, query serving, and operability. The data model includes restaurant documents with name, cuisines, location, service area, menu item tokens, ratings, and stable IDs, while volatile attributes are kept out of the main index where possible. The indexing pillar uses a Lucene-style inverted index for text plus H3 or S2 cells for geo filtering, with replicas per region for low-latency reads.

The query-serving path is: normalize query, identify user location cell, retrieve candidates from nearby geo partitions and text postings, apply hard filters such as delivery radius and open status, rank candidates, hydrate restaurant cards, and return paginated results. A key tradeoff to flag is freshness versus latency: putting open status directly in the search index makes queries faster but risks stale results; checking an availability service at query time is more correct but adds fanout and tail latency. A good close is: “If I had more time, I’d discuss typo tolerance, multi-region failover, index rebuilds, and how to monitor zero-result and stale-result rates.”

A second angle

For Design a search autocomplete system, the same retrieval principles apply, but the latency and data-structure constraints are sharper. The user expects suggestions after each keystroke, often within 50 ms, so a trie/FST with precomputed top suggestions per prefix is more appropriate than a full search query for every character. Ranking can use locality and popularity, but it must be cheap: store per-city or per-cell top suggestions and rerank lightly using user location. Freshness matters less for static names, but closed or unavailable restaurants should be filtered before display or marked unavailable. The main design tension becomes memory versus relevance: storing top- $k$ suggestions at every prefix is fast but expensive, while computing top- $k$ dynamically saves memory but hurts latency.

Common pitfalls

Pitfall: Designing autocomplete as SELECT * FROM stores WHERE name LIKE 'prefix%'.

That may work for thousands of stores, but it fails at Uber scale due to full scans, poor prefix ranking, and weak latency under concurrent keystrokes. A better answer mentions trie/FST structures, prefix caches, top- $k$ precomputation, and city- or geo-scoped indexes.

Pitfall: Treating search as only a text-ranking problem.

Restaurant discovery is also constrained by distance, delivery radius, open status, menu availability, and marketplace state. A high-scoring text match is useless if the restaurant cannot deliver to the user, so eligibility filters and freshness checks must be part of the serving path.

Pitfall: Jumping into components without defining the query flow.

Saying “use Elasticsearch, Redis, and Kafka” is not a design. Interviewers expect the request lifecycle: parse and normalize, retrieve candidates, apply geo and availability filters, rank, hydrate, cache, observe, and degrade under dependency failures.

Connections

Interviewers often pivot from this topic into geospatial indexing, distributed caching, search ranking, event-driven freshness, or rate limiting for high-QPS APIs. They may also ask for a smaller coding variant, such as implementing store autocomplete with a trie or sorted array plus binary search.

What's being tested

Core knowledge

Worked example

A second angle

Common pitfalls

Connections

Further reading

Featured in interview prep guides

Practice questions

Related concepts