Design a search autocomplete system

Q: Design a search autocomplete system

This system design question evaluates the ability to architect a large-scale search autocomplete service, testing knowledge of prefix data structures, distributed indexing, and latency-sensitive serving pipelines. It is commonly asked in senior engineering interviews to assess trade-off reasoning between freshness, read performance, and system complexity at scale.

Q: How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

Q: What difficulty level is this interview question?

This is a hard difficulty System Design question, commonly asked during Onsite rounds at Uber.

Q: What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Uber during technical interviews.

Question

Design a Search Autocomplete System

Design a planet-scale search autocomplete (type-ahead) service. As a user types into a search box, the service returns the top-K ranked query completions for the current prefix, updating on each keystroke. Think of the suggestion drop-down under a Google-style or Uber-style search bar.

Your design should cover the full system: the online serving path that answers a prefix lookup in single-digit milliseconds, and the offline / nearline pipelines that turn raw query logs into the ranked index the serving path reads. Be ready to discuss prefix lookup data structures, ranking and personalization, freshness, caching, sharding, the consistency model, and how you keep suggestions safe and private at scale.

Constraints & Assumptions

Use these as the working scale unless you choose to clarify them:

Users: hundreds of millions of daily active users; multi-region, active-active.
Throughput: up to ~1M+ requests/second globally (a request fires per keystroke), with regional peaks around 1M QPS.
Latency (service time, per region): p50 < 20 ms, p95 < 40 ms, p99 < 60 ms. Network/edge adds another ~10–20 ms end-to-end.
Result size: return $K \approx 5$ – $10$ suggestions; each suggestion payload ~50–100 B, full response ~0.5–1 KB.
Availability: 99.99% per region.
Freshness: trending updates visible within minutes; full reindex daily.
Read-heavy: writes (query logs) are large in volume but processed offline/nearline; the online path is overwhelmingly reads.

Clarifying Questions to Ask

A strong candidate scopes the problem before designing. Reasonable questions include:

Suggestion source: are we completing arbitrary past queries , or a curated catalog of entities (products, places, people)? Both?
Personalization: is per-user/cohort personalization in scope, and what privacy constraints (k-anonymity, opt-out, regional regulation) apply?
Languages & scripts: which locales must we support — Latin only, or CJK / RTL / IME input as well? Is typo tolerance required?
Ranking signal: is ranking purely global popularity, or do we have click/selection feedback (CTR/MRR) and freshness signals to incorporate?
Safety/admin: do we need blocklists, pinning, synonyms, and an emergency kill-switch for offensive or sensitive suggestions?
Mobile vs. web, prefix length: do we suggest from the first keystroke or wait for a minimum prefix length / stable IME composition?

What a Strong Answer Covers

The interviewer is looking for breadth across the full system plus depth on the read path. A strong answer addresses these dimensions (not necessarily in this order):

Requirements & scale: crisp functional/non-functional requirements and a back-of-the-envelope estimate of QPS, memory for the index, and network egress.
API surface: a clean Suggest API contract, plus Events (impressions/clicks/selections) and Admin (blocklist/pin/synonym/flags) APIs.
Prefix lookup structure: a justified choice among trie / FST / inverted index, with the memory-vs-latency trade-off and how top-K is retrieved cheaply.
Ranking & personalization: the signals (popularity, CTR, freshness, geo, personal affinity), how they combine, and how personalization is gated for privacy.
Freshness mechanism: snapshot + delta-overlay design, nearline aggregation, and a clear freshness SLO.
Caching: multi-tier (in-process / distributed / edge) caching with versioned keys and invalidation.
Sharding, replication & consistency: partitioning that survives hot prefixes, replication for availability, and a coherent (mostly eventual, strong-for-policy) consistency model.
Typo tolerance & multilingual support: edit-distance / fuzzy candidate generation and Unicode normalization, segmentation, RTL/IME handling.
Observability & SLOs: golden signals plus quality metrics (CTR, MRR/NDCG), freshness lag, and cache hit ratios.
Privacy, safety & abuse: k-anonymity, PII minimization, opt-out, and trend-manipulation/bot defenses.
Failure handling: graceful degradation (drop personalization, serve cached/global results) under cache, index-rollout, or node failures.

Follow-up Questions

Hot prefixes: a breaking-news event makes one prefix spike 100x in seconds. How does your sharding/caching/freshness path absorb it without violating the latency SLO?
Index rollout safety: a newly built daily snapshot has a ranking regression. How do you detect it pre-publish and roll back in production without dropping requests?
Personalization without leaking: how do you boost suggestions for a specific user while guaranteeing you never surface a query that fewer than $k$ distinct users have issued?
Typo tolerance cost: enabling edit-distance-1 fuzzy matching multiplies candidate fanout. How do you bound its latency and memory impact, and when do you turn it off?

Design a search autocomplete system

Quick Overview

Design a search autocomplete system

Design a Search Autocomplete System

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Submit Your Answer to Earn 20XP

Design a search autocomplete system

Quick Overview

Design a search autocomplete system

Design a Search Autocomplete System

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Submit Your Answer to Earn 20XP