Design a Search Autocomplete System
Design a planet-scale search autocomplete (type-ahead) service. As a user types into a search box, the service returns the top-K ranked query completions for the current prefix, updating on each keystroke. Think of the suggestion drop-down under a Google-style or Uber-style search bar.
Your design should cover the full system: the online serving path that answers a prefix lookup in single-digit milliseconds, and the offline / nearline pipelines that turn raw query logs into the ranked index the serving path reads. Be ready to discuss prefix lookup data structures, ranking and personalization, freshness, caching, sharding, the consistency model, and how you keep suggestions safe and private at scale.
Constraints & Assumptions
Use these as the working scale unless you choose to clarify them:
-
Users:
hundreds of millions of daily active users; multi-region, active-active.
-
Throughput:
up to ~1M+ requests/second globally (a request fires per keystroke), with regional peaks around 1M QPS.
-
Latency (service time, per region):
p50 < 20 ms, p95 < 40 ms, p99 < 60 ms. Network/edge adds another ~10–20 ms end-to-end.
-
Result size:
return
K≈5
–
10
suggestions; each suggestion payload ~50–100 B, full response ~0.5–1 KB.
-
Availability:
99.99% per region.
-
Freshness:
trending updates visible within minutes; full reindex daily.
-
Read-heavy:
writes (query logs) are large in volume but processed offline/nearline; the online path is overwhelmingly reads.
Clarifying Questions to Ask
A strong candidate scopes the problem before designing. Reasonable questions include:
-
Suggestion source:
are we completing arbitrary past
queries
, or a curated catalog of
entities
(products, places, people)? Both?
-
Personalization:
is per-user/cohort personalization in scope, and what privacy constraints (k-anonymity, opt-out, regional regulation) apply?
-
Languages & scripts:
which locales must we support — Latin only, or CJK / RTL / IME input as well? Is typo tolerance required?
-
Ranking signal:
is ranking purely global popularity, or do we have click/selection feedback (CTR/MRR) and freshness signals to incorporate?
-
Safety/admin:
do we need blocklists, pinning, synonyms, and an emergency kill-switch for offensive or sensitive suggestions?
-
Mobile vs. web, prefix length:
do we suggest from the first keystroke or wait for a minimum prefix length / stable IME composition?
What a Strong Answer Covers
The interviewer is looking for breadth across the full system plus depth on the read path. A strong answer addresses these dimensions (not necessarily in this order):
-
Requirements & scale:
crisp functional/non-functional requirements and a back-of-the-envelope estimate of QPS, memory for the index, and network egress.
-
API surface:
a clean Suggest API contract, plus Events (impressions/clicks/selections) and Admin (blocklist/pin/synonym/flags) APIs.
-
Prefix lookup structure:
a justified choice among trie / FST / inverted index, with the memory-vs-latency trade-off and how top-K is retrieved cheaply.
-
Ranking & personalization:
the signals (popularity, CTR, freshness, geo, personal affinity), how they combine, and how personalization is gated for privacy.
-
Freshness mechanism:
snapshot + delta-overlay design, nearline aggregation, and a clear freshness SLO.
-
Caching:
multi-tier (in-process / distributed / edge) caching with versioned keys and invalidation.
-
Sharding, replication & consistency:
partitioning that survives hot prefixes, replication for availability, and a coherent (mostly eventual, strong-for-policy) consistency model.
-
Typo tolerance & multilingual support:
edit-distance / fuzzy candidate generation and Unicode normalization, segmentation, RTL/IME handling.
-
Observability & SLOs:
golden signals plus quality metrics (CTR, MRR/NDCG), freshness lag, and cache hit ratios.
-
Privacy, safety & abuse:
k-anonymity, PII minimization, opt-out, and trend-manipulation/bot defenses.
-
Failure handling:
graceful degradation (drop personalization, serve cached/global results) under cache, index-rollout, or node failures.
Follow-up Questions
-
Hot prefixes:
a breaking-news event makes one prefix spike 100x in seconds. How does your sharding/caching/freshness path absorb it without violating the latency SLO?
-
Index rollout safety:
a newly built daily snapshot has a ranking regression. How do you detect it pre-publish and roll back in production without dropping requests?
-
Personalization without leaking:
how do you boost suggestions for a specific user while guaranteeing you never surface a query that fewer than
k
distinct users have issued?
-
Typo tolerance cost:
enabling edit-distance-1 fuzzy matching multiplies candidate fanout. How do you bound its latency and memory impact, and when do you turn it off?