System Design: Google-Scale Search Autocomplete
Goal
Design a planet-scale autocomplete service for a search box that suggests completions as users type. Assume hundreds of millions of daily active users, multi-region deployments, and peak QPS in the high hundreds of thousands to millions.
Requirements
-
Functional
-
Return top-K ranked suggestions for a given prefix as the user types (K ≈ 5–10).
-
Support personalization (when permitted), multi-lingual input, and typo tolerance.
-
Highlight matched prefix; return metadata (type, score, reason codes) for observability.
-
Support admin operations (blocklist, pin/unpin, synonyms) and A/B experiments.
-
Non-Functional
-
Latency: p50 < 20 ms, p95 < 40 ms, p99 < 60 ms (service time, per region).
-
Throughput: up to 1M+ QPS globally.
-
Availability: 99.99% per region; active-active multi-region.
-
Relevance: high CTR/MRR; guardrails for safety and privacy.
-
Freshness: trending updates visible within minutes; full reindex daily.
-
External APIs
-
Suggest API (online, low-latency).
-
Events API (impressions, clicks, selections).
-
Admin API (blocklist, pinning, synonyms, feature flags).
-
Data Model
-
Suggestion terms/entities, language/locale, popularity stats, blocklists, ranking features.
-
Prefix index structure (e.g., weighted FST) plus overlays for deltas.
-
Architecture
-
Online serving: edge → API → caches → prefix lookup engine → ranker → response.
-
Offline/nearline pipelines: logs → ETL → aggregation → model training → index build → publish snapshots and deltas.
-
Core Design Topics to Cover
-
Prefix lookup structures (trie, FST, inverted index) and trade-offs.
-
Ranking signals and personalization.
-
Typo tolerance and multi-lingual support.
-
Freshness and incremental updates.
-
Caching strategy.
-
Sharding and replication.
-
Consistency model.
-
Monitoring/SLOs.
-
Privacy and abuse/spam mitigation.
-
Capacity planning.