PracHub

Design a search autocomplete system

Last updated: Jun 25, 2026

Quick Overview

This system design question evaluates the ability to architect a large-scale search autocomplete service, testing knowledge of prefix data structures, distributed indexing, and latency-sensitive serving pipelines. It is commonly asked in senior engineering interviews to assess trade-off reasoning between freshness, read performance, and system complexity at scale.



Company: Uber

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Onsite

Design a Google-scale search autocomplete service. Define functional and non-functional requirements (latency, throughput, availability, relevance), external APIs, data model, and overall architecture across online serving and offline pipelines. Cover prefix lookup structures (e.g., trie, FST, or inverted index), ranking signals and personalization, typo tolerance and multi-lingual support, freshness and incremental updates, caching strategy, sharding and replication, consistency model, monitoring/SLOs, privacy and abuse/spam mitigation, and capacity planning.


Related Interview Questions

  • Design a Ride-Sharing System (Uber-style Core Platform) - Uber
  • Design a Food-Delivery Backend (Uber Eats-style) - Uber (medium)
  • Design a Real-Time Chat System - Uber (medium)
  • Design a Distributed Logging System - Uber (medium)
  • Design a Stock Trading Platform - Uber (medium)

Jul 15, 2025, 12:00 AM

Design a Search Autocomplete System

Design a planet-scale search autocomplete (type-ahead) service. As a user types into a search box, the service returns the top-K ranked query completions for the current prefix, updating on each keystroke. Think of the suggestion drop-down under a Google-style or Uber-style search bar.

Your design should cover the full system: the online serving path, where each prefix lookup against the index should take single-digit milliseconds so the whole request fits the latency budget below, and the offline/nearline pipelines that turn raw query logs into the ranked index the serving path reads. Be ready to discuss prefix lookup data structures, ranking and personalization, freshness, caching, sharding, the consistency model, and how you keep suggestions safe and private at scale.

Constraints & Assumptions

Use these as the working scale unless you choose to clarify them:

  • Users: hundreds of millions of daily active users; multi-region, active-active.
  • Throughput: ~1M+ requests/second globally at peak (one request fires per keystroke); size each region to absorb peaks of around 1M QPS, since active-active failover can shift most traffic onto a single region.
  • Latency (service time, per region): p50 < 20 ms, p95 < 40 ms, p99 < 60 ms. Network/edge adds another ~10–20 ms end-to-end.
  • Result size: return $K \approx 5$–$10$ suggestions; each suggestion payload ~50–100 B, full response ~0.5–1 KB.
  • Availability: 99.99% per region.
  • Freshness: trending updates visible within minutes; full reindex daily.
  • Read-heavy: writes (query logs) are large in volume but processed offline/nearline; the online path is overwhelmingly reads.
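A quick back-of-the-envelope check of the numbers above, as a minimal sketch. It uses the stated ~1M QPS peak and the ~1 KB upper-end response size; the per-node throughput (`PER_NODE_QPS`) and utilization headroom (`HEADROOM`) are hypothetical assumptions chosen only to show the arithmetic.

```python
import math

# Figures taken from the constraints above.
PEAK_QPS = 1_000_000          # ~1M requests/second at peak
RESPONSE_BYTES = 1_000        # ~1 KB per full response (upper end)

# Hypothetical assumptions, for illustration only.
PER_NODE_QPS = 20_000         # assumed sustainable QPS per serving node
HEADROOM = 0.5                # keep each node at <= 50% utilization

def egress_gbps(qps: float, response_bytes: int) -> float:
    """Response egress in gigabits per second."""
    return qps * response_bytes * 8 / 1e9

def serving_nodes(qps: float, per_node_qps: float, headroom: float) -> int:
    """Nodes needed so that each one runs at or below the headroom utilization."""
    return math.ceil(qps / (per_node_qps * headroom))

if __name__ == "__main__":
    print(f"egress ~{egress_gbps(PEAK_QPS, RESPONSE_BYTES):.1f} Gbps")
    print(f"nodes  ~{serving_nodes(PEAK_QPS, PER_NODE_QPS, HEADROOM)}")
```

Under these assumptions a region peaking at 1M QPS emits roughly 8 Gbps of response payload and needs on the order of 100 serving nodes before replication; a real estimate would also size index memory from the number of distinct suggestible queries.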

Clarifying Questions to Ask

A strong candidate scopes the problem before designing. Reasonable questions include:

  1. Suggestion source: are we completing arbitrary past queries, or a curated catalog of entities (products, places, people)? Both?
  2. Personalization: is per-user/cohort personalization in scope, and what privacy constraints (k-anonymity, opt-out, regional regulation) apply?
  3. Languages & scripts: which locales must we support — Latin only, or CJK / RTL / IME input as well? Is typo tolerance required?
  4. Ranking signal: is ranking purely global popularity, or do we have click/selection feedback (CTR/MRR) and freshness signals to incorporate?
  5. Safety/admin: do we need blocklists, pinning, synonyms, and an emergency kill-switch for offensive or sensitive suggestions?
  6. Mobile vs. web, prefix length: do we suggest from the first keystroke or wait for a minimum prefix length / stable IME composition?

What a Strong Answer Covers

The interviewer is looking for breadth across the full system plus depth on the read path. A strong answer addresses these dimensions (not necessarily in this order):

  • Requirements & scale: crisp functional/non-functional requirements and a back-of-the-envelope estimate of QPS, memory for the index, and network egress.
  • API surface: a clean Suggest API contract, plus Events (impressions/clicks/selections) and Admin (blocklist/pin/synonym/flags) APIs.
  • Prefix lookup structure: a justified choice among trie / FST / inverted index, with the memory-vs-latency trade-off and how top-K is retrieved cheaply.
  • Ranking & personalization: the signals (popularity, CTR, freshness, geo, personal affinity), how they combine, and how personalization is gated for privacy.
  • Freshness mechanism: snapshot + delta-overlay design, nearline aggregation, and a clear freshness SLO.
  • Caching: multi-tier (in-process / distributed / edge) caching with versioned keys and invalidation.
  • Sharding, replication & consistency: partitioning that survives hot prefixes, replication for availability, and a coherent (mostly eventual, strong-for-policy) consistency model.
  • Typo tolerance & multilingual support: edit-distance / fuzzy candidate generation and Unicode normalization, segmentation, RTL/IME handling.
  • Observability & SLOs: golden signals plus quality metrics (CTR, MRR/NDCG), freshness lag, and cache hit ratios.
  • Privacy, safety & abuse: k-anonymity, PII minimization, opt-out, and trend-manipulation/bot defenses.
  • Failure handling: graceful degradation (drop personalization, serve cached/global results) under cache, index-rollout, or node failures.
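For the prefix lookup structure, one common choice is a trie in which every node stores its precomputed top-K completions, so serving is a walk of `len(prefix)` characters followed by returning a ready-made list. The sketch below assumes the offline pipeline supplies a `{query: score}` map; the class and method names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.top: List[Tuple[float, str]] = []  # (score, query), best first after build

class TopKTrie:
    """Prefix trie that precomputes the top-K completions at every node.

    Built offline from (query, score) pairs; online lookups only walk the prefix.
    """

    def __init__(self, k: int = 5) -> None:
        self.k = k
        self.root = TrieNode()

    def build(self, scored_queries: Dict[str, float]) -> None:
        for query, score in scored_queries.items():
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
                node.top.append((score, query))  # every prefix node sees this query
        self._truncate()

    def _truncate(self) -> None:
        # Keep only the K best per node: highest score first, ties broken alphabetically.
        stack = [self.root]
        while stack:
            node = stack.pop()
            node.top = heapq.nsmallest(self.k, node.top, key=lambda t: (-t[0], t[1]))
            stack.extend(node.children.values())

    def suggest(self, prefix: str) -> List[str]:
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for _, query in node.top]
```

Example: after `t = TopKTrie(k=2); t.build({"uber": 100, "uber eats": 90, "ubuntu": 40})`, `t.suggest("ub")` returns `["uber", "uber eats"]`. The design trades memory (K entries per node) for latency; an FST is a more compact alternative when memory dominates.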
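To make "how the ranking signals combine" concrete, here is a minimal sketch of a weighted linear blend with exponential freshness decay, where personal affinity only contributes when personalization is permitted for the user. The `Signals` fields, the weights, and the 24-hour half-life are hypothetical; in practice weights would be learned or tuned offline.

```python
from dataclasses import dataclass

@dataclass
class Signals:
    popularity: float   # normalized query frequency in [0, 1]
    ctr: float          # historical click-through rate in [0, 1]
    age_hours: float    # hours since the query's popularity was last refreshed
    geo_match: bool     # suggestion is relevant to the user's region
    affinity: float     # per-user affinity in [0, 1]

# Hypothetical weights and half-life, for illustration only.
WEIGHTS = {"popularity": 0.5, "ctr": 0.3, "geo": 0.1, "affinity": 0.1}
HALF_LIFE_HOURS = 24.0

def freshness(age_hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """Exponential decay: 1.0 when brand new, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life)

def score(s: Signals, personalize: bool) -> float:
    base = (WEIGHTS["popularity"] * s.popularity * freshness(s.age_hours)
            + WEIGHTS["ctr"] * s.ctr
            + WEIGHTS["geo"] * (1.0 if s.geo_match else 0.0))
    # Personal affinity is gated: it only counts when the privacy policy allows it.
    if personalize:
        base += WEIGHTS["affinity"] * s.affinity
    return base
```

Dropping the `personalize` flag to `False` doubles as the graceful-degradation path: the same scorer then produces purely global rankings.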
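The snapshot + delta-overlay idea can be sketched as a merge at lookup time: the immutable daily snapshot supplies candidates and scores for a prefix, and a small nearline delta overrides scores for trending queries. In this sketch a delta value of `None` is treated as a tombstone (for example a freshly blocklisted query), which is an assumption about how policy removals might ride the same path.

```python
import heapq
from typing import Dict, List, Optional

def merge_topk(snapshot: Dict[str, float],
               delta: Dict[str, Optional[float]],
               k: int) -> List[str]:
    """Merge snapshot candidates with a nearline delta for one prefix.

    - A delta score replaces the snapshot score for the same query.
    - A delta value of None is a tombstone and removes the query.
    Results are ordered by score, ties broken alphabetically.
    """
    merged: Dict[str, float] = dict(snapshot)
    for query, s in delta.items():
        if s is None:
            merged.pop(query, None)
        else:
            merged[query] = s
    best = heapq.nsmallest(k, merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return [query for query, _ in best]
```

Because tombstones can remove entries, a snapshot would plausibly store somewhat more than K candidates per prefix so the merged list does not come up short.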
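Versioned cache keys can be illustrated with a hypothetical key format that embeds the snapshot id and a delta epoch: publishing a new index version changes every key, so stale entries age out without explicit deletes. The in-process tier below is a minimal LRU built on `collections.OrderedDict`; the key layout and names are assumptions for this sketch.

```python
from collections import OrderedDict
from typing import List, Optional

def cache_key(prefix: str, locale: str, snapshot_id: str, delta_epoch: int) -> str:
    # Bumping snapshot_id or delta_epoch yields new keys, so entries cached
    # under an older index version are never read again and simply get evicted.
    return f"v={snapshot_id}.{delta_epoch}|l={locale}|p={prefix}"

class LRUCache:
    """Minimal in-process LRU tier (not thread-safe)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, List[str]]" = OrderedDict()

    def get(self, key: str) -> Optional[List[str]]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: List[str]) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```

Including the delta epoch in the key bounds cache staleness by the delta publish interval, at the cost of a cold cache right after each publish.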
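For the Unicode normalization part of multilingual support, a minimal sketch of prefix canonicalization, applied identically when building the index and when serving, might look like this. It uses NFKC (folds compatibility forms such as full-width Latin letters) plus `str.casefold()`; keeping a trailing space as a "word finished" signal is an assumption of this sketch, and accent folding is deliberately left out because it is locale-dependent.

```python
import unicodedata

def normalize_prefix(text: str) -> str:
    """Canonicalize user input so equivalent spellings share one index key.

    NFKC folds compatibility characters (e.g. full-width letters) into their
    canonical forms; casefold() is an aggressive, Unicode-aware lowercase.
    Runs of whitespace collapse to one space; a trailing space is preserved.
    """
    t = unicodedata.normalize("NFKC", text).casefold()
    trailing = t[-1:].isspace()
    t = " ".join(t.split())
    if trailing and t:
        t += " "
    return t
```

For example, full-width `"ＵＢＥＲ"` becomes `"uber"`, and `"Straße"` becomes `"strasse"`. Scripts without spaces (CJK) would additionally need segmentation, which this sketch does not attempt.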

Follow-up Questions

  1. Hot prefixes: a breaking-news event makes one prefix spike 100x in seconds. How does your sharding/caching/freshness path absorb it without violating the latency SLO?
  2. Index rollout safety: a newly built daily snapshot has a ranking regression. How do you detect it pre-publish and roll back in production without dropping requests?
  3. Personalization without leaking: how do you boost suggestions for a specific user while guaranteeing you never surface a query that fewer than $k$ distinct users have issued?
  4. Typo tolerance cost: enabling edit-distance-1 fuzzy matching multiplies candidate fanout. How do you bound its latency and memory impact, and when do you turn it off?
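For the hot-prefix follow-up, one mitigation among several (edge caching and request coalescing are others) is key salting: once a prefix is flagged hot, its reads are spread across up to `fanout` shards by appending a random salt to the routing key, while cold prefixes keep a stable shard. How prefixes get flagged and the fanout value are hypothetical here.

```python
import hashlib
import random
from typing import Set

def _hash64(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

def shard_for(prefix: str, num_shards: int, hot_prefixes: Set[str],
              fanout: int, rng: random.Random) -> int:
    """Route a prefix lookup to a shard.

    Cold prefixes always map to the same shard. A hot prefix gets a random salt
    in [0, fanout), spreading its load over up to `fanout` shards, each of which
    must hold a replica of that prefix's entry.
    """
    salt = rng.randrange(fanout) if prefix in hot_prefixes else 0
    return _hash64(f"{prefix}#{salt}") % num_shards
```

The trade-off is extra replication work for each hot entry, which is why salting is usually reserved for the small set of prefixes a detector has flagged.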
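For the index-rollout follow-up, a pre-publish gate could replay a golden set of `(prefix, query the user actually selected)` pairs against both the current and the candidate snapshot and block publication if mean reciprocal rank (MRR) drops too far. The golden set, the suggester callables, and the 2% threshold are assumptions for this sketch.

```python
from typing import Callable, List, Sequence, Tuple

Suggester = Callable[[str], Sequence[str]]

def mean_reciprocal_rank(suggest: Suggester,
                         golden: List[Tuple[str, str]]) -> float:
    """MRR over (prefix, selected query) pairs.

    Reciprocal rank is 1 / position of the selected query in the suggestions,
    or 0 if it does not appear at all.
    """
    if not golden:
        return 0.0
    total = 0.0
    for prefix, selected in golden:
        results = list(suggest(prefix))
        if selected in results:
            total += 1.0 / (results.index(selected) + 1)
    return total / len(golden)

def should_publish(candidate: Suggester, current: Suggester,
                   golden: List[Tuple[str, str]],
                   max_relative_drop: float = 0.02) -> bool:
    """Reject the candidate snapshot if its MRR regresses beyond the threshold."""
    base = mean_reciprocal_rank(current, golden)
    cand = mean_reciprocal_rank(candidate, golden)
    if base == 0.0:
        return True  # nothing to regress from
    return (base - cand) / base <= max_relative_drop
```

If a regression slips past this offline gate, versioned snapshots make production rollback a matter of pointing serving back at the previous snapshot id while the bad one drains.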
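For the personalization follow-up, a minimal sketch of the $k$-anonymity guarantee: the eligible set is computed from logs as queries issued by at least $k$ distinct users, and a user's own history may only boost queries inside that set, so a rare (possibly PII-bearing) query is never surfaced. Exact per-query user sets are used here for clarity; at scale an approximate distinct counter would likely replace them. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def k_anonymous_queries(log: Iterable[Tuple[str, str]], k: int) -> Set[str]:
    """Queries issued by at least k distinct users; log yields (user_id, query)."""
    users: Dict[str, Set[str]] = defaultdict(set)
    for user_id, query in log:
        users[query].add(user_id)
    return {q for q, u in users.items() if len(u) >= k}

def personalized_suggestions(prefix: str, global_topk: List[str],
                             user_history: Iterable[str], allowed: Set[str],
                             k_out: int) -> List[str]:
    """Put the user's own matching past queries first, but only k-anonymous ones.

    global_topk is assumed to come from an index already built from `allowed`.
    """
    boosted = sorted({q for q in user_history if q.startswith(prefix) and q in allowed})
    rest = [q for q in global_topk if q not in boosted]
    return (boosted + rest)[:k_out]
```

Here a query like `"uber to 12 elm st"` issued by one user never qualifies, no matter how strongly that user's history favors it.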
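For the typo-tolerance follow-up, the fanout problem is easy to see with brute-force edit-distance-1 generation: a prefix of length $n$ over a 26-letter alphabet yields about $54n + 25$ variants. This sketch bounds cost by running fuzzy lookup only for prefixes of at least `min_len` characters, only when exact results fall short of K, and with a hard `max_variants` cap. The Latin-only alphabet and all thresholds are assumptions; production systems often prefer Levenshtein automata or SymSpell-style deletion indexes instead.

```python
from typing import Callable, Iterator, List

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: Latin-only for this sketch

def edits1(word: str, alphabet: str = ALPHABET) -> Iterator[str]:
    """Yield strings at Damerau-Levenshtein distance <= 1 (duplicates possible)."""
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        if right:
            yield left + right[1:]                          # deletion
        if len(right) > 1:
            yield left + right[1] + right[0] + right[2:]    # transposition
        for c in alphabet:
            if right:
                yield left + c + right[1:]                  # substitution
            yield left + c + right                          # insertion

def fuzzy_suggest(prefix: str, suggest: Callable[[str], List[str]], k: int,
                  min_len: int = 3, max_variants: int = 200) -> List[str]:
    """Exact results first; fall back to edit-1 variants only when needed."""
    results = list(suggest(prefix))
    if len(results) >= k or len(prefix) < min_len:
        return results[:k]
    seen = set(results)
    for tried, variant in enumerate(edits1(prefix)):
        if tried >= max_variants:
            break  # hard cap on fanout bounds worst-case latency
        for q in suggest(variant):
            if q not in seen:
                seen.add(q)
                results.append(q)
                if len(results) >= k:
                    return results
    return results
```

Turning fuzzy matching off under load is then a matter of skipping the fallback loop, which leaves exact-prefix serving untouched.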
