Adobe Search Indexing And Autocomplete

What's being tested

Candidates must show practical mastery of building low-latency autocomplete and search indexing components: choosing the right data structures, indexing strategy, update/refresh semantics, and tradeoffs between memory, CPU, and latency under realistic scale. Interviewers probe for correct algorithmic complexity, sharding/caching designs to meet p99 latency targets, and safe incremental-update patterns that avoid stale or inconsistent suggestions.

Core knowledge

Inverted index: primary structure for full-text search mapping token → posting list; good for scoring/ranking but not ideal for prefix-autocomplete unless you index edge n-grams or use specialized suggesters.
Trie / Prefix tree: O(L) lookup for a prefix of length L, supports lexicographic top-k if nodes keep top-k aggregates; memory grows with character branching—compress using compressed trie or FST.
FST (Finite State Transducer): compact, disk-friendly automaton used by Lucene for memory-efficient prefix/suffix lookup and weighted suggestions; lookup is O(L) and storage can be an order of magnitude smaller than naive tries.
Edge n-gram vs tokenization: edge n-gram generates prefixes at index time (e.g., "adobe" → "a","ad","ado"...), increasing index size by ~Σ(length) but makes queries simple term lookups; on-the-fly prefix scanning keeps index smaller but needs specialized structures.
Completion index vs inverted index: maintain a separate completion index (trie/FST) optimized for latency and top-k, while using inverted index for full-text ranking and fallback results.
Real-time updates / refresh: choose between batched refresh (lower write amplification, stale suggestions) and near-real-time per-document update (higher CPU/memory); tune refresh interval (e.g., 1s–5s) and use background merges to amortize cost.
Sharding & routing: shard by hash (balanced load) or by prefix (localized hot-prefixes) — hashing simplifies writes; prefix routing can reduce query fanout but risks load skew on hot prefixes. Total QPS ~ shards × qps_per_shard; plan for hot-shard mitigation.
Caching & TTLs: cache top suggestions per prefix in Redis/in-process LRU for sub-ms hits; use short TTLs to avoid strong staleness; preheat caches for known peak prefixes.
Fuzzy / typo-tolerant autocomplete: use edit-distance automata, BK-trees, or fuzzy FSTs; these increase CPU and memory complexity (exponential in edit radius) and usually limited to small radii (1–2).
Memory & compression: techniques like front-coding, delta-encoded posting lists, and packed ints reduce memory; an FST often reduces storage by sharing common suffixes/prefixes dramatically.
Latency budgets & SLOs: aim for strict p99 (e.g., <50ms) for autocomplete; design for tail latency: prioritize single-shard, in-memory lookups, avoid blocking I/O during queries.
Consistency model: eventual consistency is acceptable for suggestions; for strong consistency (e.g., recent admin removals), add tombstones or immediate invalidation hooks for caches.

Worked example — "Design an autocomplete service that returns top-k suggestions with sub-50ms `p99` latency and supports near-real-time updates"

First 30s: clarify QPS, dataset size (N), average suggestion length, update rate, and SLOs for p50/p99 and freshness. State assumptions: N≈50M phrases, QPS≈5k, updates ≈1000/day, and p99 target 50ms.

Skeleton answer pillars: (1) offline build + compact runtime index (FST) for memory-efficient prefix lookup; (2) write path with a short refresh window (1–2s) that applies new entries to an in-memory delta trie; (3) query path that checks the in-memory delta, then the persisted FST, merging top-k by weight; (4) sharding by hash to balance QPS with per-shard in-memory caches; (5) engineering for tail-latency (thread pools, local caches).

Design decision to flag: using an FST + in-memory delta trades slightly stale but cheap main index for immediate visibility via deltas — costs extra merge complexity and memory but keeps queries single-shard and in-memory. Also call out fallback: if top-k source exhausted, fallback to inverted-index search with scoring.

Close: say how to validate (load tests targeting p99, memprofs showing FST size), and add "if more time" items: implement typo-tolerance via edit-distance automata, and design automated index compaction/merge policies.

A second angle — "Scale autocomplete to support millions of entries and tight memory budget on each node"

Same core concept but different constraints: emphasize memory-first choices and approximate structures. Use a compact FST or a disk-backed compressed trie; store only top-k weights per node to prune size. Use prefix hashing + bloom filters to quickly reject misses (O(1) checks) and avoid expensive disk seeks. For heavy-write environments, buffer updates into a small in-memory delta and periodically merge offline to the compact store to avoid rebuilding large compressed structures frequently. Call out tradeoffs: aggressive compression reduces RAM but may increase p99 due to decompression/IO; explain fallback to a remote read for cold prefixes and prewarming for hot prefixes.

Common pitfalls

Pitfall: Treating the inverted index as a drop-in autocomplete solution — you may hit high-latency fanout and heavy aggregation costs; a specialized completion index or precomputed top-k per-prefix is often required.

Pitfall: Ignoring hot-prefix skew — shard-by-hash without hot-key mitigation leads to overloaded nodes; propose per-prefix throttling, replica placement, or hot-key caches.

Pitfall: Over-optimizing for average latency — interviewers expect tail-latency reasoning (thread pools, non-blocking IO, bounded queues) and instrumentation plans for p99 measurement, not just average-case numbers.

Connections

Interviewers will often pivot to adjacent areas like query ranking (how suggestions are scored and re-ranked server-side) and logging/telemetry (how to measure suggestion CTR, latency percentiles, and freshness). They may also ask about index rebuild/upgrade and safer deployment strategies (canary merges, versioned indices).

What's being tested

Core knowledge

Worked example — "Design an autocomplete service that returns top-k suggestions with sub-50ms `p99` latency and supports near-real-time updates"

A second angle — "Scale autocomplete to support millions of entries and tight memory budget on each node"

Common pitfalls

Connections

Further reading

Related concepts

What's being tested

Core knowledge

Worked example — "Design an autocomplete service that returns top-k suggestions with sub-50ms p99 latency and supports near-real-time updates"

A second angle — "Scale autocomplete to support millions of entries and tight memory budget on each node"

Common pitfalls

Connections

Further reading

Related concepts

Worked example — "Design an autocomplete service that returns top-k suggestions with sub-50ms `p99` latency and supports near-real-time updates"