Adobe Document Cloud Search Indexing And Autocomplete

What's being tested

Candidates must demonstrate end-to-end systems design and implementation judgment for a large-scale document search and autocomplete system: how to build a robust indexing pipeline, choose index data structures, meet latency/throughput SLAs, and handle updates/deletes and failure modes. Interviewers probe the candidate's ability to trade off real-time indexing vs. cost, pick appropriate search structures (memory vs disk), reason about sharding/replication, and design low-latency autocomplete that tolerates typos and scale.

Core knowledge

Inverted index: core structure mapping terms → posting lists (docID, positions, term-frequency). Use compressed posting formats (Gamma/VarInt, PForDelta) for disk efficiency and skip lists for fast seeks.
Tokenization & analyzers: tokenizers, stopword removal, stemming (Porter), and n-gram/edge n-gram analyzers shape what queries match; mismatched analyzer between index/query causes silent failures.
Finite-state transducer (FST) and trie: FSTs (e.g., Lucene FST) are compact, memory-efficient structures for prefix autocomplete; store outputs for frequencies or docIDs for fast prefix lookup.
Fuzzy matching: Levenshtein automata or BK-trees allow typo tolerance; Levenshtein-based automata integrated in Lucene provide bounded-edit-distance queries at predictable cost.
Near-real-time (NRT) indexing: implement with a fast in-memory segment + periodic refresh to make docs searchable; tune refresh interval to trade indexing latency vs query throughput and merge overhead.
Sharding & replication: shard by docID or tenant to scale writes/reads; replicas provide high availability and read scaling; plan shard count for expected growth — re-sharding is expensive.
Index merging & compaction: merges reduce segment count but are I/O heavy; control merge policies to avoid write stalls; use background merge threads and backpressure on ingestion.
Bulk ingestion & idempotency: batch documents and use idempotency keys to make retries safe; stream input via Kafka or change-data-capture, store raw payloads in S3 for reprocessing.
Update/delete semantics: updates often implemented as delete + reindex with versioning/tombstones; support versioned writes and periodic GC of tombstones to reclaim space.
Autocomplete ranking signals: precompute popularity counts, combine prefix matches with recency, personalization proxies, and context filters; keep suggestion data in an in-memory store (hot) and fallback to index (cold).
Capacity planning & SLAs: for target QPS and p99 latency, budget CPU, memory (FSTs often memory-resident), disk IOPS, and network; example: 10k QPS with p99 < 50ms may need several dozen query-serving nodes depending on index size.
Monitoring & correctness: monitor indexing-lag, refresh-time, merge-time, heap-pct, and search error rates; build diff tests to detect tokenization mismatches or lost updates.

Worked example — "Design a scalable indexing pipeline for Document Cloud that supports near-real-time search and updates"

First 30s: clarify SLAs (indexing latency target — e.g., <5s; query p99 — e.g., <100ms), scale (docs/day, average doc size), multi-tenancy and ACL requirements. Skeleton answer pillars: (1) ingestion layer: Kafka topics with producers that emit raw payloads and idempotency keys, (2) parsing layer: scalable parser workers (PDF → text, OCR fallback) that emit normalized tokens and metadata, (3) indexing layer: write to distributed Lucene-based clusters with sharding and replicas, using micro-batches and a tuned refresh interval for NRT, (4) maintenance: merges, tombstone GC, and reindex jobs. One tradeoff to flag: decreasing refresh interval lowers visibility latency but increases segment churn and CPU; choose micro-batch size and refresh frequency per tenant SLAs. Close by noting short-term priorities (implement idempotent bulk API, monitor indexing lag) and long-term work (cold storage reindexing, cross-cluster replication) if more time were available.

A second angle — "Design an autocomplete service for document search supporting typo-tolerance and personalization"

Autocomplete shifts priorities: ultra-low latency (single-digit ms) and memory-efficient data structures. Build a hot path with an in-memory FST per shard for edge n-grams storing suggestion scores, refreshed asynchronously from the main index every few seconds. For typo-tolerance, precompute popular fuzzy variants or use Levenshtein automata limited to edit distance 1 for top prefixes; heavy fuzzy queries must be rate-limited or routed to a more expensive backend. Personalization applies signals (user history, document access) to re-rank suggestions at query time; compute lightweight features in-memory and apply reranking within the latency budget. The framing differences: autocomplete favors precomputation and memory footprint; full-text search tolerates higher per-query cost and greater disk I/O.

Common pitfalls

Pitfall: Assuming index and query analyzers are interchangeable.
If you don't explicitly align tokenization between index-time and query-time, results will be inconsistent (e.g., n-gram at index but whitespace on query). Always document analyzer contracts and include round-trip tests.

Pitfall: Designing for average latency only.
Focusing on mean latency hides tail problems; interviewers expect p95/p99 reasoning and how you'll prevent head-of-line blocking during merges or GC pauses (e.g., circuit-breakers, backpressure).

Pitfall: Overfocusing on ML ranking for autocomplete.
Spending the interview on model architecture instead of core infra (FST, refresh cadence, memory budget) misses the SWE scope; state the ML signals and where they plug in, but prioritize operational constraints.

Connections

Interviewers may pivot to adjacent topics like distributed consensus for cross-region index replication, caching strategies (cold vs hot shards, CDN for big-result sets), or observability (tracing, request sampling) to validate tail latency and correctness.