Adobe Creative Cloud asset search, indexing, autocomplete, and sharding

What's being tested

Interviewers are probing your ability to design a scalable, low-latency search and autocomplete service for rich Creative Cloud assets: data modeling for text + metadata, indexing pipelines, query-time routing, and sharding strategies that meet throughput and availability SLAs. They want concrete choices (index format, shard sizing, replication, refresh strategy), tradeoffs (latency vs freshness, fan-out vs routing), and operational considerations (rebalancing, backfills, monitoring).

Core knowledge

Inverted index: the canonical structure for text search (term → posting list). Use tokenization, stemming, and analyzers to control vocabulary size and match semantics for asset names, tags, and captions.
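Edge n-grams vs trie/completion: for prefix autocomplete, an edge n-gram analyzer or a completion suggester (a trie-like structure; an FST in Elasticsearch) gives lookup cost that scales with the prefix length k rather than with corpus size. Edge n-grams inflate the index: a token of length L emits up to L prefixes, so the term count grows by roughly the average token length unless a maximum gram length caps it.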
Vector vs lexical search: lexical (inverted index) is fast and precise for names/tags; vector embeddings (ANN indexes like HNSW) handle semantic similarity for captions; combine with hybrid queries and reranking.
Near-real-time indexing: tune the refresh interval (e.g., 1s–5s) to trade ingestion latency against query and segment-creation cost; use bulk APIs for ingestion, since throughput rises with batch size (with diminishing returns) until memory or network becomes the bottleneck.
Shard sizing & count: target roughly 30–50 GB per shard for Elasticsearch/Lucene workloads; choose the shard count so that per-shard CPU and memory stay reasonable and the combined query load of the shards hosted on a node stays below that node's capacity at the expected QPS.
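Routing & fan-out: by default a search fans out to every shard; custom routing (tenant- or tag-based) cuts fan-out and p99 latency when queries are naturally scoped, but a skewed routing key can create hot shards, so pick a key with even load or split large tenants across several shards.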
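Replication and consistency: keep at least one replica of every primary shard (e.g., 1 or 2 replicas) for availability and read throughput; accept eventual consistency for search results, and use sequence numbers with optimistic concurrency control (or external version numbers) so that stale or retried updates are rejected instead of overwriting newer data.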
Index lifecycle & zero-downtime reindex: use index aliases for swapping new indexes after reindexing; rolling reindex or blue/green approach to avoid query downtime.
Caching & debounce: client-side debouncing (200–300ms) plus server-side caching (popular prefixes, Redis) and per-node query cache to reduce load and latency.
Backfill and idempotency: ingestion should be idempotent (use asset ID as document id or include versioning) so retries and backfills don't create duplicates.
Monitoring & SLOs: track p50/p95/p99 query latency, indexing lag, merge time, GC, heap usage, and shard rebalancing events; set alerts on rising query tail-latency or recovery rates.
Failure modes & rebalancing costs: shard move cost is proportional to shard size; large numbers of small shards increase coordination overhead, while few large shards increase recovery time.
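
The shard-sizing and rebalancing points above lend themselves to back-of-the-envelope arithmetic. The sketch below is illustrative only: `plan_shards` is a hypothetical helper (not part of any search engine's API), and the corpus size, 40 GB target, and 100 MB/s recovery throughput are assumed inputs chosen for the example.

```python
import math

def plan_shards(corpus_gb: float, target_shard_gb: float = 40.0,
                replicas: int = 1, recovery_mb_per_s: float = 100.0) -> dict:
    """Estimate primary shard count, total shard copies, and per-shard move time."""
    if corpus_gb <= 0 or target_shard_gb <= 0 or recovery_mb_per_s <= 0:
        raise ValueError("sizes and throughput must be positive")
    # Enough primaries that no shard exceeds the target size.
    primaries = max(1, math.ceil(corpus_gb / target_shard_gb))
    shard_gb = corpus_gb / primaries
    # Each primary plus its replicas must be placed on some node.
    total_copies = primaries * (1 + replicas)
    # Moving a shard copies its full size, so the cost scales with shard size.
    move_seconds = shard_gb * 1024 / recovery_mb_per_s
    return {
        "primaries": primaries,
        "shard_gb": round(shard_gb, 1),
        "total_copies": total_copies,
        "move_minutes": round(move_seconds / 60, 1),
    }

# Assumed example: a 2 TB index with one replica per primary.
plan = plan_shards(corpus_gb=2000)
print(plan)
```

Rerunning the helper with a smaller `target_shard_gb` shows the tension directly: moves get faster, but the number of shard copies the cluster has to coordinate grows.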

Worked example — designing autocomplete for Creative Cloud assets

Start by clarifying: expected QPS for keystrokes, acceptable autocomplete latency (e.g., p99 ≤ 100ms), freshness requirements (near-instant vs minutes), and whether suggestions are global or per-user/tenant. Organize the answer into three pillars: (1) fast prefix lookup (implementation choices: edge n-gram index vs completion suggester vs Redis hot-table), (2) data pipeline and freshness (event stream of asset metadata changes, bulk backfill path, refresh tuning), and (3) caching/personalization (client debounce + server cache + per-user recent assets). A key tradeoff is index size vs operational complexity: edge n-grams keep queries and updates simple but bloat the index; a separate completion suggester or trie keeps the structure compact and lookups fast but complicates updates and adds memory pressure. You should flag operational concerns: handling per-keystroke QPS spikes (use request coalescing, rate limiting, tiered caching) and hot prefixes (use frequency-based sharding or dedicated caches). Close by saying that, given more time, you would prototype latency under realistic QPS with representative asset names, add rate-limiting strategies, and define concrete SLOs and load-test scenarios.
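
To make the first pillar concrete, here is a minimal in-memory sketch of the edge n-gram approach. It is a toy model for discussion, not how Elasticsearch or any other engine implements it: `edge_ngrams`, `PrefixIndex`, the `popularity` ranking signal, and the sample asset names are all assumptions made for this example.

```python
from collections import defaultdict

def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> list[str]:
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

class PrefixIndex:
    """Toy inverted index keyed by edge n-grams of lowercased name tokens."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.names: dict[str, str] = {}
        self.popularity: dict[str, int] = {}

    def add(self, asset_id: str, name: str, popularity: int = 0) -> None:
        # Note: re-adding an asset under a new name leaves stale grams behind;
        # a real pipeline would delete the old postings first.
        self.names[asset_id] = name
        self.popularity[asset_id] = popularity
        for token in name.lower().split():
            for gram in edge_ngrams(token):
                self.postings[gram].add(asset_id)

    def suggest(self, prefix: str, limit: int = 5) -> list[str]:
        # A single dictionary lookup: cost depends on the prefix, not the corpus.
        ids = self.postings.get(prefix.lower(), set())
        ranked = sorted(ids, key=lambda a: (-self.popularity[a], self.names[a]))
        return [self.names[a] for a in ranked[:limit]]

index = PrefixIndex()
index.add("a1", "Summer Campaign Banner", popularity=12)
index.add("a2", "Sunset Mockup", popularity=30)
index.add("a3", "Logo Summary", popularity=5)
print(index.suggest("su"))
print(index.suggest("sum"))
```

The sketch only handles single-token prefixes, and every token of length L adds up to L postings keys (capped by `max_gram`), which is the index bloat the tradeoff discussion refers to.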

A second angle — scaling full-text asset search with sharded indices

If the interviewer pivots to full-text search (captions, OCR, tags), emphasize ranking and reranking: a first-pass fan-out search over sharded inverted indices with cheap lexical scoring, then CPU-heavy relevance scoring restricted to the merged top-K candidates, with a lightweight reranker (signals from usage) on top. Here routing often must remain global because any asset could match; instead focus on optimizing fan-out (filter early with metadata, use shard-level doc frequency pruning) and shard placement (cold/warm nodes). Tradeoffs shift: caption search can tolerate staler results, so you can refresh less often and allow heavier index merging, which reduces CPU pressure.
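
The fan-out, merge, and rerank flow can be sketched in a few lines. This is a simplified model under stated assumptions: each shard is a plain dictionary of asset ID to text, raw term counts stand in for a real relevance function such as BM25, and `search_shard`, `fan_out_search`, `rerank`, and the `usage` counts are hypothetical names and data invented for the illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor

Hit = tuple[float, str]  # (score, asset_id)

def top_k(hits: list[Hit], k: int) -> list[Hit]:
    """Highest score first; ties broken by asset_id for deterministic output."""
    return sorted(hits, key=lambda h: (-h[0], h[1]))[:k]

def search_shard(shard: dict[str, str], query: str, k: int) -> list[Hit]:
    """Score one shard's documents by query-term counts and return its local top-k."""
    terms = query.lower().split()
    hits = []
    for asset_id, text in shard.items():
        words = text.lower().split()
        score = float(sum(words.count(term) for term in terms))
        if score > 0:
            hits.append((score, asset_id))
    return top_k(hits, k)

def fan_out_search(shards: list[dict[str, str]], query: str, k: int = 3) -> list[Hit]:
    """Query every shard in parallel, then merge the local top-k lists."""
    if not shards:
        return []
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, query, k), shards)
        merged = [hit for part in partials for hit in part]
    return top_k(merged, k)

def rerank(hits: list[Hit], usage: dict[str, int], weight: float = 0.5) -> list[Hit]:
    """Lightweight rerank: add a damped usage boost to the lexical score."""
    boosted = [(score + weight * math.log1p(usage.get(asset_id, 0)), asset_id)
               for score, asset_id in hits]
    return top_k(boosted, len(boosted))

shards = [
    {"a1": "sunset beach photo", "a2": "city skyline at sunset sunset"},
    {"a3": "beach volleyball poster", "a4": "mountain sunrise"},
]
hits = fan_out_search(shards, "sunset beach")
reranked = rerank(hits, usage={"a2": 100, "a3": 1})
print(hits)
print([asset_id for _, asset_id in reranked])
```

Because each shard returns only its local top-k, the coordinator merges at most k × (number of shards) hits, and the reranker touches only the merged top-k, which keeps the expensive step bounded regardless of corpus size.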

Common pitfalls

Pitfall: assuming a single global index with unlimited scalability — this ignores shard sizing and recovery-cost constraints; propose shard sizing and sharding strategy upfront.

Pitfall: focusing only on algorithmic latency and skipping client-side controls — forgetting debounce, cancellation, and caching leads to network storms on keystroke-heavy autocomplete.
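
A minimal sketch of debounce plus cancellation, written with Python's `asyncio` purely for illustration (a real client would more likely be JavaScript or native UI code). `Debouncer` and `fake_fetch` are hypothetical names, and the delays are shortened so the demo finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable, Optional

class Debouncer:
    """Call `fetch` only after `delay` seconds pass without a newer keystroke."""

    def __init__(self, fetch: Callable[[str], Awaitable[None]], delay: float = 0.25) -> None:
        self._fetch = fetch
        self._delay = delay
        self._task: Optional[asyncio.Task] = None

    def on_keystroke(self, text: str) -> None:
        # Cancel the pending timer or in-flight request for the older prefix.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.get_running_loop().create_task(self._run(text))

    async def _run(self, text: str) -> None:
        await asyncio.sleep(self._delay)  # debounce window
        await self._fetch(text)           # only reached if not cancelled

async def demo() -> list[str]:
    sent: list[str] = []

    async def fake_fetch(prefix: str) -> None:
        sent.append(prefix)  # stand-in for the network call

    debouncer = Debouncer(fake_fetch, delay=0.1)
    for text in ["s", "su", "sun"]:
        debouncer.on_keystroke(text)
        await asyncio.sleep(0.01)  # user types faster than the debounce window
    await asyncio.sleep(0.3)       # typing pauses; the last request fires
    return sent

print(asyncio.run(demo()))
```

Three keystrokes produce a single request for the final prefix, which is the load reduction the pitfall is about; server-side caching then handles the prefixes that many users share.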

Pitfall: overusing complex personalization at query time — expensive per-query models increase p99; state clearly when offline precompute or lightweight online signals suffice.

Connections

This area commonly connects to CDN and edge caching (for serving thumbnails and metadata), stream processing (ingest via Kafka or CDC), and embedding-based ANN search for semantic retrieval. Interviewers might pivot to data pipelines or ML ranking; be ready to discuss the integration boundaries with those systems rather than redesigning them.