Design a multimodal embedding service
Company: Adobe
Role: Software Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Technical Screen
Design a system to compute embeddings for user‑uploaded files across modalities (documents, images, and videos), where each file is at most x MB, and persist the results to a database. Describe the ingestion API, validation and preprocessing (e.g., text chunking, image resizing, video frame sampling or clip extraction), model choices per modality, batching, GPU/accelerator scheduling, and concurrency controls. Explain how you will store embeddings and metadata (e.g., vector store vs. relational/columnar DB), support similarity search, deduplicate near‑identical content, handle retries and idempotency, and manage backfills when models are updated. Include monitoring, quality evaluation, cost controls, and privacy/security considerations.
Quick Answer: This question evaluates a candidate's competency in ML system design for building scalable, multi‑tenant, privacy‑sensitive multimodal embedding pipelines, covering ingestion and idempotency, modality-specific preprocessing, model selection and fusion, throughput engineering, storage and versioning, data hygiene, and operational concerns.
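As a starting point for the idempotency and text-chunking parts of the question, the following is a minimal Python sketch, not a reference solution. Every name in it (`idempotency_key`, `chunk_text`, `EmbeddingStore`, `tenant_id`, `model_version`) is hypothetical and chosen for illustration. It assumes that a job is uniquely identified by tenant, model version, and file bytes, that chunks are split on whitespace-separated words rather than model tokens, and that an in-memory dictionary stands in for the real database.

```python
from __future__ import annotations

import hashlib


def idempotency_key(tenant_id: str, content: bytes, model_version: str) -> str:
    """Derive a deterministic key so a retried upload maps to the same job.

    Assumption: (tenant, model version, file bytes) identifies one unit of work.
    Including model_version means a model update produces new keys, which is
    one way to drive a backfill without overwriting older embeddings.
    """
    h = hashlib.sha256()
    for part in (tenant_id.encode("utf-8"), model_version.encode("utf-8"), content):
        # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping word windows (a stand-in for token chunking)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


class EmbeddingStore:
    """In-memory stand-in for the database; a real system would rely on a
    unique constraint or conditional write instead of a dict lookup."""

    def __init__(self) -> None:
        self._rows: dict[str, list[list[float]]] = {}

    def put_if_absent(self, key: str, vectors: list[list[float]]) -> bool:
        """Store vectors once per key; return False if the key already exists."""
        if key in self._rows:
            return False
        self._rows[key] = vectors
        return True
```

With this shape, a worker that retries after a timeout recomputes the same key and its second `put_if_absent` call becomes a no-op, so duplicate rows are avoided without coordinating between workers. Exact-hash keys only catch byte-identical files; near-identical content would still need a separate similarity check.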