Design multimodal embedding service

Q: Design multimodal embedding service

This is an ML System Design interview question from Adobe for Software Engineer roles.

Question

System Design: Multimodal File Ingestion and Embedding Service

Context

Design a backend service that accepts user-uploaded files (documents, images, videos), computes embeddings for search/retrieval, and stores results for querying. The service must be reliable, secure, and scalable. Assume a maximum file size of X MB (e.g., X = 512 MB for estimates) and bursty traffic patterns.

Requirements

Specify and justify the following:

API design
- Endpoints for upload, status, and search.
- Idempotency, authentication, and callbacks/webhooks.
Storage schema
- Where raw files live (object storage) and how they’re organized.
- Database schema for file metadata, chunk metadata, embeddings, jobs, and errors.
Chunking strategies per modality
- Documents (PDF, DOCX, TXT), images, videos (frames + audio transcript).
Model choices and versioning
- Text/image/video embedding models, dimensions, and upgrade strategy.
Asynchronous processing pipeline
- Queues, workers, orchestration, and status tracking.
Idempotency and retries
- Content hashing, idempotency keys, retry/backoff, exactly-once semantics where needed.
Failure handling
- Dead-letter queues, partial failures, quarantining, and operator tooling.
Vector index selection and updates
- Technology choice(s), filtering, partitioning, upsert/delete, and reindex strategy during model migrations.
Security and PII handling
- Encryption, malware scanning, DLP/PII detection, access control, auditability, and deletion (e.g., GDPR).
Cost/performance trade-offs
- Model size vs cost, batching, compression, storage lifecycle, and caching/deduplication.
Scalability
- Throughput spikes and long-running video jobs (segmentation, checkpointing, autoscaling, priorities).
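
To make the document-chunking requirement above more concrete, here is a minimal sketch of one common option: fixed-size character windows with overlap, recording offsets so each chunk can be traced back to its source. The function name `chunk_text`, the default sizes, and the record fields are illustrative assumptions, not part of the question; a real design might chunk by tokens, headings, or pages instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[dict]:
    """Split text into overlapping character windows (hypothetical helper).

    Each record keeps its start/end offsets so a chunk row in the
    metadata store can point back into the original document.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[dict] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(
            {"index": len(chunks), "start": start, "end": end, "text": text[start:end]}
        )
        if end == len(text):
            break  # last window reached the end of the document
        start += step
    return chunks
```

The overlap trades extra storage and embedding cost for a lower chance that a relevant passage is split across a chunk boundary, which ties into the cost/performance item in the list.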

Provide a cohesive design with brief diagrams-as-text or bullet points as needed.
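
As one illustration of the idempotency and retry requirements, the sketch below combines a client-supplied idempotency key, a SHA-256 content hash for deduplication, and exponential backoff with full jitter. It is an in-memory toy under stated assumptions: the class `InMemoryJobStore`, its fields, and the function `backoff_delay` are hypothetical names, and a real service would back this with a database using unique constraints rather than Python dictionaries.

```python
import hashlib
import random


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of the raw file bytes, used as a dedup key."""
    return hashlib.sha256(data).hexdigest()


class InMemoryJobStore:
    """Toy stand-in for a jobs table (hypothetical; not a real library)."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], dict] = {}
        self._by_content: dict[tuple[str, str, str], dict] = {}

    def submit(self, tenant_id: str, idempotency_key: str,
               data: bytes, model_version: str) -> dict:
        # 1) Same tenant + idempotency key: return the original job,
        #    so a retried upload request never creates a second job.
        key = (tenant_id, idempotency_key)
        if key in self._by_key:
            return self._by_key[key]

        # 2) Same tenant + content + model version: reuse the existing job.
        #    Dedup is scoped per tenant to avoid leaking across tenants.
        digest = content_hash(data)
        content_key = (tenant_id, digest, model_version)
        job = self._by_content.get(content_key)
        if job is None:
            job = {
                "job_id": f"job-{len(self._by_content) + 1}",
                "content_sha256": digest,
                "model_version": model_version,
                "status": "queued",
            }
            self._by_content[content_key] = job
        self._by_key[key] = job
        return job


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2^attempt))."""
    return rng() * min(cap, base * (2 ** attempt))
```

Including the model version in the dedup key means a model upgrade naturally produces a new job for the same file, which lines up with the reindex-during-migration item in the requirements.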
