System Design: Multimodal File Ingestion and Embedding Service
Context
Design a backend service that accepts user-uploaded files (documents, images, videos), computes embeddings for search/retrieval, and stores results for querying. The service must be reliable, secure, and scalable. Assume a maximum file size of X MB (e.g., X = 512 for estimates) and bursty traffic patterns.
Requirements
Specify and justify the following:
- API design
  - Endpoints for upload, status, and search.
  - Idempotency, authentication, and callbacks/webhooks.
- Storage schema
  - Where raw files live (object storage) and how they’re organized.
  - Database schema for file metadata, chunk metadata, embeddings, jobs, and errors.
- Chunking strategies per modality
  - Documents (PDF, DOCX, TXT), images, videos (frames + audio transcript).
- Model choices and versioning
  - Text/image/video embedding models, dimensions, and upgrade strategy.
- Asynchronous processing pipeline
  - Queues, workers, orchestration, and status tracking.
- Idempotency and retries
  - Content hashing, idempotency keys, retry/backoff, exactly-once semantics where needed.
- Failure handling
  - Dead-letter queues, partial failures, quarantining, and operator tooling.
- Vector index selection and updates
  - Technology choice(s), filtering, partitioning, upsert/delete, and reindex strategy during model migrations.
- Security and PII handling
  - Encryption, malware scanning, DLP/PII detection, access control, auditability, and deletion (e.g., GDPR).
- Cost/performance trade-offs
  - Model size vs cost, batching, compression, storage lifecycle, and caching/deduplication.
- Scalability
  - Throughput spikes and long-running video jobs (segmentation, checkpointing, autoscaling, priorities).
Provide a cohesive design with brief diagrams-as-text or bullet points as needed.
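
As an illustration of the level of detail the "Idempotency and retries" item is asking for, here is a minimal Python sketch, not a reference solution. It assumes SHA-256 content hashing, a job key derived from tenant, content hash, and model version, and full-jitter exponential backoff. The names `content_hash`, `job_key`, `backoff_delay`, and `JobStore` are hypothetical, and `JobStore` is an in-memory stand-in for a jobs table with a unique constraint on the key.

```python
import hashlib
import random


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of the raw bytes; identical uploads hash identically."""
    return hashlib.sha256(data).hexdigest()


def job_key(tenant_id: str, file_hash: str, model_version: str) -> str:
    """Deterministic key for one embedding job (hypothetical scheme).

    Including the model version means a model upgrade yields a new job
    for the same file instead of being treated as a duplicate.
    """
    raw = "\x1f".join([tenant_id, file_hash, model_version])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


class JobStore:
    """In-memory stand-in for a jobs table with a unique key constraint."""

    def __init__(self) -> None:
        self._jobs: dict[str, str] = {}

    def submit(self, key: str) -> bool:
        """Return True if a new job was created, False if the key already exists."""
        if key in self._jobs:
            return False
        self._jobs[key] = "queued"
        return True
```

With this scheme, a client retrying the same upload for the same tenant and model version maps to the same key, so the second `submit` call is a no-op rather than a duplicate job. A real design would still need to justify the hash choice, the key scope, and where the uniqueness constraint is enforced.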