Design file-embedding storage system

This question tests your understanding of multimodal embedding pipelines in an ML system design context. It covers ingestion; preprocessing (OCR, transcripts, frame extraction); model selection for text, image, and video; scalable inference and GPU/CPU utilization; storage schema for raw assets, metadata, and vectors; and retrieval, including cross-modal search.

Question

System Design: Multimodal Embedding Service for User Uploads

Context

You are designing a backend service that, for each user-uploaded asset, generates vector embeddings and stores them for search and retrieval.

Supported asset types: documents (PDF/DOCX/TXT), images (PNG/JPEG), and videos (MP4).
Maximum upload size per asset: x MB (configurable; assume x is a limit enforced by the API).
The system must cover: ingestion, preprocessing, model selection, scalability, storage schema, and retrieval.
Assume multi-tenant usage and that asynchronous processing is acceptable (embeddings ready within seconds to minutes).
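As a minimal sketch of the constraints above, the API layer could check the file type and size limit, then hand back a pending job for asynchronous processing. The names `UploadRequest`, `EmbeddingJob`, and `accept_upload` are hypothetical, and detecting type by file extension is a simplifying assumption; a real service would also inspect file contents and run the security checks called for under ingestion.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePath


class Modality(Enum):
    DOCUMENT = "document"
    IMAGE = "image"
    VIDEO = "video"


# Extension-to-modality map built from the supported asset types in the prompt.
EXTENSION_TO_MODALITY = {
    ".pdf": Modality.DOCUMENT,
    ".docx": Modality.DOCUMENT,
    ".txt": Modality.DOCUMENT,
    ".png": Modality.IMAGE,
    ".jpg": Modality.IMAGE,
    ".jpeg": Modality.IMAGE,
    ".mp4": Modality.VIDEO,
}


@dataclass(frozen=True)
class UploadRequest:
    tenant_id: str      # multi-tenant: every request carries its tenant
    filename: str
    size_bytes: int


@dataclass(frozen=True)
class EmbeddingJob:
    tenant_id: str
    filename: str
    modality: Modality
    status: str = "pending"  # embeddings are produced asynchronously


def accept_upload(req: UploadRequest, max_upload_mb: int) -> EmbeddingJob:
    """Validate an upload and return a pending job, or raise ValueError."""
    ext = PurePath(req.filename).suffix.lower()
    modality = EXTENSION_TO_MODALITY.get(ext)
    if modality is None:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if req.size_bytes > max_upload_mb * 1024 * 1024:
        raise ValueError(f"asset exceeds the {max_upload_mb} MB limit")
    return EmbeddingJob(req.tenant_id, req.filename, modality)
```

In this sketch the caller would persist the raw asset and enqueue the returned job; the queue itself is left out because the prompt does not specify one.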

Requirements

Ingestion: Validate and accept uploads, handle security checks, and persist raw assets.
Preprocessing: Extract text from documents, run OCR on scanned pages, extract frames and transcripts from videos, and normalize images/videos.
Model Selection: Choose models for text, image, and video embeddings (and speech-to-text where needed); justify trade-offs.
Scalability: Design for horizontal scale, efficient GPU/CPU utilization, and cost control; ensure observability and fault tolerance.
Storage Schema: Define where raw assets, metadata, and vectors live, including partitioning and indexing choices.
Retrieval: Support text, image, or video queries that retrieve relevant documents/images/videos; include cross-modal search.
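To make the storage and retrieval requirements concrete, here is a hedged sketch of one possible vector record and a tenant-scoped search over it. It assumes that text, image, and video encoders map into a shared embedding space (which is what makes cross-modal comparison meaningful), and it uses a brute-force cosine scan where a production design would use an approximate nearest-neighbor index. `VectorRecord`, `chunk_ref`, and `search` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class VectorRecord:
    tenant_id: str                 # partition key for multi-tenant isolation
    asset_id: str                  # points back to the raw asset and its metadata
    modality: str                  # "document", "image", or "video"
    chunk_ref: str                 # e.g. a page number or a frame timestamp
    embedding: Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    records: Sequence[VectorRecord],
    tenant_id: str,
    query_embedding: Sequence[float],
    top_k: int = 5,
    modalities: Optional[Set[str]] = None,
) -> List[Tuple[float, VectorRecord]]:
    """Return the top_k most similar records for one tenant.

    The query embedding may come from a text, image, or video query;
    passing `modalities` restricts which kinds of assets are returned.
    """
    candidates = [
        r for r in records
        if r.tenant_id == tenant_id
        and (modalities is None or r.modality in modalities)
    ]
    scored = [(cosine(query_embedding, r.embedding), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Filtering by `tenant_id` before scoring mirrors the partitioning choice a candidate would be expected to justify: tenants never see each other's vectors, and the search space shrinks before any similarity computation.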
