How do I approach ML System Design interview questions?

ML System Design questions require both a solid understanding of core concepts and practice. PracHub provides solutions with explanations to help you master ML System Design interviews.

What difficulty level is this interview question?

This is a hard ML System Design question, commonly asked during Technical Screen rounds at Runway.

What role is this question designed for?

This question is commonly asked of Machine Learning Engineer candidates at Runway during technical interviews.

Design a scalable video search system | Runway Interview Question

Quick Overview

This question evaluates expertise in multimodal retrieval and large-scale machine learning system design. It focuses on model and embedding architectures, temporal aggregation, indexing, and operational pipelines for text-to-video and video-to-video search.

Design a Text-to-Video and Video-to-Video Search System

Context

You are tasked with designing an end-to-end multimodal retrieval system that supports both:

- Text-to-video search (the user enters a text query; the system returns the most relevant video segments)
- Video-to-video search (the user provides a reference video; the system returns visually or semantically similar segments)

Assume a catalog that grows to billions of video clips/segments, with global users and strict latency, cost, and safety requirements.
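
To make "billions of clips/segments" concrete, a back-of-envelope estimate of index size can help frame the cost discussion. The sketch below is illustrative only: the segment count, the 512-dimensional float32 embedding, and the 64-byte product-quantization code size are assumptions chosen for the arithmetic, not figures given in the question.

```python
def index_size_bytes(num_vectors: int, bytes_per_vector: int) -> int:
    """Payload size of the stored vectors only (no graph links, IDs, or metadata)."""
    return num_vectors * bytes_per_vector


# All three constants below are assumptions for illustration.
NUM_SEGMENTS = 1_000_000_000   # one billion indexed segments
EMBEDDING_DIM = 512            # embedding dimensionality
PQ_CODE_BYTES = 64             # compressed code size per vector under PQ

raw_bytes = index_size_bytes(NUM_SEGMENTS, EMBEDDING_DIM * 4)  # float32 = 4 bytes
pq_bytes = index_size_bytes(NUM_SEGMENTS, PQ_CODE_BYTES)

print(f"raw float32: {raw_bytes / 1e12:.3f} TB")
print(f"PQ codes:    {pq_bytes / 1e9:.0f} GB")
print(f"compression: {raw_bytes // pq_bytes}x")
```

Under these assumptions the uncompressed vectors alone need about 2 TB, while the compressed codes need about 64 GB. Real deployments also pay for index structures, identifiers, and replication, which this estimate leaves out.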

Requirements

Model architecture
- Encoders for text and video frames/clips (e.g., dual encoders)
- Temporal aggregation for video (frame-to-clip, clip-to-video)
- Shared embedding space, similarity function, and scoring
- Pretraining/fine-tuning objectives (contrastive, multi-task), negative sampling strategies
- Handling long videos (clip sampling, segment-level indexing)
- Optional cross-encoder re-ranking for top-K
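
One simple way to connect temporal aggregation, the shared embedding space, and the similarity function is sketched below. It assumes frame and query embeddings have already been produced by some pair of encoders (not shown), uses mean pooling as the frame-to-clip aggregator, and scores with cosine similarity. The function names are hypothetical, and mean pooling is only one of several possible aggregators.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def mean_pool(frame_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Frame-to-clip aggregation: average the frame vectors, then normalize."""
    if not frame_embeddings:
        raise ValueError("a clip needs at least one frame embedding")
    dim = len(frame_embeddings[0])
    sums = [0.0] * dim
    for frame in frame_embeddings:
        if len(frame) != dim:
            raise ValueError("all frame embeddings must share one dimension")
        for i, x in enumerate(frame):
            sums[i] += x
    return l2_normalize([s / len(frame_embeddings) for s in sums])


def rank_clips(
    query: Sequence[float],
    clips: Dict[str, Sequence[Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k clips with the highest cosine similarity to the query."""
    q = l2_normalize(query)
    scored = []
    for clip_id, frames in clips.items():
        clip_vec = mean_pool(frames)
        # Both vectors are unit length, so the dot product is the cosine.
        score = sum(a * b for a, b in zip(q, clip_vec))
        scored.append((clip_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D embeddings, made up for illustration.
clips = {
    "a": [[1.0, 0.0], [1.0, 0.0]],
    "b": [[0.0, 1.0]],
    "c": [[1.0, 0.0], [0.0, 1.0]],
}
top = rank_clips([2.0, 0.0], clips, k=2)
print(top)
```

Mean pooling ignores frame order, so a candidate answer might contrast it with attention pooling or a small temporal transformer when motion or event order matters. The brute-force loop here stands in for the approximate nearest-neighbor index that a billion-scale catalog would need.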

Infrastructure
- Offline/online feature extraction pipelines
- Ingestion and metadata enrichment (e.g., ASR, OCR, tags)
- Indexing in a vector store (HNSW/IVF, PQ/OPQ, sharding, replication)
- Storage layout for thumbnails and short preview clips
- Query routing and aggregation across shards
- Latency and cost targets; caching strategies
- Scalability to billions of clips/segments
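
Query routing and aggregation across shards is often described as scatter-gather: each shard returns its local top-K, and an aggregator merges them into a global top-K. The sketch below shows only the merge step, assuming higher scores are better and that a segment may appear more than once because of replication. `merge_shard_results` and the `Hit` type are hypothetical names, not part of any particular vector store.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, float]  # (segment_id, similarity score; higher is better)


def merge_shard_results(shard_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge per-shard top-k lists into one global top-k list."""
    best: Dict[str, float] = {}
    for hits in shard_hits:
        for segment_id, score in hits:
            # Keep the best score if replicas return the same segment twice.
            if segment_id not in best or score > best[segment_id]:
                best[segment_id] = score
    # nlargest returns the results sorted from highest to lowest score.
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])


# Toy per-shard results, made up for illustration.
shards = [
    [("a", 0.90), ("b", 0.50)],
    [("c", 0.80), ("a", 0.85)],
    [("e", 0.95)],
]
merged = merge_shard_results(shards, k=3)
print(merged)
```

Because every shard returns its own top-K, the true global top-K is always contained in the union of the shard results, provided the scores are comparable across shards (same model version and same distance metric).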

Monitoring, safety, and evaluation
- System and model monitoring (SLOs, drift, recall)
- Abuse/content safety (NSFW, spam, PII)
- Evaluation via offline metrics and online A/B tests
- Relevance feedback and incremental model updates
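
For the offline-metrics item above, two commonly used retrieval metrics are Recall@K and reciprocal rank. The sketch below is a minimal version that assumes each query comes with a set of labeled relevant segment IDs. The function names and the toy data are illustrative assumptions.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


# Toy ranking and labels, made up for illustration.
ranked_ids = ["a", "b", "c", "d"]
relevant_ids = {"b", "d"}

print(recall_at_k(ranked_ids, relevant_ids, k=2))
print(recall_at_k(ranked_ids, relevant_ids, k=4))
print(reciprocal_rank(ranked_ids, relevant_ids))
```

Averaging these values over a query set gives mean Recall@K and MRR. The same Recall@K function can also compare approximate index results against exact brute-force neighbors to track index recall over time.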
