PracHub

Design a scalable video search system

Last updated: Jun 2, 2026

Quick Overview

This question evaluates expertise in multimodal retrieval and large-scale machine learning system design, focusing on model and embedding architectures, temporal aggregation, indexing, and operational pipelines for text-to-video and video-to-video search within the ML System Design domain.

  • Difficulty: hard
  • Company: Runway
  • Category: ML System Design
  • Role: Machine Learning Engineer

Interview Round: Technical Screen

Design an end-to-end video search system that supports text-to-video and video-to-video retrieval. Specify the model architecture (e.g., dual encoders for video frames/clips and text, temporal aggregation strategies, pretraining/fine-tuning objectives, negative sampling, handling long videos via clip sampling), the shared embedding space, and relevance scoring. For infrastructure, describe the offline/online feature extraction pipeline, ingestion, metadata enrichment, indexing in a vector store (e.g., HNSW/IVF, sharding and replication), storage layout for thumbnails/clips, query routing, latency/cost targets, caching, and scalability for billions of clips. Include monitoring, abuse/content safety, and evaluation via offline metrics and online A/B tests; discuss relevance feedback and incremental model updates.
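
The model-side terms in the prompt (temporal aggregation, a shared embedding space, relevance scoring, and a contrastive objective with negatives) can be illustrated with a minimal sketch. It assumes mean pooling as the aggregation, cosine similarity as the score, and a symmetric InfoNCE loss with in-batch negatives; these are common choices, not ones the question prescribes. The function names, the toy vectors, and the temperature of 0.07 are all hypothetical.

```python
import math

def mean_pool(frames):
    """Average per-frame embeddings into one clip embedding (simplest temporal aggregation)."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def l2_normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a, b):
    """Relevance score in the shared space; assumes both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def info_nce(text_embs, clip_embs, temperature=0.07):
    """Symmetric contrastive loss with in-batch negatives.

    Row i of text_embs is the positive for row i of clip_embs;
    every other row in the batch acts as a negative.
    """
    n = len(text_embs)
    logits = [[cosine(t, c) / temperature for c in clip_embs] for t in text_embs]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    cols = [list(col) for col in zip(*logits)]  # clip-to-text direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))

# Toy batch: two captions and two clips of two frames each (hypothetical 2-d embeddings).
texts = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
clips = [
    l2_normalize(mean_pool([[1.0, 0.0], [0.8, 0.2]])),
    l2_normalize(mean_pool([[0.0, 1.0], [0.2, 0.8]])),
]
aligned = info_nce(texts, clips)         # correct text/clip pairing
shuffled = info_nce(texts, clips[::-1])  # deliberately mismatched pairing
```

In this toy batch the aligned pairing yields a much lower loss than the mismatched one, which is the signal a dual encoder is trained on. A real design would likely replace mean pooling with attention or a temporal transformer and mix hard negatives into the batch.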

Aug 8, 2025, 12:00 AM

Design a Text-to-Video and Video-to-Video Search System

Context

You are tasked with designing an end-to-end multimodal retrieval system that supports both:

  • Text-to-video search (the user enters a text query and the system returns the most relevant video segments)
  • Video-to-video search (the user provides a reference video and the system returns visually or semantically similar segments)

Assume a catalog that grows to billions of video clips/segments, with global users and strict latency, cost, and safety requirements.
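
To make the scale assumption concrete, a rough back-of-the-envelope estimate of index size is sketched below. The figures are illustrative assumptions, not values from the question: one billion segments, 512-dimensional float32 embeddings, and product quantization (PQ) with 64 one-byte codes per vector. The helper `index_size_gib` is hypothetical and ignores graph links, IDs, and metadata.

```python
def index_size_gib(num_vectors, dim, bytes_per_component=4, pq_subquantizers=None):
    """Approximate embedding payload in GiB, excluding graph links, IDs, and metadata.

    With PQ, each vector is stored as one 1-byte code per subquantizer
    (256 centroids per subquantizer), independent of the original dimension.
    """
    if pq_subquantizers is None:
        bytes_per_vector = dim * bytes_per_component
    else:
        bytes_per_vector = pq_subquantizers
    return num_vectors * bytes_per_vector / 2**30

NUM_CLIPS = 1_000_000_000  # assumption: one billion indexed segments
DIM = 512                  # assumption: embedding width

raw = index_size_gib(NUM_CLIPS, DIM)                       # uncompressed float32
pq = index_size_gib(NUM_CLIPS, DIM, pq_subquantizers=64)   # 64-byte PQ codes
print(f"float32: {raw:,.0f} GiB, PQ-64: {pq:,.0f} GiB, ratio {raw / pq:.0f}x")
```

Under these assumptions the uncompressed vectors need roughly 1.9 TiB, while the PQ codes fit in about 60 GiB, a 32x reduction. That gap is the kind of argument a candidate can use to justify compression and sharding choices.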

Requirements

  1. Model architecture
    • Encoders for text and video frames/clips (e.g., dual encoders)
    • Temporal aggregation for video (frame-to-clip, clip-to-video)
    • Shared embedding space, similarity function, and scoring
    • Pretraining/fine-tuning objectives (contrastive, multi-task), negative sampling strategies
    • Handling long videos (clip sampling, segment-level indexing)
    • Optional cross-encoder re-ranking for top-K
  2. Infrastructure
    • Offline/online feature extraction pipelines
    • Ingestion and metadata enrichment (e.g., ASR, OCR, tags)
    • Indexing in a vector store (HNSW/IVF, PQ/OPQ, sharding, replication)
    • Storage layout for thumbnails and short preview clips
    • Query routing and aggregation across shards
    • Latency and cost targets; caching strategies
    • Scalability to billions of clips/segments
  3. Monitoring, safety, and evaluation
    • System and model monitoring (SLOs, drift, recall)
    • Abuse/content safety (NSFW, spam, PII)
    • Evaluation via offline metrics and online A/B tests
    • Relevance feedback and incremental model updates
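
Two of the requirements above, query aggregation across shards and offline evaluation, can be sketched together. The example assumes each shard returns its local top-k by inner product and a router merges the partial lists into a global top-k, then scores the result with recall@K. Exact brute-force search stands in for the ANN index (HNSW/IVF) a real shard would run, and all names and toy embeddings are hypothetical.

```python
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_shard(shard, query, k):
    """Exact top-k by inner product within one shard.

    shard: dict mapping clip_id -> embedding. A production shard would
    answer this with an ANN index instead of a full scan.
    """
    scored = ((dot(query, emb), clip_id) for clip_id, emb in shard.items())
    return heapq.nlargest(k, scored)

def scatter_gather(shards, query, k):
    """Ask every shard for its local top-k, then merge into a global top-k."""
    partials = [hit for shard in shards for hit in search_shard(shard, query, k)]
    return [clip_id for _, clip_id in heapq.nlargest(k, partials)]

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the first k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

# Toy catalog split across two shards (hypothetical 2-d embeddings).
shards = [
    {"a": [1.0, 0.0], "b": [0.9, 0.1]},
    {"c": [0.0, 1.0], "d": [0.8, 0.6]},
]
results = scatter_gather(shards, query=[1.0, 0.0], k=2)
score = recall_at_k(results, relevant={"a", "d"}, k=2)
```

Requesting k results from every shard keeps the merge exact when each shard search is exact. With approximate shard indexes, the same `recall_at_k` computed against brute-force ground truth gives a way to track recall loss from the index.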
