PracHub

Design a Text-to-Video Generation Service

Last updated: May 30, 2026

Quick Overview

This question evaluates proficiency in designing scalable, reliable ML-driven distributed systems, covering asynchronous job orchestration, scheduler design, GPU-based model inference, progress tracking, result storage and retrieval, and fault handling across complex workflows.


Company: OpenAI

Role: Software Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Technical Screen



Related Interview Questions

  • Design a Text-to-Video Generation System - OpenAI (hard)
  • Design a Real-Time Sensor Intelligence System - OpenAI (medium)
  • Mine Novel Images from Unlabeled Data - OpenAI (medium)
  • Design a GPU-Efficient Video Service - OpenAI (medium)
  • Design a RAG system with evaluation - OpenAI (medium)
Posted May 22, 2026

Design a large-scale text-to-video generation service similar to a modern generative video product.

Users submit a text prompt and receive a generated video after an asynchronous processing workflow. The system should support job submission, scheduling, model inference on GPU workers, progress tracking, result storage, and video retrieval.
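A minimal sketch of the job lifecycle behind submission, progress tracking, and retrieval follows. It assumes a client-supplied idempotency key and uses a single in-memory store as a stand-in for a durable database. `JobState`, `JobStore`, the state names, and the transition table are hypothetical illustrations, not part of the original question.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Legal transitions (hypothetical); RUNNING -> QUEUED is the retry path.
ALLOWED = {
    JobState.QUEUED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED, JobState.QUEUED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: set(),
}


@dataclass
class Job:
    job_id: str
    prompt: str
    state: JobState = JobState.QUEUED
    progress: float = 0.0            # fraction of generation completed, 0.0..1.0
    video_uri: Optional[str] = None  # set only once the result is durably stored


class JobStore:
    """In-memory stand-in for a durable job table (hypothetical)."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}
        self.by_idempotency_key: Dict[str, str] = {}

    def submit(self, prompt: str, idempotency_key: str) -> str:
        # A retried client request with the same key returns the original job
        # instead of enqueueing a second, expensive GPU job.
        if idempotency_key in self.by_idempotency_key:
            return self.by_idempotency_key[idempotency_key]
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = Job(job_id, prompt)
        self.by_idempotency_key[idempotency_key] = job_id
        return job_id

    def transition(self, job_id: str, new_state: JobState) -> None:
        job = self.jobs[job_id]
        if new_state not in ALLOWED[job.state]:
            raise ValueError(f"illegal transition {job.state} -> {new_state}")
        job.state = new_state

    def report_progress(self, job_id: str, progress: float) -> None:
        job = self.jobs[job_id]
        # User-visible progress only moves forward, so late or reordered
        # worker updates cannot make it jump backwards.
        job.progress = max(job.progress, min(progress, 1.0))

    def complete(self, job_id: str, video_uri: str) -> None:
        self.transition(job_id, JobState.SUCCEEDED)  # raises if not RUNNING
        job = self.jobs[job_id]
        job.video_uri = video_uri
        job.progress = 1.0

    def status(self, job_id: str) -> dict:
        # What a polling endpoint would return to the client.
        job = self.jobs[job_id]
        return {"state": job.state.value, "progress": job.progress,
                "video_uri": job.video_uri}
```

The explicit transition table makes it cheap to reject impossible updates (for example, a late "running" report for a job that already succeeded), and the client only ever receives a `video_uri` after the job has reached `SUCCEEDED`.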

Start with a high-level architecture, then deep dive into the scheduler. In particular, discuss how the system handles failure cases across the workflow, including scheduler failures, worker failures, retries, duplicate execution, partial outputs, storage failures, and capacity exhaustion.
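One common way to approach the scheduler deep dive is lease-based dispatch with fencing tokens. The sketch below is hypothetical: the class names, default values, and the in-memory dicts (standing in for a replicated store a standby scheduler could read after failover) are all assumptions. It shows how the failure cases listed above can map onto a few mechanisms. Worker failures surface as expired leases. Retries are bounded and use exponential backoff. Duplicate execution and partial outputs are neutralized by accepting results only from the current lease holder. Storage failures never mark a job done because the worker commits only after its upload succeeds. Capacity exhaustion is handled by rejecting submissions when the ready queue is full.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Lease:
    job_id: str
    worker_id: str
    token: int          # fencing token: strictly increases on every assignment
    expires_at: float   # lease is lost if not renewed by this time


class Scheduler:
    """Hypothetical lease-based GPU job scheduler (in-memory sketch)."""

    def __init__(self, lease_seconds: float = 30.0, max_attempts: int = 3,
                 max_queue: int = 1000, base_backoff: float = 5.0) -> None:
        self.lease_seconds = lease_seconds
        self.max_attempts = max_attempts
        self.max_queue = max_queue
        self.base_backoff = base_backoff
        self.ready: List[Tuple[float, int, str]] = []  # heap of (not_before, seq, job_id)
        self.seq = itertools.count()
        self.tokens = itertools.count(1)
        self.attempts: Dict[str, int] = {}
        self.leases: Dict[str, Lease] = {}
        self.status: Dict[str, str] = {}
        self.results: Dict[str, str] = {}

    def submit(self, job_id: str, now: float) -> bool:
        # Capacity exhaustion: shed load at admission instead of queueing forever.
        if len(self.ready) >= self.max_queue:
            return False
        self.attempts[job_id] = 0
        self.status[job_id] = "queued"
        heapq.heappush(self.ready, (now, next(self.seq), job_id))
        return True

    def acquire(self, worker_id: str, now: float) -> Optional[Lease]:
        # An idle GPU worker pulls the earliest job whose backoff has elapsed.
        if not self.ready or self.ready[0][0] > now:
            return None
        _, _, job_id = heapq.heappop(self.ready)
        self.attempts[job_id] += 1
        lease = Lease(job_id, worker_id, next(self.tokens), now + self.lease_seconds)
        self.leases[job_id] = lease
        self.status[job_id] = "running"
        return lease

    def _is_current(self, lease: Lease) -> bool:
        current = self.leases.get(lease.job_id)
        return current is not None and current.token == lease.token

    def heartbeat(self, lease: Lease, now: float) -> bool:
        # False tells a worker its lease was reassigned and it should stop.
        if not self._is_current(lease):
            return False
        self.leases[lease.job_id].expires_at = now + self.lease_seconds
        return True

    def commit(self, lease: Lease, video_uri: str) -> bool:
        # Fencing: only the current holder may publish, so a zombie worker's
        # duplicate or partial output (at an attempt-scoped path) is ignored.
        if not self._is_current(lease):
            return False
        del self.leases[lease.job_id]
        self.status[lease.job_id] = "succeeded"
        self.results[lease.job_id] = video_uri
        return True

    def report_failure(self, lease: Lease, now: float) -> None:
        # Explicit failure from a worker, e.g. OOM or a failed video upload.
        if self._is_current(lease):
            self._retry_or_fail(lease.job_id, now)

    def reap_expired(self, now: float) -> None:
        # Worker failure: run periodically (and on scheduler failover) to
        # re-queue jobs whose worker stopped heartbeating.
        for job_id, lease in list(self.leases.items()):
            if lease.expires_at <= now:
                self._retry_or_fail(job_id, now)

    def _retry_or_fail(self, job_id: str, now: float) -> None:
        self.leases.pop(job_id, None)
        n = self.attempts[job_id]
        if n >= self.max_attempts:
            self.status[job_id] = "failed"
            return
        # Exponential backoff: 1x, 2x, 4x ... base_backoff after each failed attempt.
        delay = self.base_backoff * 2 ** (n - 1)
        self.status[job_id] = "queued"
        heapq.heappush(self.ready, (now + delay, next(self.seq), job_id))
```

Note that the fencing token makes duplicate execution harmless rather than impossible: a worker that was presumed dead may still finish its render, but its `commit` carries a stale token and is rejected. Its attempt-scoped output can then be garbage-collected.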


© 2026 PracHub. All rights reserved.