Design a GPU-Efficient Video Service

Last updated: Apr 27, 2026

Quick Overview

This question evaluates competency in designing GPU-constrained, production-grade ML serving platforms, emphasizing resource management, job scheduling and prioritization, admission control, durable storage, failure handling, and observability within distributed systems.

Company: OpenAI

Role: Software Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Technical Screen

Feb 23, 2026, 12:00 AM

Design a text-to-video generation platform similar to a modern generative video product. Treat the actual model inference on GPUs as a black box: a job enters a GPU worker and eventually produces a video.

Focus on the serving platform rather than model internals. The main requirements are:

  • Users submit prompts and generation parameters, receive a job ID, and later fetch the result.
  • GPU capacity is fixed or slow to scale, so the system cannot rely on instant autoscaling.
  • Traffic is bursty.
  • The system should maximize GPU utilization while still providing a predictable user experience.
  • The design should cover queueing, scheduling, admission control, prioritization, storage, failure handling, and observability.

Explain how you would design the APIs, control plane, worker architecture, scheduling strategy, and overload behavior.
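
As a starting point for discussion, the submit/fetch flow, admission control, and prioritization named in the requirements could be sketched roughly as follows. This is a minimal in-memory illustration, not a reference solution: `VideoService`, `QueueFull`, the `priority` tiers, `max_queued`, and `max_attempts` are all hypothetical names, and the dictionary and heap stand in for what would be a durable job store and a durable queue in a real deployment.

```python
import heapq
import itertools
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Job:
    job_id: str
    prompt: str
    params: dict
    priority: int                      # assumption: lower value is served first
    status: Status = Status.QUEUED
    result_url: Optional[str] = None
    attempts: int = 0


class QueueFull(Exception):
    """Admission control rejected the request (would map to a 429-style error)."""


class VideoService:
    def __init__(self, max_queued: int, max_attempts: int = 2):
        self.max_queued = max_queued   # bound on waiting jobs, since GPUs cannot scale fast
        self.max_attempts = max_attempts
        self.jobs = {}                 # job_id -> Job; stand-in for a durable store
        self._heap = []                # (priority, sequence, job_id)
        self._seq = itertools.count()  # keeps FIFO order within one priority

    def submit(self, prompt: str, params: Optional[dict] = None, priority: int = 1) -> str:
        # Reject early instead of letting the queue grow without bound.
        if len(self._heap) >= self.max_queued:
            raise QueueFull("queue at capacity; retry later")
        job = Job(uuid.uuid4().hex, prompt, params or {}, priority)
        self.jobs[job.job_id] = job
        heapq.heappush(self._heap, (priority, next(self._seq), job.job_id))
        return job.job_id

    def get(self, job_id: str) -> Job:
        return self.jobs[job_id]

    def lease(self) -> Optional[Job]:
        """Called by an idle GPU worker to pull the next job."""
        if not self._heap:
            return None
        _, _, job_id = heapq.heappop(self._heap)
        job = self.jobs[job_id]
        job.status = Status.RUNNING
        job.attempts += 1
        return job

    def complete(self, job_id: str, result_url: str) -> None:
        job = self.jobs[job_id]
        job.status = Status.SUCCEEDED
        job.result_url = result_url

    def fail(self, job_id: str) -> None:
        # Requeue until the attempt budget is spent, then mark as failed.
        job = self.jobs[job_id]
        if job.attempts < self.max_attempts:
            job.status = Status.QUEUED
            heapq.heappush(self._heap, (job.priority, next(self._seq), job_id))
        else:
            job.status = Status.FAILED


svc = VideoService(max_queued=2)
a = svc.submit("a cat surfing", priority=1)
b = svc.submit("a city at night", priority=0)   # hypothetical higher-priority tier
try:
    svc.submit("one too many")
    rejected = False
except QueueFull:
    rejected = True                              # overload: caller is told to retry later
first = svc.lease()                              # pulls job b, since priority 0 sorts first
svc.complete(first.job_id, "s3://hypothetical-bucket/b.mp4")
```

The bounded queue is the design choice worth discussing: because GPU capacity is fixed, rejecting at submit time keeps wait times predictable for admitted jobs, at the cost of pushing retries onto clients during bursts. A fuller answer would also cover lease timeouts for crashed workers, per-user quotas, and starvation of low-priority jobs, none of which this sketch handles.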
