PracHub

Design a Text-to-Video Generation Platform (Sora-style)

Last updated: Jul 1, 2026

Quick Overview

This ML system design question evaluates a candidate's ability to architect the infrastructure around a large generative video model, including asynchronous job orchestration, GPU fleet scheduling, and content safety pipelines. It tests practical application of distributed systems concepts such as durable queuing, idempotency, and utilization-aware autoscaling for expensive compute resources, a common theme in senior ML infrastructure interviews.

  • hard
  • OpenAI
  • ML System Design
  • Software Engineer

Company: OpenAI

Role: Software Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Onsite

## Design a Text-to-Video Generation Platform (Sora-style)

You are asked to design the backend platform that powers a **text-to-video generation** product. A user submits a natural-language prompt (optionally with a reference image, a target duration, an aspect ratio, and a style), and the system returns a short video generated by a large generative video model.

Treat the model itself as a given black box. It is a GPU-hungry diffusion/transformer model whose inference takes anywhere from tens of seconds to several minutes per clip. Your job is to design **everything around the model** (request submission, queuing, GPU-backed generation, input/output safety, storage, and delivery) so that it works correctly and economically at scale.

### Constraints & Assumptions

- ~10M registered users, ~1M daily active users.
- An active user generates ~5 clips/day → ~5M generation jobs/day, with peaks 3–5x the average.
- Each generation consumes 30s–5min of GPU time depending on requested duration and resolution.
- Output clips are up to ~20s, 720p–1080p, roughly 10–100 MB each.
- The GPU fleet is the scarce, expensive resource; the design must keep utilization high.
- Generation is asynchronous: users tolerate seconds-to-minutes latency but expect progress feedback.
- Safety is non-negotiable: disallowed content (e.g., sexual content involving minors, non-consensual real-person likenesses, graphic violence) must be blocked at **both** input (the prompt) and output (the generated video).

### Clarifying Questions to Ask

- Is there a synchronous low-res preview, or is everything fully asynchronous? (Assume async with progress.)
- Are there free vs. paid tiers with different quotas and queue priorities?
- Do we need iterative editing (extend a clip, remix, regenerate a region), or only one-shot generation?
- What are the retention policies for generated videos and for the prompts themselves?
- Are there regional / data-residency or age-gating compliance requirements?
- Do we own model serving, or do we call an internal inference service that abstracts the GPUs?

### Part 1 — Public API and Job Lifecycle

Define the client-facing API and the full lifecycle of a generation job, from submission through to the delivered video.

```hint
Where to start
Treat generation as a long-running async job: the submit call returns a `job_id` immediately, and the client polls or subscribes for status. Don't block an HTTP request for minutes.
```

```hint
Durability
The GPU is the scarce resource, so decouple *submission* from *execution* with a durable queue. The API tier should be cheap and stateless; the expensive work happens behind the queue.
```

### Part 2 — Generation Pipeline and GPU Scheduling

Design how a queued job becomes a finished video: the worker pipeline, how work is dispatched to GPUs, and how you keep the expensive fleet busy.

```hint
Separation
Split the control plane (orchestration, queue, metadata) from the data plane (GPU workers that pull jobs and run the model). The control plane is cheap and elastic; the data plane is expensive and capacity-bounded.
```

```hint
Utilization
Keep GPUs busy: priority queues per tier, autoscale workers on queue depth / wait time, batch compatible jobs where the model allows, and consider preemption so paid jobs aren't stuck behind a long free-tier backlog.
```

### Part 3 — Safety, Storage, Delivery, and Observability

Design input/output safety, where the generated videos live and how they reach users, and what you monitor.

```hint
Two-sided safety
Safety is two stages, not one: classify/clean the **prompt** before generation, and moderate the **output** (sampled frames + audio) after generation, before the video is ever made viewable.
```

```hint
Heavy bytes
Keep the large video bytes out of your database: store them in object storage fronted by a CDN, and keep only metadata + a storage key in the DB.
```

### Follow-up Questions

- How would you support priority tiers (free vs. paid) without indefinitely starving free users?
- A generation takes 4 minutes and the worker crashes at minute 3. What exactly happens, and who notices?
- How would you add an "extend this clip" / iterative-editing feature on top of this design?
- How would you roll out and A/B test a new, more expensive model version safely without blowing the GPU budget?
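The constraints imply a fleet size that is worth estimating before designing the scheduler. The sketch below is back-of-envelope arithmetic in Python. The helper names, the single-GPU accounting (a job sharded across k GPUs for t seconds counts as k*t GPU-seconds), and the 80% utilization target are illustrative assumptions, not part of the prompt.

```python
SECONDS_PER_DAY = 86_400


def gpus_needed(jobs_per_day: float, gpu_seconds_per_job: float,
                peak_factor: float = 1.0, target_utilization: float = 1.0) -> float:
    """Concurrent GPUs needed so supplied GPU-seconds cover demanded GPU-seconds.

    Assumption: a job's GPU time is counted in single-GPU seconds, so a job
    sharded across k GPUs for t seconds contributes k * t here.
    """
    jobs_per_second = jobs_per_day / SECONDS_PER_DAY
    busy_gpus = jobs_per_second * gpu_seconds_per_job * peak_factor
    # Planning below 100% utilization leaves headroom for bursts and failures.
    return busy_gpus / target_utilization


def storage_tb_per_day(jobs_per_day: float, mb_per_clip: float) -> float:
    """New output bytes per day, in decimal terabytes (1 TB = 1,000,000 MB)."""
    return jobs_per_day * mb_per_clip / 1_000_000


if __name__ == "__main__":
    jobs = 5_000_000  # ~1M DAU x ~5 clips/day, from the constraints
    for gpu_s in (30, 300):  # 30 s to 5 min of GPU time per job
        avg = gpus_needed(jobs, gpu_s)
        peak = gpus_needed(jobs, gpu_s, peak_factor=5, target_utilization=0.8)
        print(f"{gpu_s:>3}s/job: ~{avg:,.0f} GPUs busy on average, "
              f"~{peak:,.0f} provisioned for a 5x peak")
    low, high = storage_tb_per_day(jobs, 10), storage_tb_per_day(jobs, 100)
    print(f"storage: {low:.0f}-{high:.0f} TB/day")
```

Under these assumptions, 5M jobs/day means roughly 1,700 GPUs busy on average if every job takes 30 s, and roughly 17,400 if every job takes 5 min. Sizing for a 5x peak at 80% utilization with 5-minute jobs pushes that to about 108,500. Output storage grows by 50–500 TB/day. The spread shows that the real answer depends on the duration/resolution mix, and it shows why idle GPUs are so costly.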
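The Part 1 hints say to return a `job_id` immediately, let the client poll or subscribe, and decouple submission from execution with a durable queue. A small in-memory model of the API tier makes this concrete. Everything here is hypothetical: the `JobService` class, the state names, the `POST /jobs` / `GET /jobs/{job_id}` mapping in the docstrings, and the per-user idempotency key. A Python list stands in for the durable queue, and a dict stands in for the metadata DB.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    MODERATING = "moderating"  # output safety check; video not yet viewable
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    BLOCKED = "blocked"        # rejected by input or output safety


# Allowed transitions in this hypothetical state machine. RUNNING -> QUEUED is
# the retry path when a worker dies mid-generation.
TRANSITIONS = {
    JobState.QUEUED: {JobState.RUNNING, JobState.BLOCKED, JobState.FAILED},
    JobState.RUNNING: {JobState.MODERATING, JobState.QUEUED, JobState.FAILED},
    JobState.MODERATING: {JobState.SUCCEEDED, JobState.BLOCKED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: set(),
    JobState.BLOCKED: set(),
}


@dataclass
class Job:
    job_id: str
    user_id: str
    prompt: str
    state: JobState = JobState.QUEUED
    progress: float = 0.0  # 0.0 to 1.0, reported by the worker for progress feedback


class JobService:
    """In-memory stand-in for the stateless API tier plus its metadata DB."""

    def __init__(self) -> None:
        self._jobs: dict[str, Job] = {}
        self._by_idempotency_key: dict[tuple[str, str], str] = {}
        self.queue: list[str] = []  # stand-in for a durable queue

    def submit(self, user_id: str, idempotency_key: str, prompt: str) -> str:
        """POST /jobs: returns a job_id immediately and never waits for a GPU."""
        key = (user_id, idempotency_key)
        if key in self._by_idempotency_key:  # client retry: same job, no duplicate GPU work
            return self._by_idempotency_key[key]
        job = Job(job_id=str(uuid.uuid4()), user_id=user_id, prompt=prompt)
        self._jobs[job.job_id] = job
        self._by_idempotency_key[key] = job.job_id
        self.queue.append(job.job_id)
        return job.job_id

    def transition(self, job_id: str, new_state: JobState) -> None:
        job = self._jobs[job_id]
        if new_state not in TRANSITIONS[job.state]:
            raise ValueError(f"illegal transition {job.state.value} -> {new_state.value}")
        job.state = new_state

    def status(self, job_id: str) -> dict:
        """GET /jobs/{job_id}: what a polling client sees."""
        job = self._jobs[job_id]
        return {"job_id": job.job_id, "state": job.state.value, "progress": job.progress}
```

The idempotency key lets a client safely retry a submit after a network timeout without paying for a second generation. In a real deployment, the lookup-then-insert in `submit` would need to be atomic. One way to do that is a unique constraint on `(user_id, idempotency_key)` in the metadata DB.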
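The Durability hint and the "worker crashes at minute 3" follow-up both depend on how a job is handed to a GPU worker. One common pattern works as follows: the worker holds a time-limited lease, renews it with heartbeats, and must present a fencing token so that a stale worker cannot commit a result. The sketch below shows that pattern under assumptions. `LeaseTable` and its methods are hypothetical names. Managed queues expose similar ideas (for example, a visibility timeout) under their own APIs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    worker_id: str
    token: int         # fencing token; a new one is issued on every (re)assignment
    expires_at: float


class LeaseTable:
    """Hypothetical in-memory model of the lease rows a durable queue/DB keeps."""

    def __init__(self, lease_s: float = 60.0, max_attempts: int = 3) -> None:
        self.lease_s = lease_s
        self.max_attempts = max_attempts
        self.pending: deque[str] = deque()
        self.leases: dict[str, Lease] = {}
        self.attempts: dict[str, int] = {}
        self.done: set[str] = set()
        self.dead_letter: list[str] = []  # retries exhausted; needs alerting/manual handling
        self._last_token = 0

    def enqueue(self, job_id: str) -> None:
        self.pending.append(job_id)

    def acquire(self, worker_id: str, now: float) -> Optional[tuple[str, int]]:
        """A GPU worker pulls the next job and receives (job_id, token)."""
        if not self.pending:
            return None
        job_id = self.pending.popleft()
        self._last_token += 1
        self.attempts[job_id] = self.attempts.get(job_id, 0) + 1
        self.leases[job_id] = Lease(worker_id, self._last_token, now + self.lease_s)
        return job_id, self._last_token

    def heartbeat(self, job_id: str, token: int, now: float) -> bool:
        """Extend the lease; False means the worker lost it and should stop."""
        lease = self.leases.get(job_id)
        if lease is None or lease.token != token:
            return False
        lease.expires_at = now + self.lease_s
        return True

    def complete(self, job_id: str, token: int) -> bool:
        """Accept a result only from the current lease holder (fencing)."""
        lease = self.leases.get(job_id)
        if lease is None or lease.token != token:
            return False
        del self.leases[job_id]
        self.done.add(job_id)
        return True

    def reap_expired(self, now: float) -> None:
        """Run periodically by the control plane; this is 'who notices' a crash."""
        for job_id, lease in list(self.leases.items()):
            if lease.expires_at <= now:
                del self.leases[job_id]
                if self.attempts[job_id] >= self.max_attempts:
                    self.dead_letter.append(job_id)
                else:
                    self.pending.append(job_id)
```

In the follow-up scenario, the worker's last heartbeat lands before minute 3. The lease therefore lapses about one lease period later, the reaper requeues the job, and a second worker regenerates it from scratch. Because the model is a black box with no assumed checkpointing, the crashed worker's GPU minutes are lost. The fencing token prevents a "zombie" worker (one that was only partitioned, not dead) from overwriting the retry's result.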
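The Utilization hint and the follow-up about not starving free users can both be addressed with priority plus aging: paid jobs get a head start, but every waiting job's priority improves over time. This sketch assumes a single shared queue and a hypothetical 300-second head start for paid jobs. `AgingScheduler` and `TIER_PENALTY_S` are illustrative names, and preemption is not modeled.

```python
import heapq
import itertools
from typing import Optional

# Hypothetical per-tier handicap, in seconds of "virtual wait".
TIER_PENALTY_S = {"paid": 0.0, "free": 300.0}


class AgingScheduler:
    """Priority queue with linear aging.

    A job's effective rank at time `now` (lower is served first) is
        tier_penalty - (now - enqueued_at),
    so it improves by one unit per second waited. Because `now` is the same for
    every job, ordering by that rank equals ordering by the static key
        enqueued_at + tier_penalty,
    which a plain min-heap can store without re-scoring.
    """

    def __init__(self, penalties: dict[str, float] = TIER_PENALTY_S) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._seq = itertools.count()  # FIFO tie-break for equal keys
        self._penalties = penalties

    def push(self, job_id: str, tier: str, now: float) -> None:
        key = now + self._penalties[tier]
        heapq.heappush(self._heap, (key, next(self._seq), job_id))

    def pop(self) -> Optional[str]:
        """Next job for an idle GPU worker, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

A key never changes after enqueue, so no periodic re-scoring is needed. A free job enqueued at time t cannot be overtaken by a paid job that arrives after t + 300. As long as the fleet keeps draining the queue, that free job is eventually served. The penalty is the knob to tune: larger values favor paid users more, but they never allow unbounded starvation.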
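The Utilization hint suggests autoscaling workers on queue depth or wait time. Here is a minimal rule under two assumptions. First, there is one GPU pool per model version. Second, the backlog is measured in estimated GPU-seconds, since a raw job count misleads when jobs range from 30 s to 5 min. `desired_workers` and its parameters are hypothetical.

```python
import math


def desired_workers(queued_gpu_seconds: float, busy_workers: int, current_workers: int,
                    target_wait_s: float, min_workers: int, max_workers: int,
                    max_shrink_per_step: int = 10) -> int:
    """Hypothetical scaling rule for one GPU pool.

    Keep every busy worker, plus enough extra workers that the queued backlog
    would drain within target_wait_s (k extra workers clear k GPU-seconds per
    second). New arrivals during that window are ignored in this sketch.
    """
    needed = busy_workers + math.ceil(queued_gpu_seconds / target_wait_s)
    if needed < current_workers:
        # Shrink gradually: releasing GPUs is easy, getting them back can be slow.
        needed = max(needed, current_workers - max_shrink_per_step)
    return max(min_workers, min(max_workers, needed))
```

Scale-up happens immediately, while scale-down is rate-limited. A production controller would also add an arrival-rate term and account for GPU provisioning lead time.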
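The two Part 3 hints (two-sided safety, and bytes in object storage with metadata in the DB) come together in a job's completion path. In this sketch, injected callables stand in for the real services: the classifiers, the model call, and the object-store write. `VideoRecord` is a hypothetical DB row. The signed URL uses a generic HMAC-with-expiry scheme, not any particular CDN's signing format.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VideoRecord:
    """DB row: metadata and a storage key only, never the video bytes."""
    job_id: str
    state: str                      # "ready" or "blocked"
    storage_key: Optional[str] = None


def run_pipeline(job_id: str, prompt: str,
                 prompt_is_safe: Callable[[str], bool],
                 generate: Callable[[str], bytes],
                 video_is_safe: Callable[[bytes], bool],
                 put_object: Callable[[str, bytes], None]) -> VideoRecord:
    # Stage 1: input safety. A rejected prompt never consumes GPU time.
    if not prompt_is_safe(prompt):
        return VideoRecord(job_id, "blocked")
    video = generate(prompt)  # the expensive black-box model call
    # Stage 2: output safety on the finished clip, before anything is viewable.
    if not video_is_safe(video):
        return VideoRecord(job_id, "blocked")
    key = f"videos/{job_id}.mp4"
    put_object(key, video)  # bytes go to object storage behind the CDN
    return VideoRecord(job_id, "ready", key)


def _sign(key: str, expires: int, secret: bytes) -> str:
    return hmac.new(secret, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()


def signed_url(cdn_base: str, key: str, secret: bytes, ttl_s: int,
               now: Optional[int] = None) -> str:
    """Short-lived URL the client fetches from the CDN."""
    expires = (int(time.time()) if now is None else now) + ttl_s
    return f"{cdn_base}/{key}?expires={expires}&sig={_sign(key, expires, secret)}"


def verify(key: str, expires: int, sig: str, secret: bytes, now: int) -> bool:
    """What the edge would check before serving the object."""
    return now <= expires and hmac.compare_digest(_sign(key, expires, secret), sig)
```

A prompt rejected at stage 1 never reaches a GPU. A clip rejected at stage 2 is never written to object storage in this sketch, so a blocked video is never viewable at any point. A real system might keep such a clip in a restricted location for review.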
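For the last follow-up, a staged rollout of a more expensive model can combine stable per-user bucketing with a hard GPU-budget guardrail. The sketch makes three assumptions: there are two model versions, labeled `v1` and `v2`; a config value controls the rollout percentage; and the GPU-hour spend is tracked elsewhere. All names are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, salt: str = "video-model-v2", buckets: int = 10_000) -> int:
    """Stable bucket in [0, buckets).

    Uses sha256 rather than hash(), because str hashing is randomized per process.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def choose_model(user_id: str, rollout_pct: float,
                 spent_gpu_hours: float, budget_gpu_hours: float) -> str:
    """Route a new job to the candidate model ("v2") or the current one ("v1").

    rollout_pct is 0-100. Once the candidate's GPU budget for the period is
    spent, every new job falls back to v1.
    """
    if spent_gpu_hours >= budget_gpu_hours:
        return "v1"
    return "v2" if rollout_bucket(user_id) < rollout_pct * 100 else "v1"
```

Hashing the user ID, rather than picking randomly per job, keeps each user on one arm for a given rollout percentage, which makes the A/B comparison cleaner. Jobs already admitted to v2 can still overshoot the budget slightly, so the threshold should sit somewhat below the true limit.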

Related Interview Questions

  • Design a Text-to-Video Generation Service - OpenAI (medium)
  • Design a Text-to-Video Generation System - OpenAI (hard)
  • Design a Real-Time Sensor Intelligence System - OpenAI (medium)
  • Mine Novel Images from Unlabeled Data - OpenAI (medium)
  • Design a GPU-Efficient Video Service - OpenAI (medium)

Jun 14, 2026, 12:00 AM

© 2026 PracHub. All rights reserved.