PracHub

Design a Task Scheduler for Opaque Long-Running GPU Jobs ("Design Sora")

Last updated: Jun 24, 2026

Quick Overview

This system design question evaluates a candidate's ability to architect a distributed job scheduler and orchestrator for expensive, long-running, GPU-bound tasks. It tests core competencies in asynchronous processing, multi-tenant fairness, fault tolerance, and scalable resource management — skills commonly probed at the senior software engineer level.

Company: Google

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Design a task scheduler that runs long-running, opaque video-generation jobs at scale (the "Design Sora" prompt).

Your platform lets users submit a text prompt and receive a generated video back. The actual generation is performed by a black-box binary: you hand it a prompt (plus parameters such as resolution, duration, and seed) and, some minutes later, it emits a video file. You do not control or modify that binary. Treat it as an opaque, GPU-bound task that takes a highly variable amount of time (seconds to many minutes), consumes a whole GPU (or several) while it runs, and may crash, hang, or run out of memory.

Your job is to design the system around that binary: accept user submissions, queue them, schedule them onto a fleet of GPU workers, track each job's lifecycle, handle failures and retries, enforce fairness/priority across users, and deliver the finished video back to the user. In other words, the interesting problem here is a distributed job scheduler / orchestrator for expensive, long-running, unreliable tasks; the video model itself is intentionally a black box.

Hints

  • Reframe the prompt. The phrase "Design Sora" is a red herring: you are not designing a video model. Restate the problem out loud as "design a distributed scheduler for long-running, GPU-bound, opaque tasks" and design to that. The binary is just a unit of work with a duration, a resource footprint, and a failure rate.
  • Decompose the lifecycle. Walk a single job through every state: `submitted → queued → scheduled → running → (succeeded | failed | timed_out | cancelled)`. Each transition is where the hard questions live (admission/quotas, queueing/fairness, placement onto GPUs, heartbeating a running job, retry vs. dead-letter, result delivery).
  • Decouple the slow part. Generation takes minutes, so the submit API must be asynchronous: accept the request, persist a job row, return a `job_id` immediately, and let the user poll or get notified. Never hold an HTTP request open for the duration of generation.
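The lifecycle and the asynchronous-submit idea from the hints can be sketched together. This is a minimal illustration, not a reference solution: `JobStore`, `ALLOWED`, and the in-memory dictionary are hypothetical stand-ins for a durable jobs table, and the cancellation edges from every non-terminal state are an assumption. Retries are deliberately not modeled.

```python
import uuid
from enum import Enum


class State(str, Enum):
    SUBMITTED = "submitted"
    QUEUED = "queued"
    SCHEDULED = "scheduled"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    CANCELLED = "cancelled"


# Assumed transition table: the forward path named in the hint, plus
# cancellation from any non-terminal state. Terminal states have no exits.
ALLOWED = {
    State.SUBMITTED: {State.QUEUED, State.CANCELLED},
    State.QUEUED: {State.SCHEDULED, State.CANCELLED},
    State.SCHEDULED: {State.RUNNING, State.CANCELLED},
    State.RUNNING: {State.SUCCEEDED, State.FAILED,
                    State.TIMED_OUT, State.CANCELLED},
}


class JobStore:
    """In-memory stand-in for a durable jobs table (hypothetical)."""

    def __init__(self):
        self._jobs = {}

    def submit(self, user_id, prompt):
        """Persist a job row and return its id at once; no generation here."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {
            "user_id": user_id,
            "prompt": prompt,
            "state": State.SUBMITTED,
        }
        return job_id

    def status(self, job_id):
        """What a polling client would read."""
        return self._jobs[job_id]["state"]

    def transition(self, job_id, new_state):
        """Move a job to new_state, rejecting edges not in ALLOWED."""
        current = self._jobs[job_id]["state"]
        if new_state not in ALLOWED.get(current, set()):
            raise ValueError(
                f"illegal transition {current.value} -> {new_state.value}"
            )
        self._jobs[job_id]["state"] = new_state
```

A caller would receive the `job_id` from `submit` right away and poll `status`, while the scheduler and workers drive `transition` in the background.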

Jun 14, 2026, 12:00 AM

Constraints & Assumptions

State your own numbers; reasonable defaults to anchor on:

  • ~1–5M submitted jobs/day (tens of jobs/sec average, with bursty peaks of several hundred/sec).
  • Each job occupies 1+ GPUs for a p50 of ~1–2 min and a p99 of ~10+ min; jobs are not preemptible mid-generation in the simple version (the binary is a black box).
  • A fleet of thousands of GPUs across multiple regions/zones; GPUs are the scarce, expensive resource — target high utilization.
  • Output videos are tens to hundreds of MB; store in blob storage and serve via signed URLs / CDN.
  • Multi-tenant: per-user/per-org quotas and priority tiers (e.g., free vs. paid) must be enforced; no single user may starve others.
  • Availability target for the control plane (submit/status) ~99.9%+; an individual job may fail and be retried, but a job must never be silently lost.
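The defaults above can be sanity-checked with a few lines of arithmetic. The sketch below is a back-of-envelope aid only: the function names are hypothetical, and the mean of 120 GPU-seconds per job, the 80% utilization target, and the 100 MB mean video size are assumptions chosen to sit inside the stated ranges, not figures from the prompt.

```python
import math

SECONDS_PER_DAY = 86_400


def avg_jobs_per_sec(jobs_per_day):
    """Average arrival rate implied by a daily job count."""
    return jobs_per_day / SECONDS_PER_DAY


def gpus_needed(jobs_per_sec, mean_gpu_seconds_per_job, target_utilization):
    """Little's law: busy GPUs = arrival rate x mean GPU-seconds per job,
    then divided by the utilization target to leave headroom."""
    busy_gpus = jobs_per_sec * mean_gpu_seconds_per_job
    return math.ceil(busy_gpus / target_utilization)


def daily_output_tb(jobs_per_day, mean_video_mb):
    """Blob storage written per day, in decimal terabytes."""
    return jobs_per_day * mean_video_mb / 1_000_000


if __name__ == "__main__":
    for jobs_per_day in (1_000_000, 5_000_000):
        rate = avg_jobs_per_sec(jobs_per_day)
        # Assumed: 120 GPU-seconds mean per job, 80% target utilization.
        fleet = gpus_needed(rate, 120, 0.8)
        # Assumed: 100 MB mean output size.
        storage = daily_output_tb(jobs_per_day, 100)
        print(f"{jobs_per_day} jobs/day: {rate:.1f} jobs/s, "
              f"{fleet} GPUs, {storage:.0f} TB/day")
```

Under these assumptions, 1–5M jobs/day works out to roughly 12 to 58 jobs/sec on average and a steady-state fleet in the low thousands of GPUs, which is consistent with the "tens of jobs/sec" and "thousands of GPUs" defaults. Peaks of several hundred jobs/sec exceed that capacity, so the queue has to absorb them.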

Clarifying Questions to Ask

  • Is generation strictly asynchronous (submit now, retrieve later), or is there any interactive/streaming preview requirement?
  • What are the priority/fairness rules across tenants — strict tiers, weighted fair sharing, or per-user concurrency caps?
  • What is the expected GPU footprint per job, and can a job span multiple GPUs/nodes, or is it always single-GPU?
  • What should happen to a job that exceeds a wall-clock budget — kill and retry, kill and fail, or let it run?
  • What are the retention and access-control requirements for generated videos (who can fetch a result, for how long)?
  • Are there hard cost ceilings or per-user spend caps that the scheduler must enforce?

Follow-up Questions

  • A class of prompts reliably makes the binary hang and burn a GPU for 30 minutes before timing out. How do you detect and contain this so it does not degrade everyone else's latency?
  • The black-box binary releases a new version with different GPU-memory and runtime characteristics. How do you roll it out safely across the fleet without a global outage or a thundering-herd of retries?
  • Paying customers complain that during traffic spikes their jobs sit behind a flood of free-tier jobs. Concretely, how does your fairness/priority mechanism fix this, and what are its failure modes?
  • How would you add a per-user, real-time spend cap that stops scheduling new jobs once a budget is hit, given that you only learn a job's true cost after it finishes?
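For the spend-cap follow-up, one possible direction (offered as a sketch, not as the expected answer) is to reserve an estimated cost when a job is admitted and settle the true cost when it finishes. `SpendLedger`, `try_reserve`, and `settle` are hypothetical names, and how the estimate is produced is left open.

```python
class SpendLedger:
    """Per-user budget with reservations for in-flight jobs (hypothetical)."""

    def __init__(self, budget):
        self.budget = budget
        self.spent = 0.0    # settled cost of finished jobs
        self.reserved = {}  # job_id -> estimated cost held while the job runs

    def committed(self):
        """Settled spend plus everything currently reserved."""
        return self.spent + sum(self.reserved.values())

    def try_reserve(self, job_id, estimated_cost):
        """Admit the job only if its estimate still fits under the budget."""
        if self.committed() + estimated_cost > self.budget:
            return False
        self.reserved[job_id] = estimated_cost
        return True

    def settle(self, job_id, actual_cost):
        """Release the reservation and record the true cost."""
        self.reserved.pop(job_id)
        self.spent += actual_cost
```

Because the true cost is only known at `settle` time, a user can still overshoot the budget by the estimation error of the jobs in flight. A conservative estimate (for example, one based on the wall-clock timeout) narrows that gap at the price of admitting fewer concurrent jobs.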
