Design task scheduler with dependencies

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design distributed task scheduling and orchestration systems, covering competencies in data modeling for tasks and DAGs, API design, execution engines, fault tolerance, backpressure, worker liveness, scaling, multi-tenancy, and observability.

  • hard
  • Google
  • System Design
  • Software Engineer

Interview Round: Technical Screen

Design a task scheduling infrastructure that supports task dependencies, persistent task storage, scheduling policies, and long-running tasks. Specify data models for tasks and DAGs, APIs to submit, cancel, and query tasks, the execution engine, fault tolerance for retries and idempotency, backpressure, worker heartbeats, and handling of stuck or straggler tasks. Discuss scaling, multi-tenant isolation, and observability.

Sep 6, 2025, 12:00 AM

Design a Distributed Task Scheduling Infrastructure

Context

Design a distributed task scheduling and orchestration system that runs at scale, supports task dependencies (DAGs), persists state, and handles both short- and long-running tasks.
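
To make the dependency requirement concrete, here is a minimal in-memory sketch of a DAG model that rejects cycles and reports which tasks are ready to dispatch. The names `Task`, `Dag`, `TaskState`, `validate`, and `ready_tasks` are illustrative assumptions, not part of the question. A real design would keep this state in the persistent metadata store and track runs and attempts separately.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, List, Set


class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Task:
    task_id: str
    depends_on: Set[str] = field(default_factory=set)
    state: TaskState = TaskState.PENDING


@dataclass
class Dag:
    tasks: Dict[str, Task] = field(default_factory=dict)

    def add_task(self, task_id: str, depends_on: Iterable[str] = ()) -> None:
        self.tasks[task_id] = Task(task_id, set(depends_on))

    def validate(self) -> None:
        """Raise ValueError on unknown dependencies or cycles (Kahn's algorithm)."""
        indegree: Dict[str, int] = {}
        for tid, task in self.tasks.items():
            for dep in task.depends_on:
                if dep not in self.tasks:
                    raise ValueError(f"{tid} depends on unknown task {dep}")
            indegree[tid] = len(task.depends_on)
        frontier = [tid for tid, deg in indegree.items() if deg == 0]
        visited = 0
        while frontier:
            done = frontier.pop()
            visited += 1
            # Removing a finished node unblocks every task that depended on it.
            for tid, task in self.tasks.items():
                if done in task.depends_on:
                    indegree[tid] -= 1
                    if indegree[tid] == 0:
                        frontier.append(tid)
        if visited != len(self.tasks):
            raise ValueError("dependency cycle detected")

    def ready_tasks(self) -> List[str]:
        """Pending tasks whose dependencies have all succeeded."""
        return sorted(
            tid
            for tid, task in self.tasks.items()
            if task.state is TaskState.PENDING
            and all(self.tasks[d].state is TaskState.SUCCEEDED for d in task.depends_on)
        )
```

A scheduler loop could call `validate()` once at submission time and `ready_tasks()` after each state transition. The linear scan inside `validate` is kept for brevity. A reverse adjacency map would avoid it on large DAGs.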

Requirements

  1. Data models
    • Define schemas/models for Tasks and DAGs, including runs/attempts and state transitions.
  2. APIs
    • Submit a task or DAG, cancel, and query status/history.
  3. Execution engine
    • How tasks are dispatched, leased, executed, and acknowledged.
  4. Fault tolerance
    • Retries with backoff, idempotency, exactly-once vs at-least-once semantics.
  5. Backpressure
    • Prevent overload and provide fairness.
  6. Worker heartbeats
    • Registration, lease extensions, and liveness detection.
  7. Stuck/straggler handling
    • Detection and mitigation (e.g., timeouts, speculative execution).
  8. Scaling & multi-tenancy
    • Horizontal scale, isolation, quotas, fairness, and preemption.
  9. Observability
    • Metrics, logs, tracing, audits, and a minimal UI model.
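
Requirements 3, 6, and 7 (leasing, heartbeats, and stuck-task detection) can be sketched together as a lease table. This is a single-process illustration under stated assumptions: `LeaseTable`, its method names, and the 30-second default are hypothetical, and a real system would perform each step as a conditional write against the metadata store rather than on a Python dict.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Lease:
    task_id: str
    worker_id: str
    expires_at: float


class LeaseTable:
    """In-memory stand-in for leases kept in the persistent metadata store."""

    def __init__(self, lease_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock  # injectable so tests can control time
        self.leases: Dict[str, Lease] = {}

    def acquire(self, task_id: str, worker_id: str) -> bool:
        """Grant the lease unless a different worker holds a live one."""
        now = self.clock()
        current = self.leases.get(task_id)
        if current and current.expires_at > now and current.worker_id != worker_id:
            return False
        self.leases[task_id] = Lease(task_id, worker_id, now + self.lease_seconds)
        return True

    def heartbeat(self, task_id: str, worker_id: str) -> bool:
        """Extend a live lease; False tells the worker it lost ownership."""
        now = self.clock()
        current = self.leases.get(task_id)
        if current is None or current.worker_id != worker_id or current.expires_at <= now:
            return False
        current.expires_at = now + self.lease_seconds
        return True

    def ack(self, task_id: str, worker_id: str) -> bool:
        """Accept completion only from the worker that still owns a live lease."""
        if not self.heartbeat(task_id, worker_id):
            return False
        del self.leases[task_id]
        return True

    def reap_expired(self) -> List[str]:
        """Drop lapsed leases and return their task ids for re-queueing."""
        now = self.clock()
        expired = [tid for tid, lease in self.leases.items() if lease.expires_at <= now]
        for tid in expired:
            del self.leases[tid]
        return expired
```

A worker that misses heartbeats loses its lease, and a periodic `reap_expired()` sweep hands the task back to the queue. Rejecting a late `ack` from the previous owner is one way to keep a straggler from overwriting the result of its replacement.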

Assume a heterogeneous worker fleet (containers/VMs), a durable message bus, and a persistent metadata store.
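
If the durable message bus redelivers messages (an assumption here, since the question does not state its delivery semantics), retries need both spacing and deduplication. The sketch below shows one common combination: exponential backoff with full jitter, plus an idempotency-key lookup before running a handler. `backoff_delay` and `IdempotentExecutor` are hypothetical names, and the dict stands in for the persistent metadata store.

```python
import random
from typing import Any, Callable, Dict


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff with full jitter.

    Returns rng() * min(cap, base * 2**attempt), where attempt starts at 0.
    """
    return rng() * min(cap, base * (2 ** attempt))


class IdempotentExecutor:
    """Records the result of the first successful run for each idempotency key.

    A redelivered message with the same key gets the stored result back
    instead of re-running the handler. If the handler raises, nothing is
    stored, so a later retry runs it again.
    """

    def __init__(self) -> None:
        self.results: Dict[str, Any] = {}  # stand-in for the metadata store

    def run(self, key: str, handler: Callable[[], Any]) -> Any:
        if key in self.results:
            return self.results[key]
        result = handler()
        self.results[key] = result
        return result
```

This gives at-least-once delivery with deduplicated effects only within a single process. Across multiple workers, the check and the write would have to be one atomic operation in the store, and side effects outside the store would still need their own idempotency guarantees.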
