Design task scheduler with dependencies

Q: Design task scheduler with dependencies

This is a System Design interview question from Google for Software Engineer roles.

Question

Design a Distributed Task Scheduling Infrastructure

Context

Design a distributed task scheduling and orchestration system that can run at scale, supports task dependencies (DAGs), persists state, and handles both short and long-running tasks.
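Supporting DAGs starts with two things: rejecting dependency graphs that contain a cycle when they are submitted, and deciding which tasks are currently runnable. Below is a minimal Python sketch that uses Kahn's topological sort. It assumes each DAG arrives as a mapping from task id to the set of upstream task ids. The `topo_order` and `ready_tasks` names are hypothetical.

```python
from __future__ import annotations

from collections import deque


def topo_order(deps: dict[str, set[str]]) -> list[str]:
    """Return task ids in an order that respects dependencies.

    `deps` maps each task id to the set of task ids it depends on.
    Raises ValueError on undeclared dependencies or on a cycle.
    """
    missing = {u for ups in deps.values() for u in ups} - set(deps)
    if missing:
        raise ValueError(f"undeclared dependencies: {sorted(missing)}")

    # Kahn's algorithm: repeatedly emit tasks whose upstreams are all emitted.
    indegree = {t: len(ups) for t, ups in deps.items()}
    downstream: dict[str, list[str]] = {t: [] for t in deps}
    for t, ups in deps.items():
        for u in ups:
            downstream[u].append(t)

    ready = deque(sorted(t for t, n in indegree.items() if n == 0))
    order: list[str] = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for child in downstream[t]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(deps):  # some tasks never reached indegree 0
        raise ValueError("dependency graph contains a cycle")
    return order


def ready_tasks(deps: dict[str, set[str]], succeeded: set[str],
                started: set[str]) -> set[str]:
    """Tasks not yet started whose upstreams have all succeeded."""
    return {t for t, ups in deps.items() if t not in started and ups <= succeeded}
```

In this design the scheduler would call something like `ready_tasks` each time a task succeeds. It would not recompute a full ordering on every event.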

Requirements

Data models
- Define schemas/models for Tasks and DAGs, including runs/attempts and state transitions.
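One way to make state transitions explicit is a small state machine. It rejects illegal moves and records an audit trail. The states, the `TaskRun` fields, and the retry rule below are illustrative assumptions, not a prescribed schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class RunState(Enum):
    PENDING = "pending"      # waiting on upstream tasks
    QUEUED = "queued"        # runnable, waiting for a worker lease
    RUNNING = "running"      # leased by a worker
    SUCCEEDED = "succeeded"
    FAILED = "failed"        # latest attempt failed; may be retried
    CANCELLED = "cancelled"


# Legal transitions; anything not listed is rejected.
TRANSITIONS = {
    RunState.PENDING: {RunState.QUEUED, RunState.CANCELLED},
    RunState.QUEUED: {RunState.RUNNING, RunState.CANCELLED},
    # RUNNING -> QUEUED models a lost lease (e.g. dead worker) being requeued.
    RunState.RUNNING: {RunState.SUCCEEDED, RunState.FAILED,
                       RunState.QUEUED, RunState.CANCELLED},
    RunState.FAILED: {RunState.QUEUED},  # retry
    RunState.SUCCEEDED: set(),
    RunState.CANCELLED: set(),
}


@dataclass
class TaskRun:
    task_id: str
    dag_run_id: str
    max_attempts: int = 3
    state: RunState = RunState.PENDING
    attempts: int = 0  # incremented each time a worker starts an attempt
    history: list[tuple[RunState, RunState]] = field(default_factory=list)

    def transition(self, new: RunState) -> None:
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new.value}")
        if (self.state is RunState.FAILED and new is RunState.QUEUED
                and self.attempts >= self.max_attempts):
            raise ValueError("retry budget exhausted")
        self.history.append((self.state, new))  # audit trail of state changes
        self.state = new
        if new is RunState.RUNNING:
            self.attempts += 1
```

Storing the attempt counter and transition history on the run record makes retries and later audits queryable from the metadata store.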
APIs
- Submit a task or DAG, cancel it, and query its status and history.
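Below is a sketch of the API surface. It assumes clients send an idempotency key (`request_id`) with each submission. That way, a request retried after a network timeout does not create a second run. `SchedulerAPI`, `DagRun`, and the string states are hypothetical. Dependency validation and persistence are omitted.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field

TERMINAL = {"succeeded", "failed", "cancelled"}


@dataclass
class DagRun:
    run_id: str
    tasks: dict[str, str]  # task_id -> state
    events: list[tuple[str, str, str]] = field(default_factory=list)  # (task, old, new)


class SchedulerAPI:
    """Hypothetical in-memory stand-in for the submit / cancel / status endpoints."""

    def __init__(self) -> None:
        self._runs: dict[str, DagRun] = {}
        self._by_request: dict[str, str] = {}  # client idempotency key -> run_id

    def submit(self, request_id: str, task_ids: list[str]) -> str:
        # A client retrying after a timeout gets the original run, not a duplicate.
        if request_id in self._by_request:
            return self._by_request[request_id]
        run_id = str(uuid.uuid4())
        self._runs[run_id] = DagRun(run_id, {t: "pending" for t in task_ids})
        self._by_request[request_id] = run_id
        return run_id

    def cancel(self, run_id: str) -> int:
        """Cancel all non-terminal tasks; return how many changed state."""
        run = self._runs[run_id]
        changed = 0
        for task_id, state in run.tasks.items():
            if state not in TERMINAL:
                run.events.append((task_id, state, "cancelled"))
                run.tasks[task_id] = "cancelled"
                changed += 1
        return changed

    def status(self, run_id: str) -> dict:
        run = self._runs[run_id]
        return {"run_id": run.run_id, "tasks": dict(run.tasks),
                "history": list(run.events)}
```

Note that `cancel` is also safe to repeat: a second call changes nothing and returns 0.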
Execution engine
- How tasks are dispatched, leased, executed, and acknowledged.
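A common way to dispatch work safely is to combine a time-bounded lease with a monotonically increasing fencing token. If a worker's lease expires and the task is re-dispatched, the original worker can no longer acknowledge it. The in-memory `LeaseTable` below is a hypothetical sketch. A production version would perform the claim as a compare-and-set in the persistent metadata store. The clock is injectable so the example can be tested.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    task_id: str
    worker_id: str
    token: int          # fencing token; strictly increases on every grant
    expires_at: float


class LeaseTable:
    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._next_token = 0
        self._leases: dict[str, Lease] = {}
        self.done: set[str] = set()

    def claim(self, task_id: str, worker_id: str) -> Optional[Lease]:
        now = self.clock()
        cur = self._leases.get(task_id)
        if task_id in self.done or (cur is not None and cur.expires_at > now):
            return None  # already finished, or held by an unexpired lease
        self._next_token += 1
        lease = Lease(task_id, worker_id, self._next_token, now + self.ttl)
        self._leases[task_id] = lease
        return lease

    def ack(self, lease: Lease) -> bool:
        """Accept a completion only from the current, unexpired lease holder."""
        cur = self._leases.get(lease.task_id)
        if cur is None or cur.token != lease.token or cur.expires_at <= self.clock():
            return False
        self.done.add(lease.task_id)
        del self._leases[lease.task_id]
        return True
```

Usage: a worker calls `claim`, runs the task, then calls `ack` with the same `Lease`. If `ack` returns `False`, the result must be discarded.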
Fault tolerance
- Retries with backoff, idempotency, and exactly-once vs. at-least-once semantics.
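Retries and idempotency can be sketched as two small functions. The first uses exponential backoff with full jitter to choose the delay before the next attempt. The second checks an idempotency key, which turns at-least-once delivery into effectively-once side effects. The function names, the defaults (`base=1.0`, `cap=60.0`), and the key format are assumptions.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random) -> float:
    """Full-jitter backoff: uniform delay in [0, min(cap, base * 2**attempt)].

    Jitter spreads retries out, so failed tasks don't all hit a recovering
    dependency at the same instant.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    ceiling = min(cap, base * 2 ** min(attempt, 62))  # clamp exponent to avoid float overflow
    return rng.uniform(0, ceiling)


def run_once(applied_keys: set, key: str, effect) -> bool:
    """Apply `effect` only if `key` has not been applied before.

    In a real system the key check and the side effect must commit atomically
    (e.g. in one database transaction); this in-memory version does not show that.
    """
    if key in applied_keys:
        return False  # duplicate delivery or retry: skip the side effect
    effect()
    applied_keys.add(key)
    return True
```

True exactly-once execution is generally not achievable across arbitrary failures. At-least-once delivery combined with idempotent tasks is the usual practical substitute.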
Backpressure
- Prevent overload and provide fairness.
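One simple admission-control policy pairs a token bucket per tenant with a global cap on queued work. `TokenBucket`, `Admission`, and their parameters are hypothetical. Rejected submissions would typically be returned to the client as a retryable error.

```python
from __future__ import annotations

import time


class TokenBucket:
    """Allows `rate` acquisitions per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


class Admission:
    """Per-tenant rate limits plus a global bound on queued tasks."""

    def __init__(self, rate: float, capacity: float, max_queued: int,
                 clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.max_queued = max_queued
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}
        self.queued = 0

    def admit(self, tenant: str) -> bool:
        if self.queued >= self.max_queued:
            return False  # global backpressure: queue is full
        if tenant not in self.buckets:
            self.buckets[tenant] = TokenBucket(self.rate, self.capacity, self.clock)
        if not self.buckets[tenant].try_acquire():
            return False  # this tenant is over its rate; other tenants are unaffected
        self.queued += 1
        return True

    def complete(self) -> None:
        """Call when a queued task leaves the queue, freeing space."""
        self.queued = max(0, self.queued - 1)
```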
Worker heartbeats
- Registration, lease extensions, and liveness detection.
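Liveness detection can be sketched as a registry with three jobs. It records each worker's last heartbeat. It extends leases only for tasks the worker still reports as running. It also periodically reaps workers that have gone silent. `WorkerRegistry` is hypothetical. Using a single `timeout` for both heartbeat and lease expiry is a simplifying assumption.

```python
from __future__ import annotations

import time


class WorkerRegistry:
    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen: dict[str, float] = {}     # worker_id -> last heartbeat time
        self.leases: dict[str, set[str]] = {}     # worker_id -> task_ids it holds
        self.lease_expiry: dict[str, float] = {}  # task_id -> lease expiry time

    def register(self, worker_id: str) -> None:
        self.last_seen[worker_id] = self.clock()
        self.leases.setdefault(worker_id, set())

    def assign(self, worker_id: str, task_id: str) -> None:
        self.leases[worker_id].add(task_id)
        self.lease_expiry[task_id] = self.clock() + self.timeout

    def heartbeat(self, worker_id: str, running: set[str]) -> None:
        """Record liveness and extend leases only for tasks the worker still runs."""
        if worker_id not in self.last_seen:
            raise KeyError(f"unregistered worker {worker_id!r}")
        now = self.clock()
        self.last_seen[worker_id] = now
        for task_id in running & self.leases[worker_id]:
            self.lease_expiry[task_id] = now + self.timeout

    def reap(self) -> list[str]:
        """Drop silent workers and expired leases; return orphaned task_ids."""
        now = self.clock()
        orphaned: list[str] = []
        for worker_id, seen in list(self.last_seen.items()):
            if now - seen > self.timeout:  # no heartbeat within the timeout
                del self.last_seen[worker_id]
                for task_id in self.leases.pop(worker_id, set()):
                    self.lease_expiry.pop(task_id, None)
                    orphaned.append(task_id)
        for tasks in self.leases.values():  # live workers whose leases were not renewed
            for task_id in list(tasks):
                if self.lease_expiry.get(task_id, now) <= now:
                    tasks.discard(task_id)
                    self.lease_expiry.pop(task_id, None)
                    orphaned.append(task_id)
        return sorted(orphaned)
```

Tasks returned by `reap` would be requeued as new attempts.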
Stuck/straggler handling
- Detection and mitigation (e.g., timeouts, speculative execution).
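A straggler check might combine a hard timeout with a threshold relative to peers. For example, it could flag attempts that have run longer than some multiple of the median completed duration. The defaults below (`factor=3.0`, `hard_timeout`, `min_samples`) are illustrative assumptions, not tuned values. The function name is hypothetical.

```python
from __future__ import annotations

import statistics


def find_stragglers(running: dict[str, float], completed_durations: list[float],
                    now: float, factor: float = 3.0, hard_timeout: float = 3600.0,
                    min_samples: int = 5) -> dict[str, str]:
    """Classify running attempts as "kill" (past the hard timeout) or
    "speculate" (much slower than peers). `running` maps attempt_id -> start time."""
    # Only trust the peer median once enough attempts have completed.
    median = (statistics.median(completed_durations)
              if len(completed_durations) >= min_samples else None)
    actions: dict[str, str] = {}
    for attempt_id, started in running.items():
        elapsed = now - started
        if elapsed > hard_timeout:
            actions[attempt_id] = "kill"
        elif median is not None and elapsed > factor * median:
            # Launch a backup copy; keep whichever finishes first.
            actions[attempt_id] = "speculate"
    return actions
```

Speculative copies are only safe for idempotent tasks, because both the original and the backup may run to completion.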
Scaling & multi-tenancy
- Horizontal scale, isolation, quotas, fairness, and preemption.
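Fair sharing across tenants can be sketched as weighted fair allocation under per-tenant quotas. Each free slot goes to the eligible tenant with the lowest ratio of running tasks to weight. `allocate`, `pick_tenant`, and the weight and quota inputs are hypothetical. Preemption is not shown.

```python
from __future__ import annotations

from typing import Optional


def pick_tenant(pending: dict[str, int], running: dict[str, int],
                weights: dict[str, float], quotas: dict[str, int]) -> Optional[str]:
    """Eligible tenant with the lowest running/weight ratio (ties broken by name)."""
    eligible = [t for t, n in pending.items()
                if n > 0 and running.get(t, 0) < quotas.get(t, 0)]
    if not eligible:
        return None
    return min(eligible, key=lambda t: (running.get(t, 0) / weights[t], t))


def allocate(slots: int, pending: dict[str, int], running: dict[str, int],
             weights: dict[str, float], quotas: dict[str, int]) -> dict[str, int]:
    """Grant up to `slots` free worker slots one at a time; returns tenant -> slots granted."""
    pending, running = dict(pending), dict(running)  # don't mutate the caller's dicts
    grants: dict[str, int] = {}
    for _ in range(slots):
        tenant = pick_tenant(pending, running, weights, quotas)
        if tenant is None:
            break  # no pending work, or every tenant is at its quota
        grants[tenant] = grants.get(tenant, 0) + 1
        pending[tenant] -= 1
        running[tenant] = running.get(tenant, 0) + 1
    return grants
```

With weights 2:1 and ample quota, six free slots split 4:2. Quotas cap a tenant even when it has the lowest ratio.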
Observability
- Metrics, logs, tracing, audits, and a minimal UI model.

Assume a heterogeneous worker fleet (containers/VMs), a durable message bus, and a persistent metadata store.
