This question evaluates a candidate's ability to design distributed task scheduling and orchestration systems. It covers data modeling for tasks and DAGs, API design, execution engines, fault tolerance, backpressure, worker liveness, scaling, multi-tenancy, and observability.
Design a distributed task scheduling and orchestration system that runs at scale, supports task dependencies (DAGs), persists state, and handles both short- and long-running tasks.
Assume a heterogeneous worker fleet (containers/VMs), a durable message bus, and a persistent metadata store.
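As one possible starting point for the "data modeling for tasks and DAGs" part, here is a minimal in-memory sketch. It assumes a hypothetical `Task` record with a `depends_on` list and a four-value `TaskState`. Neither is given by the question. It uses Kahn's algorithm to reject unknown dependencies and cycles at submission time, then derives the runnable set from dependency states. A real system would keep these records in the metadata store and publish ready tasks to the message bus instead of returning a list.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from enum import Enum


# Hypothetical task lifecycle states (an assumption, not from the question).
class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Hypothetical task record; in practice this would be a row in the metadata store.
@dataclass
class Task:
    task_id: str
    depends_on: list[str] = field(default_factory=list)
    state: TaskState = TaskState.PENDING


def validate_dag(tasks: dict[str, Task]) -> list[str]:
    """Return a topological order; raise ValueError on unknown deps or cycles (Kahn's algorithm)."""
    indegree = {tid: 0 for tid in tasks}
    children: dict[str, list[str]] = {tid: [] for tid in tasks}
    for task in tasks.values():
        for dep in task.depends_on:
            if dep not in tasks:
                raise ValueError(f"{task.task_id} depends on unknown task {dep}")
            indegree[task.task_id] += 1
            children[dep].append(task.task_id)
    # Start from tasks with no unmet dependencies and peel the graph layer by layer.
    queue = deque(tid for tid, deg in indegree.items() if deg == 0)
    order: list[str] = []
    while queue:
        tid = queue.popleft()
        order.append(tid)
        for child in children[tid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Any task never reaching indegree 0 sits on a cycle.
    if len(order) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return order


def ready_tasks(tasks: dict[str, Task]) -> list[str]:
    """Tasks that are PENDING and whose dependencies have all SUCCEEDED."""
    return sorted(
        tid
        for tid, t in tasks.items()
        if t.state is TaskState.PENDING
        and all(tasks[d].state is TaskState.SUCCEEDED for d in t.depends_on)
    )


if __name__ == "__main__":
    dag = {
        "extract": Task("extract"),
        "transform": Task("transform", ["extract"]),
        "load": Task("load", ["transform"]),
        "report": Task("report", ["transform"]),
    }
    print(validate_dag(dag))  # ['extract', 'transform', 'load', 'report']
    print(ready_tasks(dag))   # ['extract']
    dag["extract"].state = TaskState.SUCCEEDED
    print(ready_tasks(dag))   # ['transform']
```

Recomputing readiness from persisted dependency states, rather than holding it only in scheduler memory, is one way to let a restarted scheduler resume a DAG. It also lets the scheduler retry a failed task without re-running its upstream tasks.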