Design a task scheduling service that continuously ingests tasks and dispatches them to workers.
Each task has:
-
id
: unique task identifier
-
deadline
: execution deadline
-
prerequisites
: a list of task IDs that must complete first
The system should:
-
Accept tasks in batches or as a stream.
-
Validate task definitions, including malformed input, duplicate IDs, and invalid prerequisite references.
-
Verify that task dependencies form a directed acyclic graph.
-
Dispatch only runnable tasks, where all prerequisites are completed.
-
Among runnable tasks, prefer the one with the earliest deadline.
-
Support multiple workers consuming tasks concurrently without assigning the same task twice.
-
Handle worker crashes, retries, and recovery.
-
Scale to large task volumes and provide useful monitoring.
Discuss the API design, storage model, scheduling logic, DAG validation strategy, concurrency control, fault tolerance, and scaling approach.