Service Startup Scheduler on a Host with M CPU Cores
Context
You are given a directed acyclic graph (DAG) of services where an edge u → v means service v depends on service u successfully starting and becoming healthy. All services must start on a single host that has M CPU cores. Each service may also consume additional resources (e.g., memory, ports) during startup. The goal is to minimize total startup time (makespan) while respecting dependencies and resource limits.
Task
Design a scheduler that:
-
Detects which services are ready to start (dependency-respecting).
-
Maximizes parallelism subject to the M-core limit and other resource constraints (CPU, memory, ports).
-
Prioritizes work to minimize the overall makespan.
-
Bounds concurrency globally and per resource class.
-
Handles timeouts, retries, backoff, and failures.
-
Provides strong observability (metrics, logs, traces) and correctness properties.
-
Uses clear data structures (e.g., in-degree tracking, ready queues) and describes algorithmic complexity.
-
Explains how to extend the design across multiple machines.
Assume each service reports healthy only after its health check passes. If the question does not specify per-service resource demands or durations, assume unit CPU per start and unknown duration with historical estimates.