Services must start on a host with M CPU cores, and each service may depend on others (a DAG). Design a scheduler that minimizes total startup time while respecting dependencies. Discuss: detecting ready services, maximizing parallelism within core limits, prioritization, bounding concurrency, handling timeouts/retries/failures, backoff, resource constraints (CPU, memory, ports), and observability. Describe data structures (e.g., in-degree tracking, ready queues), correctness properties, and how you would extend it across multiple machines.

This question evaluates scheduling and resource-management skills, including DAG dependency handling, concurrency bounding, timeout/retry strategies, observability, and correctness reasoning for service startup orchestration.

What difficulty level is this interview question?

This is a hard difficulty System Design question, commonly asked during Onsite rounds at Snowflake.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Snowflake during technical interviews.

Design multi-core service startup scheduler | Snowflake Interview Question

Service Startup Scheduler on a Host with M CPU Cores

Context

You are given a directed acyclic graph (DAG) of services where an edge u → v means service v depends on service u successfully starting and becoming healthy. All services must start on a single host that has M CPU cores. Each service may also consume additional resources (e.g., memory, ports) during startup. The goal is to minimize total startup time (makespan) while respecting dependencies and resource limits.
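As a minimal sketch of the dependency model described above, the graph can be stored as a map from each service to its dependents plus an in-degree counter per service. A service is ready when its in-degree reaches zero. The function names and the sample service names below are hypothetical, chosen only for illustration; the traversal is Kahn's algorithm, which also detects cycles in malformed input.

```python
from collections import deque

def build_graph(services, edges):
    """edges holds (u, v) pairs meaning service v depends on service u."""
    dependents = {s: [] for s in services}
    in_degree = {s: 0 for s in services}
    for u, v in edges:
        dependents[u].append(v)
        in_degree[v] += 1
    return dependents, in_degree

def topological_order(services, edges):
    """Return one valid start order, or raise if the input is not a DAG."""
    dependents, in_degree = build_graph(services, edges)
    # Services with no unmet dependencies are ready immediately.
    ready = deque(s for s in services if in_degree[s] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in dependents[u]:
            in_degree[v] -= 1          # u is now healthy, so v waits on one fewer
            if in_degree[v] == 0:
                ready.append(v)
    if len(order) != len(services):
        raise ValueError("dependency cycle detected")
    return order

if __name__ == "__main__":
    services = ["db", "cache", "api", "web"]
    edges = [("db", "api"), ("cache", "api"), ("api", "web")]
    print(topological_order(services, edges))
```

Building the graph and walking it both take O(V + E) time, and running this validation once before scheduling means a cycle is reported up front rather than showing up as a stalled startup.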

Task

Design a scheduler that:

Detects which services are ready to start (dependency-respecting).
Maximizes parallelism subject to the M-core limit and other resource constraints (CPU, memory, ports).
Prioritizes work to minimize the overall makespan.
Bounds concurrency globally and per resource class.
Handles timeouts, retries, backoff, and failures.
Provides strong observability (metrics, logs, traces) and correctness properties.
Uses clear data structures (e.g., in-degree tracking, ready queues) and describes algorithmic complexity.
Explains how to extend the design across multiple machines.
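One possible way to combine ready detection, bounded concurrency, and prioritization is list scheduling with a critical-path priority: among ready services, start the one with the longest estimated remaining chain first. The sketch below is a simulation, not a production scheduler. It assumes each start occupies one core, that `duration` holds estimated startup times, and that the input is acyclic; `simulate` and `critical_path_lengths` are hypothetical names. Critical-path-first is a heuristic and does not guarantee an optimal makespan.

```python
import heapq

def critical_path_lengths(dependents, duration):
    """Longest estimated time from starting each service to the end of the DAG."""
    memo = {}
    def visit(s):
        if s not in memo:
            memo[s] = duration[s] + max((visit(d) for d in dependents[s]), default=0)
        return memo[s]
    for s in dependents:
        visit(s)
    return memo

def simulate(services, edges, duration, cores):
    """Simulate startup on `cores` cores; return (makespan, start_times)."""
    if cores < 1:
        raise ValueError("need at least one core")
    dependents = {s: [] for s in services}
    in_degree = {s: 0 for s in services}
    for u, v in edges:                      # (u, v): v depends on u
        dependents[u].append(v)
        in_degree[v] += 1
    rank = critical_path_lengths(dependents, duration)
    # Max-heap on rank (negated), so the longest remaining chain starts first.
    ready = [(-rank[s], s) for s in services if in_degree[s] == 0]
    heapq.heapify(ready)
    running = []                            # min-heap of (finish_time, service)
    now, start_times = 0, {}
    while ready or running:
        # Fill free cores with the highest-priority ready services.
        while ready and len(running) < cores:
            _, s = heapq.heappop(ready)
            start_times[s] = now
            heapq.heappush(running, (now + duration[s], s))
        # Advance time to the next completion and release its dependents.
        now, done = heapq.heappop(running)
        for d in dependents[done]:
            in_degree[d] -= 1
            if in_degree[d] == 0:
                heapq.heappush(ready, (-rank[d], d))
    return now, start_times

if __name__ == "__main__":
    services = ["a", "b", "c", "d"]
    duration = {"a": 3, "b": 1, "c": 1, "d": 2}
    print(simulate(services, [("a", "d")], duration, cores=2))
```

Each service enters and leaves each heap once, so the loop costs O((V + E) log V). Memory or port limits could be added as further admission checks in the inner loop alongside the core count.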

Assume each service reports healthy only after its health check passes. If per-service resource demands or durations are not specified, assume each start consumes one CPU core and that durations are unknown but can be estimated from historical data.
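Because a service counts as started only once its health check passes, each start attempt needs a timeout, and failed attempts need a retry policy. The sketch below shows one common choice, exponential backoff with full jitter. The callables `start` and `is_healthy`, the parameter names, and the default values are all assumptions made for illustration. The clock and sleep functions are injectable so that the logic can be exercised without real waiting.

```python
import random
import time

def backoff_delays(base, cap, attempts, rng=random.random):
    """Full jitter: delay i is drawn uniformly from [0, min(cap, base * 2**i)]."""
    return [rng() * min(cap, base * 2 ** i) for i in range(attempts)]

def start_with_retries(start, is_healthy, timeout, max_attempts,
                       base=0.5, cap=30.0, poll=0.1,
                       sleep=time.sleep, clock=time.monotonic):
    """Start a service and wait for health; return the attempt number that succeeded."""
    delays = backoff_delays(base, cap, max_attempts)
    for attempt in range(max_attempts):
        start()
        deadline = clock() + timeout
        # Poll the health check until it passes or this attempt times out.
        while clock() < deadline:
            if is_healthy():
                return attempt + 1
            sleep(poll)
        # Back off before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(delays[attempt])
    raise TimeoutError(f"service not healthy after {max_attempts} attempts")
```

A scheduler built this way would treat the raised `TimeoutError` as a terminal failure for that service and would leave its dependents unstarted, which preserves the dependency guarantee.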
