Design a job scheduler

This is a System Design interview question from Applied Intuition for Software Engineer roles.

Question

Design a Scalable Distributed Job Scheduler

Context

Design a multi-tenant, horizontally scalable job scheduler for backend services. The system must persist jobs durably, survive failures, and support high throughput. It will run in the cloud and integrate with a durable queue and a replicated metadata store.

Requirements

Functional
1. Job types: immediate, delayed (run at or after a given time), recurring (cron-like), and jobs with dependencies (DAG).
2. Priorities (e.g., high, medium, low) with fair-share across tenants.
3. Retries with configurable backoff and jitter; dead-lettering.
4. Core APIs: submit, cancel, pause, resume, status (list/filter is optional).
5. Execution semantics: at-least-once by default; discuss how to approach exactly-once.
6. Idempotency, failure handling, deduplication.
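As an illustration of requirement 3, here is a minimal sketch of exponential backoff with "full jitter" plus a dead-letter decision. The function names (`backoff_delay`, `next_step`) and the defaults (`base`, `cap`, `max_attempts`) are assumptions made for this example, not part of the question.

```python
import random


def backoff_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Seconds to wait before retrying after failed attempt number `attempt` (1-based).

    Full jitter: pick uniformly between 0 and the capped exponential ceiling,
    so retries from many failed jobs do not fire at the same instant.
    """
    ceiling = min(cap, base * (2 ** (attempt - 1)))
    return rng.uniform(0.0, ceiling)


def next_step(attempt, max_attempts=5, base=1.0, cap=300.0, rng=random):
    """Decide what happens after failed attempt number `attempt`.

    Returns ("retry", delay_seconds) or ("dead_letter", None) once the
    hypothetical attempt budget is used up.
    """
    if attempt >= max_attempts:
        return ("dead_letter", None)
    return ("retry", backoff_delay(attempt, base, cap, rng))
```

A scheduler could persist the returned delay as the job's next run time, which also covers the "delayed" job type from requirement 1. Passing a seeded `random.Random` as `rng` keeps the behavior reproducible in tests.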

Non-Functional
1. Persistence and durability of both metadata and queues.
2. Monitoring, alerting, and observability.
3. Multi-tenant isolation, quotas, and security.
4. Horizontal scaling to high throughput and low scheduler latency.
5. Multi-region deployment and failover.
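
Priorities with fair-share across tenants (functional requirement 2) interact with tenant isolation (non-functional requirement 3). One possible approach, sketched below in memory, is round-robin across tenants with a priority heap inside each tenant. The class name `FairShareQueue` and the `PRIORITY` mapping are hypothetical; a real design would back this with the durable queue the question mentions.

```python
import heapq
from collections import deque
from itertools import count

# Lower number means dispatched first within a tenant (assumed mapping).
PRIORITY = {"high": 0, "medium": 1, "low": 2}


class FairShareQueue:
    """Round-robin across tenants, strict priority within each tenant."""

    def __init__(self):
        self._heaps = {}      # tenant -> heap of (priority, seq, job_id)
        self._ring = deque()  # tenants that currently have pending jobs
        self._seq = count()   # tie-breaker that keeps FIFO order per priority

    def push(self, tenant, job_id, priority="medium"):
        if tenant not in self._heaps:
            self._heaps[tenant] = []
            self._ring.append(tenant)
        entry = (PRIORITY[priority], next(self._seq), job_id)
        heapq.heappush(self._heaps[tenant], entry)

    def pop(self):
        """Return (tenant, job_id) for the next job, or None if empty."""
        if not self._ring:
            return None
        tenant = self._ring.popleft()
        _, _, job_id = heapq.heappop(self._heaps[tenant])
        if self._heaps[tenant]:
            self._ring.append(tenant)  # tenant goes to the back of the line
        else:
            del self._heaps[tenant]
        return tenant, job_id
```

The trade-off in this sketch is that a tenant's high-priority job can wait behind another tenant's low-priority job, in exchange for no tenant being able to starve the others by flooding the queue. Weighted shares or per-tenant quotas would be natural extensions to discuss.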

Deliverables

1. System components: API layer, scheduler/dispatcher, queues, workers/executors, metadata store.
2. Data model for jobs, attempts, dependencies, and queues.
3. API design for submit, cancel, pause, resume, status.
4. Discussion of execution semantics, idempotency, failure handling/dedup, monitoring/alerting, multi-tenant isolation, horizontal scaling, durability, and multi-region strategies.
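
To make the data model and API deliverables concrete, the following is a minimal in-memory sketch of a job record and the submit, cancel, pause, resume, and status operations. Every name here (`Job`, `Status`, `InMemoryJobStore`, `idempotency_key`) is an assumption for illustration; the question does not prescribe a schema, and a real answer would put this state in the replicated metadata store.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    PAUSED = "paused"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


@dataclass
class Job:
    job_id: str
    tenant: str
    payload: dict
    priority: str = "medium"
    depends_on: tuple = ()  # job_ids that must succeed first (DAG edges)
    status: Status = Status.PENDING
    attempts: int = 0


class InMemoryJobStore:
    def __init__(self):
        self._jobs = {}    # job_id -> Job
        self._by_key = {}  # (tenant, idempotency_key) -> job_id

    def submit(self, tenant, payload, idempotency_key,
               priority="medium", depends_on=()):
        """Create a job, or return the existing id for a repeated key."""
        key = (tenant, idempotency_key)
        if key in self._by_key:
            return self._by_key[key]  # duplicate submit: deduplicated
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = Job(job_id, tenant, payload,
                                 priority, tuple(depends_on))
        self._by_key[key] = job_id
        return job_id

    def _transition(self, job_id, allowed_from, to):
        job = self._jobs[job_id]
        if job.status not in allowed_from:
            return False
        job.status = to
        return True

    def cancel(self, job_id):
        # Simplification: only jobs that have not started can be cancelled.
        # Cancelling a RUNNING job would need cooperation from the worker.
        return self._transition(
            job_id, {Status.PENDING, Status.PAUSED}, Status.CANCELLED)

    def pause(self, job_id):
        return self._transition(job_id, {Status.PENDING}, Status.PAUSED)

    def resume(self, job_id):
        return self._transition(job_id, {Status.PAUSED}, Status.PENDING)

    def status(self, job_id):
        return self._jobs[job_id].status
```

Scoping the idempotency key by tenant means two tenants can reuse the same key without colliding, while a client retrying a timed-out submit gets the original job back instead of a duplicate. Under at-least-once execution, workers would still need their own idempotent handling, since this only deduplicates submission.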
