Design a scalable job scheduler

Q: Design a scalable job scheduler

This question tests the ability to design a large-scale distributed job scheduling system. It covers data modeling for one-off and recurring jobs, reliable execution semantics and idempotency, scaling and sharding strategies, time zone and DST handling, and operational concerns such as monitoring and dead-lettering.


Question

Design a Job Scheduling Service

You are designing a multi-tenant job scheduling service that runs one-off and recurring background jobs at scale. The service should expose APIs to manage jobs and reliably execute them via a distributed scheduler/worker architecture.

Assume:

Jobs can be internal tasks or HTTP callbacks.
At-least-once execution semantics are acceptable, so a job may occasionally run more than once; handlers must be idempotent for correctness.
The system must support millions of scheduled jobs and high throughput dispatch.
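
The at-least-once assumption above means the same run can be delivered to a worker twice, for example after a lease expires mid-execution. One common way to make that safe is to key each run and skip work that has already completed. The following is a minimal sketch, not a prescribed solution: `IdempotentExecutor`, `make_run_key`, and the in-memory dictionary are all hypothetical names chosen for illustration. A real service would presumably keep the keys in a durable store with an expiry.

```python
import threading


def make_run_key(job_id, scheduled_for):
    """Build a key identifying one scheduled run of one job (assumed format)."""
    return f"{job_id}:{scheduled_for}"


class IdempotentExecutor:
    """Runs a handler once per run key and replays the stored result on retries.

    In-memory sketch only: the dict stands in for a durable dedup store.
    """

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def execute(self, run_key, handler):
        # The lock is held while the handler runs, which serializes all runs.
        # That keeps the sketch simple; it is not how a real worker would scale.
        with self._lock:
            if run_key in self._results:
                return self._results[run_key]
            result = handler()  # if this raises, nothing is recorded
            self._results[run_key] = result
            return result
```

Because a failed handler records nothing, a retry of the same run key executes the handler again, while a duplicate delivery after success returns the stored result without repeating the side effect.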

Requirements

Data model/schema for jobs (one-off and recurring), including retry policy, priority, time zone, and execution metadata.
APIs to create, update, pause, resume, and cancel jobs.
Execution architecture: scheduler, dispatcher, workers; leases and state transitions.
Reliability: idempotency, retries/backoff, deduplication, timeouts, dead-lettering.
Time handling: time zones, DST, and clock skew.
Scaling/sharding and fairness across tenants/priorities.
Monitoring and operability.
Optimize for efficiently retrieving all jobs scheduled to run in the next five minutes: indexing/partitioning strategies, example queries, and handling high throughput.
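
The last requirement, retrieving jobs due in the next five minutes, can be sketched with a composite index on shard, status, and next run time, so that each scheduler instance scans only a narrow index range. The example below is an illustration under stated assumptions, not a reference design: the `jobs` table, its column names, the `idx_due` index, and the use of SQLite with epoch-second timestamps are all hypothetical choices made to keep it runnable.

```python
import sqlite3


def open_store():
    """Create an in-memory store with an assumed, simplified jobs schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE jobs (
               job_id      TEXT PRIMARY KEY,
               tenant_id   TEXT NOT NULL,
               shard       INTEGER NOT NULL,
               status      TEXT NOT NULL,      -- e.g. scheduled, paused
               priority    INTEGER NOT NULL,
               next_run_at INTEGER NOT NULL    -- UTC epoch seconds
           )"""
    )
    # Equality columns first, range column last, so the due-jobs query
    # becomes a single contiguous index range per shard.
    conn.execute("CREATE INDEX idx_due ON jobs (shard, status, next_run_at)")
    return conn


def due_jobs(conn, shard, now, window_seconds=300, limit=1000):
    """Return IDs of scheduled jobs on one shard due by now + window.

    Overdue jobs (next_run_at < now) are included on purpose so that
    missed runs are picked up on the next poll.
    """
    rows = conn.execute(
        """SELECT job_id FROM jobs
           WHERE shard = ? AND status = 'scheduled' AND next_run_at <= ?
           ORDER BY next_run_at, priority DESC
           LIMIT ?""",
        (shard, now + window_seconds, limit),
    )
    return [row[0] for row in rows]
```

Storing `next_run_at` in UTC keeps the range scan independent of each job's time zone; the time zone would only matter when computing the next occurrence of a recurring job. The `LIMIT` bounds each poll so that one busy shard cannot produce an unbounded batch.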
