Design a scalable job scheduler
Company: Amazon
Role: Software Engineer
Category: System Design
Difficulty: Hard
Interview Round: Onsite
Design a job scheduling service. Specify the data model/schema for jobs (including one-off and recurring jobs), the APIs to create, update, pause, resume, and cancel jobs, and the execution architecture (scheduler, dispatcher, workers). Address reliability (idempotency, retries, deduplication), time zones and clock skew, scaling and sharding, and monitoring. Optimize for efficient retrieval of all jobs scheduled to run in the next five minutes; propose indexing/partitioning strategies and example queries, and discuss how to handle high throughput and fairness.
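As one possible illustration of the "next five minutes" lookup, here is a minimal sketch that uses SQLite as a stand-in for the real job store. The table name `jobs`, its columns, the index `idx_jobs_due`, and the helper `due_jobs` are all hypothetical names chosen for this example, not part of the question. The sketch assumes every job stores a precomputed `next_run_at` in UTC epoch seconds, so the scheduler can answer the query with a range scan on a `(status, next_run_at)` index instead of evaluating cron expressions at read time.

```python
import sqlite3

# Hypothetical schema: one row per job, with the next fire time precomputed in UTC.
SCHEMA = """
CREATE TABLE jobs (
    job_id      TEXT PRIMARY KEY,
    tenant_id   TEXT NOT NULL,
    status      TEXT NOT NULL,                 -- 'active', 'paused', 'cancelled'
    schedule    TEXT,                          -- cron expression; NULL for one-off jobs
    timezone    TEXT NOT NULL DEFAULT 'UTC',   -- zone used to interpret the schedule
    next_run_at INTEGER NOT NULL               -- epoch seconds, always UTC
);
CREATE INDEX idx_jobs_due ON jobs (status, next_run_at);
"""


def due_jobs(conn, now, window_s=300):
    """Return active jobs due by now + window_s, oldest first.

    Overdue jobs (next_run_at < now) are included on purpose so that a
    scheduler that was briefly down still picks up what it missed.
    """
    return conn.execute(
        "SELECT job_id, tenant_id, next_run_at FROM jobs "
        "WHERE status = 'active' AND next_run_at <= ? "
        "ORDER BY next_run_at, job_id",
        (now + window_s,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
now = 1_700_000_000
conn.executemany(
    "INSERT INTO jobs (job_id, tenant_id, status, schedule, next_run_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("a", "t1", "active", None, now + 60),            # one-off, due in 1 minute
        ("b", "t1", "active", "*/5 * * * *", now + 600),  # recurring, outside the window
        ("c", "t2", "paused", None, now + 30),            # paused, must be skipped
        ("d", "t2", "active", None, now - 10),            # overdue, must be included
    ],
)
print(due_jobs(conn, now))
```

In a sharded deployment the same idea would typically be combined with a partition key (for example a hash of `job_id` or a coarse time bucket) so that each scheduler instance scans only its own slice; that part is left out of the sketch.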
Quick Answer: This question evaluates skills in designing large-scale distributed job scheduling systems, covering data modeling for one-off and recurring jobs, reliable execution semantics and idempotency, scaling and sharding strategies, time zone/DST handling, and operational concerns like monitoring and dead-lettering.
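To make the idempotency and deduplication point concrete, the following sketch shows one common approach: treat the pair (job, scheduled fire time) as a unique key and let the database reject duplicate claims. SQLite is again used only as a stand-in, and the table `job_runs` and the functions `open_store` and `claim_run` are hypothetical names introduced for this example. It assumes that at-least-once dispatch is acceptable as long as only one worker wins the claim for a given fire time.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one row per (job, scheduled fire time)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE job_runs (
            job_id        TEXT    NOT NULL,
            scheduled_for INTEGER NOT NULL,   -- epoch seconds, UTC
            worker_id     TEXT    NOT NULL,
            PRIMARY KEY (job_id, scheduled_for)
        )
        """
    )
    return conn


def claim_run(conn, job_id, scheduled_for, worker_id):
    """Try to claim one execution; return True only for the first claimant.

    The primary key makes the insert a no-op when another worker (or a
    retried dispatch) has already recorded the same fire time.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO job_runs (job_id, scheduled_for, worker_id) "
            "VALUES (?, ?, ?)",
            (job_id, scheduled_for, worker_id),
        )
    return cur.rowcount == 1


store = open_store()
print(claim_run(store, "j1", 100, "worker-1"))  # first claim for this fire time
print(claim_run(store, "j1", 100, "worker-2"))  # duplicate dispatch of the same fire time
print(claim_run(store, "j1", 400, "worker-2"))  # next occurrence of the same job
```

The claim only deduplicates dispatch; the job handler itself would still need to be idempotent (or carry the same key downstream) to tolerate a worker crashing after the claim but before completion.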