Design a distributed job scheduler
Company: Amazon
Role: Software Engineer
Category: System Design
Difficulty: Hard
Interview Round: Onsite
Design a distributed job scheduler that supports cron and ad-hoc jobs, enqueues tasks, dispatches them to workers, and provides at-least-once execution with idempotency. Describe the architecture (coordinator, durable queue, workers), leader election, time source and cron parsing, job storage, retries with exponential backoff, priorities, deduplication, task dependencies, observability/alerting, and failure recovery. Define the APIs, data model, and execution/status flows. Consider horizontal scalability, multi-tenant isolation, and multi-region operation.
Quick Answer: This question evaluates a candidate's ability to design scalable, reliable distributed systems for job scheduling, covering concepts such as leader election, durable queues, idempotency and deduplication, retries, task dependencies, multi-tenant isolation, and multi-region operation.
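Illustrative sketch: two of the mechanisms named in the question, retries with exponential backoff and idempotent handling of at-least-once delivery, can be outlined in a few lines. This is a minimal sketch under stated assumptions, not a reference design. The names backoff_delay and IdempotentExecutor, the 1-second base delay, and the 300-second cap are all hypothetical, and the in-memory dictionary only stands in for a durable store shared by the workers.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng: Optional[random.Random] = None) -> float:
    """Delay in seconds before retry number `attempt` (1-based).

    The ceiling doubles with each attempt and is clamped to `cap`.
    If an rng is supplied, "full jitter" picks a uniform value in
    [0, ceiling] so that retrying workers do not synchronize.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    ceiling = min(cap, base * 2 ** (attempt - 1))
    return ceiling if rng is None else rng.uniform(0.0, ceiling)


@dataclass
class IdempotentExecutor:
    """Runs the handler at most once per idempotency key.

    `completed` is an in-memory stand-in (assumption) for a durable
    store; a real system would need an atomic conditional write that
    all workers share.
    """
    handler: Callable[[dict], str]
    completed: Dict[str, str] = field(default_factory=dict)

    def execute(self, idempotency_key: str, payload: dict) -> str:
        # A redelivered task returns the stored result instead of re-running.
        if idempotency_key in self.completed:
            return self.completed[idempotency_key]
        result = self.handler(payload)
        self.completed[idempotency_key] = result
        return result
```

With these defaults the un-jittered delays for attempts 1 through 4 are 1, 2, 4, and 8 seconds. Delivering the same key twice to IdempotentExecutor invokes the handler only once, which is what lets at-least-once delivery behave like exactly-once processing from the job's point of view.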