Design a distributed cron job scheduler
Company: DoorDash
Role: Software Engineer
Category: System Design
Difficulty: Hard
Interview Round: Onsite
Design a distributed cron job scheduler. Requirements: support cron expressions and time zones (including DST transitions), high availability with leader election and failover, horizontal scalability to millions of scheduled jobs, exactly-once or at-least-once execution semantics, idempotency and deduplication, misfire handling and catch-up policies, retry with backoff, pause/resume, job dependencies, worker registration and heartbeats, multi-tenant isolation and quotas, authentication/authorization, auditing, and observability (metrics, logs, traces, alerts). Describe the components (metadata store, scheduler, dispatcher, workers, timing wheels or hierarchical timers, message bus), data models, scheduling algorithms, and consistency choices. Discuss failure modes (clock skew, network partitions, worker crashes), capacity planning, and testing strategies.
Quick Answer: This question evaluates competence in distributed systems architecture, scalable scheduler design, consistency and delivery semantics, fault tolerance, multitenancy, and operational observability.
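One way to make the idempotency, deduplication, and retry-with-backoff requirements concrete is sketched below. It is only an illustration under stated assumptions, not a reference design: `execution_key`, `DedupStore`, and `backoff_delays` are hypothetical names, and the in-memory set stands in for whatever metadata store would provide an atomic insert-if-absent in a real deployment. The idea is that at-least-once dispatch can behave like exactly-once from the job's point of view if every firing is identified by a deterministic key derived from the job ID and its scheduled fire time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def execution_key(job_id: str, fire_time: datetime) -> str:
    """Build a deterministic key for one scheduled firing of a job.

    Assumption: fire_time is timezone-aware. It is normalized to UTC so
    that schedulers working in different zones derive the same key.
    """
    return f"{job_id}:{fire_time.astimezone(timezone.utc).isoformat()}"


@dataclass
class DedupStore:
    """In-memory stand-in for a store with atomic insert-if-absent."""

    _seen: set = field(default_factory=set)

    def claim(self, key: str) -> bool:
        # Only the first caller to claim a key gets True; duplicates get False.
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def backoff_delays(base_s: float, cap_s: float, max_attempts: int) -> list:
    """Capped exponential backoff: base, 2*base, 4*base, ... limited to cap_s."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(max_attempts)]
```

A dispatcher following this sketch would call `claim(execution_key(job_id, fire_time))` before handing a firing to a worker and skip the dispatch when the claim returns `False`, so a redelivered message or a second scheduler after failover does not run the same firing twice. Production systems typically also add random jitter to the backoff delays; that is omitted here to keep the output deterministic.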