PracHub

Design a distributed cron job scheduler

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competence in distributed systems architecture, scalable scheduler design, consistency and delivery semantics, fault tolerance, multitenancy, and operational observability.

Company: DoorDash

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Onsite

Design a distributed cron job scheduler. Requirements: support cron expressions and time zones (including DST), high availability with leader election and failover, horizontal scalability to millions of scheduled jobs, exactly-once or at-least-once execution semantics, idempotency and deduplication, misfire handling and catch-up policies, retry with backoff, pause/resume, job dependencies, worker registration and heartbeats, multi-tenant isolation and quotas, authentication/authorization, auditing, and observability (metrics, logs, traces, alerts). Describe components (metadata store, scheduler, dispatcher, workers, timing wheels or hierarchical timers, message bus), data models, scheduling algorithms, and consistency choices. Discuss failure modes (clock skew, partition, worker crash), capacity planning, and testing strategies.
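One way to make the idempotency and deduplication requirement concrete is to derive a deterministic key from the job and its scheduled (not actual) fire time, so that a retry or a failed-over scheduler produces the same key. The sketch below is illustrative only: `execution_key` and `DedupStore` are hypothetical names, and the in-memory set stands in for whatever store offers an atomic insert-if-absent operation.

```python
import hashlib
from datetime import datetime, timezone


def execution_key(tenant_id: str, job_id: str, fire_time: datetime) -> str:
    """Derive a deterministic key from the job and its scheduled fire time."""
    if fire_time.tzinfo is None:
        raise ValueError("fire_time must be timezone-aware")
    # Normalize to UTC epoch seconds so every scheduler replica computes the same key.
    epoch_s = int(fire_time.astimezone(timezone.utc).timestamp())
    raw = f"{tenant_id}/{job_id}/{epoch_s}".encode()
    return hashlib.sha256(raw).hexdigest()


class DedupStore:
    """In-memory stand-in (assumption) for a store with atomic insert-if-absent."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def claim(self, key: str) -> bool:
        """Return True only for the first caller that presents this key."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    store = DedupStore()
    fire = datetime(2025, 7, 28, 12, 0, tzinfo=timezone.utc)
    key = execution_key("tenant-1", "job-42", fire)
    print(store.claim(key))  # first dispatch wins the claim
    print(store.claim(key))  # a duplicate dispatch is rejected
```

With at-least-once delivery, a worker or dispatcher that claims the key before performing side effects turns duplicate deliveries into no-ops. This gives effectively-once behavior only if the claim and the side effect are atomic or the side effect is itself idempotent.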

Jul 28, 2025, 12:00 AM

System Design: Distributed Cron Job Scheduler

Context

You are designing a multi-tenant, internet-scale cron job scheduler that triggers tasks according to cron expressions across different time zones. The system must support millions of scheduled jobs, be highly available, and ensure reliable dispatch and execution across a fleet of workers.

Functional Requirements

  • Cron expressions
    • Support standard cron syntax and IANA time zones, including daylight saving time (DST) correctness.
  • Execution semantics
    • Exactly-once or at-least-once delivery semantics (explain trade-offs).
    • Idempotency and deduplication across retries and failovers.
    • Misfire handling and catch-up policies (e.g., skip, fire-latest, fire-all, bounded catch-up).
    • Retries with backoff and jitter.
    • Pause/resume jobs; optional start/end time windows.
    • Job dependencies (e.g., run B after A completes; DAG support is a plus).
  • Worker lifecycle
    • Worker registration, capability discovery, and heartbeats.
    • Safe re-assignment on worker failure.
  • Multi-tenancy
    • Tenant isolation, quotas, priorities, and fair sharing.
  • Security and governance
    • Authentication and authorization (RBAC or ABAC).
    • Auditing of API changes and executions.
  • Observability
    • Metrics, logs, traces, and alerting for SLOs and failure detection.
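The DST requirement in the list above is easiest to see with a small example. The sketch below classifies a wall-clock time in an IANA zone as normal, nonexistent (spring-forward gap), or ambiguous (fall-back overlap) and maps it to a single UTC instant. It uses Python's standard `zoneinfo` module, which needs system time zone data or the `tzdata` package. The function name `resolve_local` and the policy (shift gap times forward, fire ambiguous times once at the first occurrence) are assumptions for illustration; a real scheduler would likely make the policy configurable per job.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def resolve_local(naive: datetime, tz_name: str) -> tuple[str, datetime]:
    """Classify a naive wall-clock time and return the UTC instant to fire at.

    Assumed policy: a time inside a DST gap fires at the equivalent instant
    after the gap; an ambiguous time fires once, at its first occurrence.
    """
    tz = ZoneInfo(tz_name)
    first = naive.replace(tzinfo=tz, fold=0)   # earlier of the two candidate offsets
    second = naive.replace(tzinfo=tz, fold=1)  # later of the two candidate offsets
    utc_first = first.astimezone(timezone.utc)

    # A wall-clock time exists only if converting to UTC and back reproduces it.
    exists = utc_first.astimezone(tz).replace(tzinfo=None) == naive
    if not exists:
        kind = "gap"
    elif first.utcoffset() != second.utcoffset():
        kind = "ambiguous"
    else:
        kind = "normal"
    return kind, utc_first


if __name__ == "__main__":
    zone = "America/New_York"
    for wall in (
        datetime(2025, 7, 1, 9, 0),    # ordinary summer morning
        datetime(2025, 3, 9, 2, 30),   # skipped when clocks spring forward
        datetime(2025, 11, 2, 1, 30),  # occurs twice when clocks fall back
    ):
        kind, fire_at = resolve_local(wall, zone)
        print(wall, kind, fire_at.isoformat())
```

Storing the resolved UTC instant alongside the original cron expression and zone name keeps firing decisions deterministic even if time zone rules change later, since the next occurrence can always be recomputed from the expression.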

Non-Functional Requirements

  • High availability with leader election and failover (no single point of failure).
  • Horizontal scalability to millions of jobs and tens of thousands of triggers per second.
  • Low and predictable scheduling latency (e.g., p99 delay between scheduled and actual trigger time under a few seconds) and bounded scheduler lag.
  • Strong durability for job metadata and execution records.
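For the leader election and failover requirement, a common pattern is a time-bounded lease combined with a fencing token, so that a former leader that wakes up after a pause cannot overwrite state owned by its successor. The sketch below is a single-process illustration under stated assumptions: `LeaseStore` stands in for a coordination service, `ShardState` for any downstream resource that checks tokens, and time is passed in explicitly to keep the example deterministic. None of these names come from the question itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    token: int         # fencing token, strictly increasing across leadership changes
    expires_at: float


class LeaseStore:
    """In-memory stand-in (assumption) for a coordination service."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None
        self._next_token = 1

    def try_acquire(self, node: str, now: float, ttl: float) -> Optional[Lease]:
        current = self._lease
        live = current is not None and current.expires_at > now
        if live and current.holder != node:
            return None  # another node holds a live lease
        if live and current.holder == node:
            current.expires_at = now + ttl  # renewal keeps the same token
            return current
        # No lease, or the previous one expired: grant a new lease with a new token.
        self._lease = Lease(node, self._next_token, now + ttl)
        self._next_token += 1
        return self._lease


class ShardState:
    """Downstream resource that rejects writes carrying a stale fencing token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.writes: list[str] = []

    def write(self, token: int, payload: str) -> bool:
        if token < self.highest_token:
            return False  # issued by a deposed leader
        self.highest_token = token
        self.writes.append(payload)
        return True


if __name__ == "__main__":
    store, shard = LeaseStore(), ShardState()
    old = store.try_acquire("scheduler-a", now=0.0, ttl=10.0)
    old_token = old.token
    new = store.try_acquire("scheduler-b", now=11.0, ttl=10.0)  # a's lease expired
    print(shard.write(new.token, "enqueue job-1"))   # accepted
    print(shard.write(old_token, "enqueue job-1"))   # rejected as stale
```

The lease alone bounds how long a failed leader blocks progress; the token check is what protects correctness when clocks drift or a process stalls past its lease expiry.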

What to Deliver

Describe the following:

  1. Architecture and components (metadata store, scheduler shards, dispatcher, workers, timing wheels or hierarchical timers, message bus, coordination service).
  2. Data models (jobs, schedules, executions, attempts, dependencies, workers, tenants, quotas, audit logs).
  3. Scheduling and dispatch algorithms (cron parsing, time zone/DST handling, timer data structures, sharding strategy).
  4. Consistency choices and delivery semantics (exactly-once vs at-least-once, idempotency, dedup).
  5. Failure modes and mitigations (clock skew, network partition, leader/scheduler/worker crashes).
  6. Capacity planning (orders of magnitude, partitioning, indexes, throughput estimates).
  7. Testing strategies (unit, property, integration, chaos, time-travel, scale testing).
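As a reference point for the timer data structures named in the deliverables, the following is a minimal single-level hashed timing wheel. It is a sketch under simplifying assumptions: ticks are abstract integers, the class name `TimingWheel` and its methods are made up for illustration, and entries due in a later rotation simply stay in their bucket until their tick arrives. A hierarchical design would instead cascade such entries down from coarser wheels.

```python
from collections import defaultdict


class TimingWheel:
    """Single-level hashed timing wheel keyed by integer ticks."""

    def __init__(self, slots: int, start_tick: int = 0) -> None:
        self.slots = slots
        self.current = start_tick
        # slot index -> list of (due_tick, job_id)
        self._buckets: dict[int, list[tuple[int, str]]] = defaultdict(list)

    def schedule(self, job_id: str, due_tick: int) -> None:
        if due_tick <= self.current:
            raise ValueError("due_tick must be in the future")
        self._buckets[due_tick % self.slots].append((due_tick, job_id))

    def advance(self) -> list[str]:
        """Move one tick forward and return the jobs due at the new tick."""
        self.current += 1
        idx = self.current % self.slots
        bucket = self._buckets[idx]
        due = [job for tick, job in bucket if tick == self.current]
        # Keep entries that share this slot but belong to a later rotation.
        self._buckets[idx] = [(t, j) for t, j in bucket if t != self.current]
        return due


if __name__ == "__main__":
    wheel = TimingWheel(slots=4)
    wheel.schedule("a", due_tick=2)
    wheel.schedule("b", due_tick=6)  # same slot as "a", one rotation later
    wheel.schedule("c", due_tick=3)
    for _ in range(6):
        fired = wheel.advance()
        print(wheel.current, fired)
```

Insertion is O(1) and each tick only inspects one bucket, which is why wheels are attractive when many timers share a coarse resolution such as one second. In a sharded scheduler, each shard would typically load only the near-future window of fire times from the metadata store into its wheel.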
