Design a Job Scheduler Service (One-off and Recurring)
Assume a multi-tenant service that schedules and runs user-defined jobs (HTTP/webhook, internal tasks, etc.). Jobs may be one-off or recurring (e.g., cron). The system must operate at scale with strong observability and fault tolerance.
Specify the following:
(a) Data Model / Database Schema
Design schemas for jobs, schedules, executions, workers, and supporting entities with fields such as: id, tenant, payload, schedule type/cron, next_run_at, status, retry policy, dedupe key, shard key, and updated_at.
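As a starting point, the listed fields could be grouped roughly as follows. This is a minimal sketch, not a prescribed schema: the enum values, the `RetryPolicy` defaults, `NUM_SHARDS`, and the `shard_for` helper are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

NUM_SHARDS = 64  # assumed shard count, chosen only for illustration


class ScheduleType(Enum):
    ONE_OFF = "one_off"
    CRON = "cron"


class JobStatus(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    CANCELED = "canceled"
    COMPLETED = "completed"


@dataclass
class RetryPolicy:
    max_attempts: int = 3       # assumed default
    base_delay_s: float = 1.0   # assumed default
    max_delay_s: float = 300.0  # assumed default


def shard_for(tenant_id: str) -> int:
    """Hypothetical helper: stable hash of the tenant onto a fixed shard range."""
    return zlib.crc32(tenant_id.encode("utf-8")) % NUM_SHARDS


@dataclass
class Job:
    id: str
    tenant_id: str
    payload: dict
    schedule_type: ScheduleType
    next_run_at: datetime
    cron: Optional[str] = None          # required only for recurring jobs
    status: JobStatus = JobStatus.ACTIVE
    retry_policy: RetryPolicy = field(default_factory=RetryPolicy)
    dedupe_key: Optional[str] = None
    shard_key: int = field(init=False)  # derived from tenant_id
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if self.schedule_type is ScheduleType.CRON and not self.cron:
            raise ValueError("recurring jobs need a cron expression")
        self.shard_key = shard_for(self.tenant_id)
```

Deriving `shard_key` from the tenant keeps one tenant's jobs together; hashing the job id instead would spread a large tenant across shards, which is one of the trade-offs an answer might discuss.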
(b) Public APIs
Define APIs for creating, updating, canceling, and querying jobs; listing upcoming jobs; and triggering immediate runs. Include idempotency for mutating calls and filters for list and query calls.
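One common way to make job creation idempotent is a client-supplied idempotency key scoped to the tenant. The sketch below assumes an in-memory store; `JobStore`, `create_job`, and `list_jobs` are hypothetical names, and a real service would back the key lookup with a unique constraint in the database.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class JobStore:
    """In-memory stand-in for the persistence layer (illustrative only)."""

    def __init__(self) -> None:
        self.jobs: Dict[str, dict] = {}
        # (tenant_id, idempotency_key) -> job id
        self._by_idem_key: Dict[Tuple[str, str], str] = {}

    def create_job(self, tenant_id: str, idempotency_key: str, spec: dict) -> dict:
        key = (tenant_id, idempotency_key)
        existing_id = self._by_idem_key.get(key)
        if existing_id is not None:
            # Retried request: return the original job instead of creating a duplicate.
            return self.jobs[existing_id]
        job = {"id": str(uuid.uuid4()), "tenant_id": tenant_id, "status": "active", **spec}
        self.jobs[job["id"]] = job
        self._by_idem_key[key] = job["id"]
        return job

    def list_jobs(self, tenant_id: str, status: Optional[str] = None) -> List[dict]:
        # Filter by tenant, and optionally by status.
        return [
            j for j in self.jobs.values()
            if j["tenant_id"] == tenant_id and (status is None or j["status"] == status)
        ]
```

Calling `create_job` twice with the same tenant and key returns the same job, so a client can safely retry after a timeout.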
(c) Core Components
Describe ingress, scheduler, queue/broker, workers, persistence, and timers. Explain idempotency, retries/backoff, dead-lettering, observability, and failure handling.
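For the retries/backoff and dead-lettering part, one widely used approach is capped exponential backoff with full jitter, followed by a dead-letter decision once the attempt budget is spent. This is a sketch under assumed defaults; `retry_delay_s` and `next_action` are hypothetical helpers, not part of any specific broker's API.

```python
import random
from typing import Optional


def retry_delay_s(failed_attempts: int, base_s: float = 1.0, cap_s: float = 300.0,
                  rng: Optional[random.Random] = None) -> float:
    """Delay before the next attempt, given how many attempts have failed so far (>= 1).

    Full jitter: pick uniformly in [0, min(cap, base * 2^(failed_attempts - 1))].
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    source = rng if rng is not None else random
    exponent = min(failed_attempts - 1, 32)  # bound the exponent to avoid huge values
    ceiling = min(cap_s, base_s * (2 ** exponent))
    return source.uniform(0.0, ceiling)


def next_action(failed_attempts: int, max_attempts: int) -> str:
    """Return 'retry' while attempts remain, otherwise 'dead_letter'."""
    return "dead_letter" if failed_attempts >= max_attempts else "retry"
```

Jitter spreads retries out so that many jobs failing against the same downstream dependency do not all retry at the same instant.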
(d) Efficiently Finding Jobs Due in the Next 5 Minutes
Explain indexing/partitioning, range scans on time-ordered keys, min-heap/time-wheel approaches, caching, and contention control.
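The range-scan idea can be illustrated with a composite index whose last column is the due time, so each scheduler instance reads only its own shard's near-term window in time order. The sketch uses an in-memory SQLite database; the table layout, the index name `idx_jobs_due`, and the `due_jobs` helper are assumptions for illustration.

```python
import sqlite3
from typing import List


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE jobs (
               id          TEXT PRIMARY KEY,
               shard_key   INTEGER NOT NULL,
               status      TEXT NOT NULL,
               next_run_at INTEGER NOT NULL  -- epoch seconds, UTC
           )"""
    )
    # Equality columns first, time column last, so the due-window query is a range scan.
    conn.execute("CREATE INDEX idx_jobs_due ON jobs (shard_key, status, next_run_at)")
    return conn


def due_jobs(conn: sqlite3.Connection, shard_key: int, now_s: int,
             horizon_s: int = 300, limit: int = 1000) -> List[str]:
    """Ids of active jobs in one shard that are overdue or due within the horizon."""
    rows = conn.execute(
        """SELECT id FROM jobs
           WHERE shard_key = ? AND status = 'active' AND next_run_at <= ?
           ORDER BY next_run_at
           LIMIT ?""",
        (shard_key, now_s + horizon_s, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

The query deliberately has no lower time bound, so jobs missed during an outage are still picked up. The result of each scan could then be loaded into an in-memory min-heap or time wheel for precise firing.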
(e) Scaling and Consistency
Discuss scaling strategy, at-least-once vs exactly-once execution semantics, multi-region architecture, and clock-skew mitigation. Provide trade-offs and simple diagrams.
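To make the at-least-once versus exactly-once discussion concrete, one common pattern is at-least-once delivery combined with deduplication on an execution key such as (job id, scheduled time). The sketch below is illustrative only: `ExecutionLog` and `run_once` are hypothetical names, and the in-memory set stands in for a durable store with a unique constraint.

```python
from typing import Callable, Set, Tuple


class ExecutionLog:
    """Tracks which (job_id, scheduled_for) pairs have already run (illustrative only)."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, int]] = set()

    def run_once(self, job_id: str, scheduled_for_s: int,
                 handler: Callable[[], None]) -> bool:
        """Run handler unless this execution key was already completed.

        Returns True if the handler ran, False if the delivery was a duplicate.
        """
        key = (job_id, scheduled_for_s)
        if key in self._seen:
            return False
        handler()
        # Record only after success: a crash before this line leads to a re-run,
        # which is why the handler's side effects should themselves be idempotent.
        self._seen.add(key)
        return True
```

Keying on the scheduled time rather than the wall-clock time of delivery also means that modest clock skew between schedulers does not produce distinct keys for the same logical run.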