System Design Prompt
Design a Job Scheduler + ETL pipeline system.
The system should allow users (or internal services) to:
- Define ETL jobs (extract → transform → load)
- Schedule jobs (cron / fixed interval / one-off)
- Run jobs on a fleet of workers
- Track job/run status and logs
- Support retries and failure handling
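The capabilities above imply two core records: a job definition and a job run. A minimal sketch of those records follows; every field and type name here is a hypothetical choice for illustration, not a prescribed schema:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class ScheduleKind(Enum):
    CRON = "cron"          # e.g. "*/5 * * * *"
    INTERVAL = "interval"  # fixed number of seconds between runs
    ONE_OFF = "one_off"    # run once at a given time

class RunStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    RETRYING = "retrying"

@dataclass
class JobDefinition:
    job_id: str
    owner: str
    schedule_kind: ScheduleKind
    schedule_expr: str              # cron string, interval seconds, or ISO timestamp
    source: str                     # e.g. "s3://bucket/raw/"
    sink: str                       # e.g. "postgres://warehouse/events"
    artifact_uri: str               # where the job's code/artifact lives
    parameters: dict = field(default_factory=dict)
    max_retries: int = 3

@dataclass
class JobRun:
    run_id: str
    job_id: str
    scheduled_for: str              # the tick this run belongs to; part of the dedup key
    status: RunStatus = RunStatus.PENDING
    attempt: int = 0
    worker_id: Optional[str] = None
```

Keeping `(job_id, scheduled_for)` on the run record gives a natural uniqueness key for duplicate-run detection later.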
Requirements
- Job definition: store metadata (owner, schedule, source/sink, parameters, code/artifact location).
- Scheduling: trigger runs on time; avoid duplicate runs.
- Execution: dispatch runs to workers; support horizontal scaling.
- Reliability:
  - at-least-once execution with dedup where needed
  - retries with backoff
  - handle worker crashes mid-run
- Observability: job/run status, logs, metrics, alerting.
- Multi-tenancy (basic): isolate teams with quotas/limits.
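The reliability items above (at-least-once with dedup, retries with backoff) can be made concrete with a short sketch. This is one possible shape, with hypothetical names throughout; the in-process `_completed` set stands in for what would be a unique constraint or conditional write in the metadata DB:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: a delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def run_with_retries(task, max_retries: int = 3, sleep=time.sleep):
    """At-least-once execution: retry on any failure, re-raise after max_retries."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(backoff_delay(attempt))
            attempt += 1

_completed: set[str] = set()

def execute_once(run_key: str, task):
    """Dedup: skip a run whose key (e.g. "job_id:scheduled_for") already completed.
    In production this check lives in the metadata DB, not in process memory."""
    if run_key in _completed:
        return "skipped"
    result = task()
    _completed.add(run_key)
    return result
```

At-least-once plus a dedup key on the side-effect is usually how "effectively once" is approximated in practice; true exactly-once delivery is a tradeoff worth calling out explicitly in the discussion.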
Discussion points
Explain the key components (API, scheduler, queue, workers, metadata DB), the data model, the scaling strategy, and how you'd use and load-balance caches and queues. Call out the major tradeoffs (exactly-once vs. at-least-once semantics, latency vs. throughput).
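One discussion thread that covers both "avoid duplicate runs" and "handle worker crashes mid-run" is a lease model: a worker claims a run with a time-bounded lease and heartbeats to extend it, and an expired lease makes the run reclaimable. A toy in-memory sketch under assumed names (a real system would back this with a conditional UPDATE or a distributed lock, not a Python dict):

```python
from dataclasses import dataclass

@dataclass
class Lease:
    run_id: str
    worker_id: str
    expires_at: float  # epoch seconds; the owning worker's heartbeats extend this

class LeaseTable:
    """In-memory stand-in for a compare-and-set claim on a metadata DB row."""

    def __init__(self):
        self._leases: dict[str, Lease] = {}

    def claim(self, run_id: str, worker_id: str, now: float, ttl: float = 30.0) -> bool:
        """Claim a run. Fails if another worker holds a live lease (no duplicate
        execution); succeeds if the lease expired (crashed worker recovery)."""
        lease = self._leases.get(run_id)
        if lease is not None and lease.expires_at > now:
            return False
        self._leases[run_id] = Lease(run_id, worker_id, now + ttl)
        return True

    def heartbeat(self, run_id: str, worker_id: str, now: float, ttl: float = 30.0) -> bool:
        """Extend the lease; only the current owner may do so."""
        lease = self._leases.get(run_id)
        if lease is None or lease.worker_id != worker_id:
            return False
        lease.expires_at = now + ttl
        return True
```

The lease TTL is itself a tradeoff worth raising: a short TTL recovers crashed workers quickly but risks spurious reclaims under GC pauses or network blips; a long TTL delays recovery.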