Design a scalable job scheduler and query dashboard
Company: LinkedIn
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
Design a scalable, fault-tolerant **job scheduling system**.
The system should allow clients to schedule background jobs (for example, sending emails or running batch computations) to be executed at specific future times, and possibly on a recurring basis.
Then, as a follow-up, design how to **efficiently query jobs scheduled in the next N hours** in order to power a near real-time dashboard that shows upcoming jobs.
Describe:
1. **Functional requirements** (e.g., create/update/cancel jobs, one-time vs recurring jobs, job execution guarantees).
2. **Non-functional requirements** (e.g., scale, latency, reliability, availability, consistency expectations).
3. A **high-level architecture** for the job scheduler:
- Main components (API layer, scheduler, workers, storage, queues, etc.).
- How jobs are stored, assigned to workers, and executed at (approximately) the right time.
- How you ensure reliability (no job lost, minimal duplicates) and fault tolerance.
4. A **data model** for jobs (what fields you store, how you index them).
5. An API and storage/query design to **efficiently fetch all jobs scheduled between now and now + N hours** for a dashboard. Assume:
- There can be a very large number of jobs.
- The dashboard needs low-latency, high-QPS reads.
- N might vary per request (e.g., 1 hour, 6 hours, 24 hours).
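As a starting point for items 4 and 5, the sketch below shows one possible job record and an in-memory stand-in for a secondary index on the execution time. It is an illustration under stated assumptions, not a reference design: the names `Job`, `JobIndex`, `run_at`, `interval_s`, and `upcoming` are hypothetical, and a real system would keep the sorted `(run_at, job_id)` keys in a database index rather than a Python list. The point is that a window query becomes a range scan over sorted keys instead of a full table scan.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    # Hypothetical fields; a real schema would likely add owner, retries, etc.
    job_id: str
    run_at: int                       # next execution time, epoch seconds
    payload: str
    interval_s: Optional[int] = None  # None means a one-time job
    status: str = "SCHEDULED"


class JobIndex:
    """Keeps (run_at, job_id) keys sorted, like a secondary index on run_at."""

    def __init__(self) -> None:
        self._keys: List[Tuple[int, str]] = []
        self._jobs: Dict[str, Job] = {}

    def add(self, job: Job) -> None:
        bisect.insort(self._keys, (job.run_at, job.job_id))
        self._jobs[job.job_id] = job

    def cancel(self, job_id: str) -> bool:
        job = self._jobs.pop(job_id, None)
        if job is None:
            return False
        # The key is known to exist, so bisect_left lands exactly on it.
        i = bisect.bisect_left(self._keys, (job.run_at, job.job_id))
        del self._keys[i]
        return True

    def upcoming(self, now: int, hours: int) -> List[Job]:
        """Return jobs with now <= run_at <= now + hours * 3600, in time order."""
        end = now + hours * 3600
        lo = bisect.bisect_left(self._keys, (now, ""))
        hi = bisect.bisect_left(self._keys, (end + 1, ""))
        return [self._jobs[job_id] for _, job_id in self._keys[lo:hi]]
```

Because N only changes the upper bound of the range, the same index serves 1-hour, 6-hour, and 24-hour requests; the cost of a query grows with the number of jobs in the window, not with the total number of jobs.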
Explain your trade-offs, including:
- How you partition or index data to support both scheduling and time-window queries.
- How you would avoid full table scans when querying the next N hours.
- How you would handle hot partitions (e.g., many jobs around the same time) and horizontal scaling.
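One common way to approach the partitioning and hot-partition questions above is to bucket jobs by time and add a shard suffix to each bucket, so that a burst of jobs at the same moment is spread across several partitions. The sketch below is a minimal illustration of that idea, not a prescribed answer: the one-hour bucket width, the `bucket#shard` key format, and the function names `partition_key` and `keys_for_window` are all assumptions made for the example.

```python
import zlib
from typing import List

BUCKET_SECONDS = 3600  # assumed bucket width: one hour


def partition_key(run_at: int, job_id: str, shards: int) -> str:
    """Map a job to a 'bucket#shard' partition key.

    The bucket groups jobs by hour; the shard, derived from a stable hash of
    the job ID, spreads jobs within a busy hour across several partitions.
    """
    bucket = run_at // BUCKET_SECONDS
    shard = zlib.crc32(job_id.encode("utf-8")) % shards
    return f"{bucket}#{shard}"


def keys_for_window(now: int, hours: int, shards: int) -> List[str]:
    """List every partition key that can hold jobs in [now, now + hours].

    Readers fetch only these partitions, then filter the first and last
    bucket by exact run_at, since those buckets extend past the window.
    """
    first = now // BUCKET_SECONDS
    last = (now + hours * 3600) // BUCKET_SECONDS
    return [f"{b}#{s}" for b in range(first, last + 1) for s in range(shards)]
```

The trade-off is read fan-out: more shards per bucket relieve write hot spots but multiply the number of partitions a dashboard query must read, so the shard count is typically tuned against the expected peak write rate per bucket.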
Quick Answer: This question evaluates a candidate's ability to design a scalable, fault-tolerant job scheduling system with efficient, near real-time time-window queries. It focuses on distributed scheduling, storage and indexing strategies, partitioning, and reliability guarantees, and it is commonly asked to probe trade-offs among scalability, latency, availability, and consistency. As a System Design question, it tests both high-level architectural thinking and practical concerns such as data modeling, query and index design, hot partitions, and horizontal scaling.