This question evaluates a candidate's system design skills in two areas: building a scalable, fault-tolerant job scheduling system, and supporting efficient near-real-time queries over a time window. It covers distributed scheduling, storage and indexing strategies, partitioning, and reliability guarantees, and is commonly asked to probe trade-offs among scalability, latency, availability, and consistency. As a System Design question, it examines both high-level architectural thinking and practical application-level considerations such as data modeling, query and index design, horizontal scaling, and operational concerns like hot partitions.
Design a scalable, fault-tolerant job scheduling system.
The system should allow clients to schedule background jobs (for example, sending emails or running batch computations) to be executed at specific future times, and possibly on a recurring basis.
Then, as a follow-up, design how to efficiently query jobs scheduled in the next N hours in order to power a near real-time dashboard that shows upcoming jobs.
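One possible starting point for the follow-up, offered only as a sketch: group jobs into fixed-width time buckets keyed by their scheduled run time, so that a "next N hours" query touches only the buckets overlapping the window instead of scanning every job. The hourly bucket width, the in-memory dictionary standing in for a partitioned store, and the names `UpcomingJobIndex`, `schedule`, and `upcoming` are all assumptions made for illustration, not part of the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(hours=1)  # assumed bucket width; a real system would tune this


def bucket_start(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hourly bucket."""
    return ts.replace(minute=0, second=0, microsecond=0)


class UpcomingJobIndex:
    """Hypothetical in-memory stand-in for a time-bucketed job index."""

    def __init__(self) -> None:
        # bucket start -> {job_id: run_at}
        self._buckets = defaultdict(dict)

    def schedule(self, job_id: str, run_at: datetime) -> None:
        """Record a job under the bucket that contains its run time."""
        self._buckets[bucket_start(run_at)][job_id] = run_at

    def upcoming(self, now: datetime, hours: int) -> list:
        """Return ids of jobs with now <= run_at < now + hours, earliest first."""
        end = now + timedelta(hours=hours)
        matches = []
        cursor = bucket_start(now)
        # Visit only the buckets that overlap the window [now, end).
        while cursor < end:
            for job_id, run_at in self._buckets.get(cursor, {}).items():
                # Edge buckets can hold jobs outside the window, so filter.
                if now <= run_at < end:
                    matches.append((run_at, job_id))
            cursor += BUCKET
        return [job_id for _, job_id in sorted(matches)]


if __name__ == "__main__":
    index = UpcomingJobIndex()
    now = datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc)
    index.schedule("send-email", now + timedelta(minutes=15))
    index.schedule("nightly-batch", now + timedelta(hours=5))
    print(index.upcoming(now, 2))
```

In a distributed version the bucket start would likely become part of the partition or index key. That keeps window queries cheap but can concentrate writes on popular run times (for example, midnight), which is one of the hot-partition trade-offs the question asks about.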
Describe:
Explain your trade-offs, including: