PracHub

Design a dependency-aware job scheduler

Last updated: Apr 19, 2026

Quick Overview

This question tests your ability to design a scalable, dependency-aware job scheduler, along with related distributed-systems skills: modeling task dependencies as a DAG, state management, fault tolerance, retries, recovery, and observability.

Company: Databricks

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite

Jan 5, 2026, 12:00 AM

Design a backend job scheduler where each submitted job consists of multiple tasks with dependency relationships. A task may run only after all of its prerequisite tasks have completed successfully.

Your design should cover:

  • How clients submit a job and its task dependency graph
  • Validation of the job definition, including invalid dependencies and cycles
  • How the scheduler finds runnable tasks
  • How tasks are dispatched to workers
  • Task and job state management
  • Retries, timeouts, and failure handling
  • How the system recovers from crashes or restarts
  • Observability, debugging, and operational concerns

Assume this is a production service that may run many jobs concurrently.
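
Two of the points above, validating the dependency graph and finding runnable tasks, can be illustrated with a minimal in-memory sketch. It is not a reference solution: the function names `validate_job` and `runnable_tasks`, the state strings `PENDING` and `SUCCEEDED`, and the representation of a job as a dict mapping each task id to its prerequisite ids are all assumptions made for illustration. Validation uses Kahn's algorithm, which detects a cycle when some tasks can never reach zero unmet prerequisites.

```python
from collections import deque


def validate_job(tasks):
    """Return a topological order of task ids, or raise ValueError.

    `tasks` is a hypothetical job definition: a dict mapping each task id
    to the list of task ids it depends on.
    """
    # Reject dependencies on tasks that are not part of the job.
    for task_id, deps in tasks.items():
        for dep in deps:
            if dep not in tasks:
                raise ValueError(
                    f"task {task_id!r} depends on unknown task {dep!r}"
                )

    # Count unmet prerequisites and build the reverse (dependents) index.
    remaining = {task_id: len(set(deps)) for task_id, deps in tasks.items()}
    dependents = {task_id: [] for task_id in tasks}
    for task_id, deps in tasks.items():
        for dep in set(deps):
            dependents[dep].append(task_id)

    # Kahn's algorithm: repeatedly remove tasks with no unmet prerequisites.
    queue = deque(t for t, n in remaining.items() if n == 0)
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for child in dependents[current]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)

    # Any task left over is on a cycle or downstream of one.
    if len(order) != len(tasks):
        stuck = sorted(t for t, n in remaining.items() if n > 0)
        raise ValueError(f"dependency cycle detected; unresolved tasks: {stuck}")
    return order


def runnable_tasks(tasks, states):
    """Return PENDING tasks whose prerequisites have all SUCCEEDED."""
    return [
        task_id
        for task_id, deps in tasks.items()
        if states[task_id] == "PENDING"
        and all(states[dep] == "SUCCEEDED" for dep in deps)
    ]


job = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "report": ["transform"],
}
states = {
    "extract": "SUCCEEDED",
    "transform": "PENDING",
    "load": "PENDING",
    "report": "PENDING",
}
print(validate_job(job))
print(runnable_tasks(job, states))
```

Scanning every task on each call is fine for a sketch. A production scheduler would more likely keep a persisted count of unmet prerequisites per task and decrement it when a prerequisite succeeds, so that newly runnable tasks are found without rescanning the whole graph.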

