Implement Dependency-Aware Task Scheduler
Company: Scale AI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates skills in designing and implementing data structures and algorithms for priority scheduling, dependency resolution, and dynamic updates, encompassing priority queues, graph-based dependency tracking, and state management.
Part 1: Schedule Tasks by Deadline
Constraints
- `0 <= len(operations) <= 2 * 10^5`
- `len(add_batches) == len(operations)`
- The total number of tasks across all add operations is at most `2 * 10^5`
- `1 <= taskId <= 10^9`
- `0 <= deadline <= 10^9`
- All `taskId` values are unique across all added tasks
Examples
Input: (['add', 'consume', 'consume'], [[[10, 5], [3, 2], [7, 2]], [], []])
Expected Output: [3, 7]
Explanation: Tasks 3 and 7 share the smallest deadline 2, so task 3 is consumed first because it has the smaller taskId.
Input: (['add', 'consume', 'add', 'consume', 'consume'], [[[1, 4], [2, 6]], [], [[3, 1]], [], []])
Expected Output: [1, 3, 2]
Explanation: After consuming task 1, a new earlier-deadline task 3 is added and should be consumed next.
Input: (['consume', 'add', 'consume', 'consume'], [[], [[5, 10]], [], []])
Expected Output: [-1, 5, -1]
Explanation: The first and last consume operations happen when the scheduler is empty.
Input: (['add', 'consume'], [[], []])
Expected Output: [-1]
Explanation: An empty add batch changes nothing, so the consume operation returns -1.
Hints
- Use a min-heap keyed by `(deadline, taskId)` so the next task is always on top.
- You only need to return values for `consume` operations, not for `add` operations.
Part 2: Dependency-Aware Task Consumption
Constraints
- `0 <= len(operations) <= 2 * 10^5`
- `len(add_batches) == len(operations)`
- The total number of tasks across all add operations is at most `2 * 10^5`
- The total number of dependency references across all tasks is at most `2 * 10^5`
- `1 <= taskId <= 10^9`
- `0 <= deadline <= 10^9`
- All `taskId` values are unique across all added tasks
- Within a single task, dependency IDs are distinct
Examples
Input: (['add', 'consume', 'consume', 'consume'], [[[1, 5], [2, 2, 1], [3, 2]], [], [], []])
Expected Output: [3, 1, 2]
Explanation: Tasks 1 and 3 are initially eligible; task 3 is consumed first due to earlier deadline. Consuming task 1 then unlocks task 2.
Input: (['add', 'consume', 'add', 'consume', 'consume'], [[[10, 3, 20]], [], [[20, 1]], [], []])
Expected Output: [-1, 20, 10]
Explanation: Task 10 is blocked by task 20, which is added later. The first consume finds nothing eligible.
Input: (['add', 'consume', 'add', 'consume'], [[[1, 5]], [], [[2, 1, 1]], []])
Expected Output: [1, 2]
Explanation: When task 2 is added, its dependency task 1 has already been consumed, so task 2 is immediately eligible.
Input: (['add', 'consume', 'consume', 'consume'], [[[5, 4], [4, 4], [6, 3, 4]], [], [], []])
Expected Output: [4, 6, 5]
Explanation: Tasks 4 and 5 tie on deadline, so 4 is consumed first. That unlocks task 6 with deadline 3, which then comes before task 5.
Hints
- Track how many dependencies of each task are still unsatisfied.
- Also build a reverse mapping from a dependency ID to the tasks waiting on it, so one consume can unlock other tasks.
Part 3: Dependency-Aware Scheduler with Deadline Updates
Constraints
- `0 <= len(operations) <= 2 * 10^5`
- `len(add_batches) == len(operations)`
- `len(updates) == len(operations)`
- The total number of tasks across all add operations is at most `2 * 10^5`
- The total number of dependency references across all tasks is at most `2 * 10^5`
- The total number of update operations is at most `2 * 10^5`
- `1 <= taskId <= 10^9`
- `0 <= deadline, newDeadline <= 10^9`
- All `taskId` values are unique across all added tasks
- Within a single task, dependency IDs are distinct
Examples
Input: (['add', 'update', 'consume', 'consume'], [[[1, 5], [2, 3]], [], [], []], [[], [1, 1], [], []])
Expected Output: [1, 2]
Explanation: Task 1 is updated to an earlier deadline and should be consumed before task 2.
Input: (['add', 'update', 'consume', 'consume', 'consume'], [[[1, 5, 2], [2, 4]], [], [], [], []], [[], [1, 1], [], [], []])
Expected Output: [2, 1, -1]
Explanation: Task 1 is blocked when updated, but its new deadline 1 is used once task 2 is consumed and task 1 becomes eligible.
Input: (['add', 'consume', 'update', 'update', 'consume'], [[[1, 2]], [], [], [], []], [[], [], [1, 10], [9, 1], []])
Expected Output: [1, -1]
Explanation: Updating a consumed task or a task that does not exist has no effect.
Input: (['add', 'update', 'update', 'consume', 'consume'], [[[1, 5], [2, 5]], [], [], [], []], [[], [2, 1], [2, 7], [], []])
Expected Output: [1, 2]
Explanation: Task 2 gets two updates. The stale heap entry for deadline 1 must be ignored because the latest deadline is 7.
Input: (['add', 'update', 'add', 'consume', 'consume'], [[[10, 5, 20]], [], [[20, 2]], [], []], [[], [10, 1], [], [], []])
Expected Output: [20, 10]
Explanation: Task 10 is updated while blocked on task 20. After task 20 is consumed, task 10 becomes eligible with its updated deadline.
Hints
- A min-heap still helps, but an updated task may already have an old entry in the heap. Consider lazy deletion.
- When a blocked task becomes eligible, push its current deadline, not its original one.