Implement Optimal Bucket Batching
Company: Amazon
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates algorithm design and optimization skills, focusing on combinatorial partitioning and batching strategies, efficient utilization of parallel GPUs, and time/space complexity analysis relevant to ML data pipelines.
Constraints
- 0 <= K == len(lengths) <= 5000
- 1 <= G <= 500 when K > 0
- 0 <= lengths[i] <= 10^9
Examples
Input: ([], 4)
Expected Output: (0, [])
Explanation: There are no documents, so there is no waste and no batches.
Input: ([7], 2)
Expected Output: (0, [[7]])
Explanation: A single document needs no padding.
Input: ([9, 1, 7], 1)
Expected Output: (10, [[1, 7, 9]])
Explanation: With only one batch, all documents are padded to 9. Waste is `3 * 9 - (1 + 7 + 9) = 10`.
Input: ([12, 2, 10, 3], 2)
Expected Output: (3, [[2, 3], [10, 12]])
Explanation: After sorting, splitting into `[2, 3]` and `[10, 12]` gives waste `1 + 2 = 3`, which is optimal.
Input: ([8, 5, 5], 2)
Expected Output: (0, [[5, 5], [8]])
Explanation: The two 5s can share a batch with no waste, and 8 alone also has no waste.
Input: ([4, 1, 9], 5)
Expected Output: (0, [[1], [4], [9]])
Explanation: You may use up to 5 batches, but only 3 documents exist, so each can be placed alone for zero waste.
Input: ([8, 1, 4, 9, 10], 3)
Expected Output: (3, [[1], [4], [8, 9, 10]])
Explanation: After sorting to `[1, 4, 8, 9, 10]`, the best 3-batch split is `[1]`, `[4]`, and `[8, 9, 10]`, for waste `0 + 0 + (3 * 10 - 27) = 3`.
Hints
- Because reordering is allowed, sort the lengths first. In an optimal solution, each batch can be taken as a contiguous segment of the sorted array.
- Use prefix sums to compute the waste of any sorted segment in O(1). Then define a DP over prefixes and number of batches; after algebraic rearrangement, each transition becomes a minimum over lines queried at `x = current_length`.