How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a hard difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Amazon.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at Amazon during technical interviews.

Implement Optimal Bucket Batching | Amazon Coding Question

Implement Optimal Bucket Batching

Company: Amazon

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are given an array `lengths` of `K` document lengths and an integer `G` representing the number of available GPUs. A batch is assigned to one GPU, and every document in the same batch is padded to the maximum document length in that batch. Reordering documents is allowed. Choose up to `G` non-empty batches, with unused GPUs allowed, to minimize the total padding waste: `sum over batches of (batch_size * max_length_in_batch - sum_of_lengths_in_batch)` Return both the minimum padding waste and one optimal partition of the documents into batches. If `K = 0`, return zero waste and an empty partition. If `G >= K`, the optimal solution may place each document in its own batch, producing zero padding waste. Implement an efficient algorithm and explain its time and space complexity.

Quick Answer: This question evaluates algorithm design and optimization skills, focusing on combinatorial partitioning and batching strategies, efficient utilization of parallel GPUs, and time/space complexity analysis relevant to ML data pipelines.

You are given an array `lengths` of document lengths and an integer `G` representing the number of available GPUs. You may reorder the documents, then split them into up to `G` non-empty batches. Every document in the same batch is padded to the maximum length in that batch, so a batch with lengths `[x1, x2, ..., xm]` creates padding waste `m * max(batch) - sum(batch)`. Return a tuple `(minimum_waste, partition)`, where `partition` is one optimal list of batches. Each batch should be returned as a list of lengths. Because reordering is allowed, returning batches in sorted order is acceptable. If multiple optimal partitions exist, return any one of them. If `lengths` is empty, return `(0, [])`.

Constraints

0 <= K == len(lengths) <= 5000
1 <= G <= 500 when K > 0
0 <= lengths[i] <= 10^9

Examples

Input: ([], 4)

Expected Output: (0, [])

Explanation: There are no documents, so there is no waste and no batches.

Input: ([7], 2)

Expected Output: (0, [[7]])

Explanation: A single document needs no padding.

Input: ([9, 1, 7], 1)

Expected Output: (10, [[1, 7, 9]])

Explanation: With only one batch, all documents are padded to 9. Waste is `3 * 9 - (1 + 7 + 9) = 10`.

Input: ([12, 2, 10, 3], 2)

Expected Output: (3, [[2, 3], [10, 12]])

Explanation: After sorting, splitting into `[2, 3]` and `[10, 12]` gives waste `1 + 2 = 3`, which is optimal.

Input: ([8, 5, 5], 2)

Expected Output: (0, [[5, 5], [8]])

Explanation: The two 5s can share a batch with no waste, and 8 alone also has no waste.

Input: ([4, 1, 9], 5)

Expected Output: (0, [[1], [4], [9]])

Explanation: You may use up to 5 batches, but only 3 documents exist, so each can be placed alone for zero waste.

Input: ([8, 1, 4, 9, 10], 3)

Expected Output: (3, [[1], [4], [8, 9, 10]])

Explanation: After sorting to `[1, 4, 8, 9, 10]`, the best 3-batch split is `[1]`, `[4]`, and `[8, 9, 10]`, for waste `0 + 0 + (3 * 10 - 27) = 3`.

Hints

Because reordering is allowed, sort the lengths first. In an optimal solution, each batch can be taken as a contiguous segment of the sorted array.
Use prefix sums to compute the waste of any sorted segment in O(1). Then define a DP over prefixes and number of batches; after algebraic rearrangement, each transition becomes a minimum over lines queried at `x = current_length`.

Quick Overview

This question evaluates algorithm design and optimization skills, focusing on combinatorial partitioning and batching strategies, efficient utilization of parallel GPUs, and time/space complexity analysis relevant to ML data pipelines.