PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates algorithm design and optimization skills, focusing on combinatorial partitioning and batching strategies, efficient utilization of parallel GPUs, and time/space complexity analysis relevant to ML data pipelines.

  • hard
  • Amazon
  • Coding & Algorithms
  • Machine Learning Engineer

Implement Optimal Bucket Batching

Company: Amazon

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are given an array `lengths` of `K` document lengths and an integer `G` representing the number of available GPUs. A batch is assigned to one GPU, and every document in the same batch is padded to the maximum document length in that batch. Reordering documents is allowed. Choose up to `G` non-empty batches, with unused GPUs allowed, to minimize the total padding waste: `sum over batches of (batch_size * max_length_in_batch - sum_of_lengths_in_batch)` Return both the minimum padding waste and one optimal partition of the documents into batches. If `K = 0`, return zero waste and an empty partition. If `G >= K`, the optimal solution may place each document in its own batch, producing zero padding waste. Implement an efficient algorithm and explain its time and space complexity.

Quick Answer: This question evaluates algorithm design and optimization skills, focusing on combinatorial partitioning and batching strategies, efficient utilization of parallel GPUs, and time/space complexity analysis relevant to ML data pipelines.

You are given an array `lengths` of document lengths and an integer `G` representing the number of available GPUs. You may reorder the documents, then split them into up to `G` non-empty batches. Every document in the same batch is padded to the maximum length in that batch, so a batch with lengths `[x1, x2, ..., xm]` creates padding waste `m * max(batch) - sum(batch)`. Return a tuple `(minimum_waste, partition)`, where `partition` is one optimal list of batches. Each batch should be returned as a list of lengths. Because reordering is allowed, returning batches in sorted order is acceptable. If multiple optimal partitions exist, return any one of them. If `lengths` is empty, return `(0, [])`.

Constraints

  • 0 <= K == len(lengths) <= 5000
  • 1 <= G <= 500 when K > 0
  • 0 <= lengths[i] <= 10^9

Examples

Input: ([], 4)

Expected Output: (0, [])

Explanation: There are no documents, so there is no waste and no batches.

Input: ([7], 2)

Expected Output: (0, [[7]])

Explanation: A single document needs no padding.

Input: ([9, 1, 7], 1)

Expected Output: (10, [[1, 7, 9]])

Explanation: With only one batch, all documents are padded to 9. Waste is `3 * 9 - (1 + 7 + 9) = 10`.

Input: ([12, 2, 10, 3], 2)

Expected Output: (3, [[2, 3], [10, 12]])

Explanation: After sorting, splitting into `[2, 3]` and `[10, 12]` gives waste `1 + 2 = 3`, which is optimal.

Input: ([8, 5, 5], 2)

Expected Output: (0, [[5, 5], [8]])

Explanation: The two 5s can share a batch with no waste, and 8 alone also has no waste.

Input: ([4, 1, 9], 5)

Expected Output: (0, [[1], [4], [9]])

Explanation: You may use up to 5 batches, but only 3 documents exist, so each can be placed alone for zero waste.

Input: ([8, 1, 4, 9, 10], 3)

Expected Output: (3, [[1], [4], [8, 9, 10]])

Explanation: After sorting to `[1, 4, 8, 9, 10]`, the best 3-batch split is `[1]`, `[4]`, and `[8, 9, 10]`, for waste `0 + 0 + (3 * 10 - 27) = 3`.

Hints

  1. Because reordering is allowed, sort the lengths first. In an optimal solution, each batch can be taken as a contiguous segment of the sorted array.
  2. Use prefix sums to compute the waste of any sorted segment in O(1). Then define a DP over prefixes and number of batches; after algebraic rearrangement, each transition becomes a minimum over lines queried at `x = current_length`.
Last updated: May 5, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement Datacenter Router Commands - Amazon (hard)
  • Implement Event Filtering and Queue Routing - Amazon (medium)
  • Determine if all courses can be completed - Amazon (medium)
  • Replace Delimited Tokens in a String - Amazon (medium)
  • Minimize Circular Redistribution Cost - Amazon (medium)