Choose container set to minimize waste
Company: Imc
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Take-home Project
##### Question
A pharmaceutical company is packaging liquid medicine for shipment. Each customer order has a required volume. The company can choose **exactly one** "container set" and must then fulfill **all** orders using containers **only from that chosen set**.
Each container set defines the capacities of containers available in **unlimited quantity**. For each order you must:
1. Use exactly **one** container from the chosen set.
2. Fill that container **completely** up to the order amount.
3. If a container with capacity **exactly equal** to the order amount exists, use it (waste = 0). Otherwise, use the **smallest** container whose capacity is **greater than** the order amount.
4. The **waste** for an order is `capacity - order_amount`.
If, for some order, **no** container in a set is large enough (no capacity `>= requirement`), then that set is **invalid** and cannot be used at all.
Evaluate every container set and choose the one that:
- Can satisfy **all** orders, **and**
- Has the **minimum total waste**, where total waste is the sum of wasted units over all orders.
If multiple container sets tie on minimum total waste, return the **smallest index** among them. If **no** container set can satisfy all orders, return `-1`.
---
### Input format
The required input is provided to you in one of two equivalent forms (handle whichever your harness gives):
**Form A — flat 2D array grouped by set:**
- An integer `n`: the number of orders.
- An array `requirements[n]`: positive integers where `requirements[i]` is the required volume for the `i`-th order.
- An integer `numContainerSets`: the number of distinct container sets.
- A 2D array `containers[totalNumContainers][2]`, where `containers[k][0]` is the **set index** (`0` to `numContainerSets - 1`) and `containers[k][1]` is a **container capacity** (positive integer) belonging to that set.
- Rows are grouped by set index in increasing order of index.
- Within each set, capacities are listed in **non-decreasing** order.
**Form B — list of capacity arrays:**
- `requirements[]` (length `m`): the volume required for each order.
- `containerSets`: a list where `containerSets[i]` is an array of the container capacities available in set `i`. Capacities may appear in any order and duplicates may exist.
### Constraints
- `1 <= n <= 10^5`
- `1 <= numContainerSets <= 10^5`
- `1 <= totalNumContainers <= 2 * 10^5`
- `1 <= requirements[i] <= 10^9`
- `1 <= capacity <= 10^9`
Your algorithm must be efficient enough to handle inputs at the upper bounds. A naive solution that is `O(n * totalNumContainers)` or worse will time out.
---
### Required function
```text
int chooseContainers(int requirements[n],
int numContainerSets,
int containers[totalNumContainers][2])
```
**Return value:**
- The **0-based index** of the container set with the minimum total waste that satisfies all requirements (smallest index on ties), or
- `-1` if no container set can satisfy all requirements.
---
### Example
```text
n = 4
requirements = [4, 6, 6, 7]
numContainerSets = 3
containers = [
[0, 3], [0, 5], [0, 7], // set 0: capacities {3, 5, 7}
[1, 6], [1, 8], [1, 9], // set 1: capacities {6, 8, 9}
[2, 3], [2, 5], [2, 6] // set 2: capacities {3, 5, 6}
]
```
- **Set 0** `{3, 5, 7}`: 4->5 (waste 1), 6->7 (1), 6->7 (1), 7->7 (0) => total **3**.
- **Set 1** `{6, 8, 9}`: 4->6 (waste 2), 6->6 (0), 6->6 (0), 7->8 (1) => total **3**.
- **Set 2** `{3, 5, 6}`: cannot satisfy requirement 7 (max capacity 6 < 7) => **invalid**.
Set 0 and Set 1 both achieve the minimum total waste of 3, so return the smaller index:
```text
chooseContainers(...) = 0
```
Quick Answer: An IMC software engineer take-home asking you to pick the single container set that fulfills all medicine orders with minimum total waste. It tests ceiling-style binary search, per-set validity checks, smallest-index tie-breaking, and efficient handling of inputs up to 10^5 to avoid an O(n*T) timeout.
Solution
## Approach
For each order of size `r` under a fixed set, the chosen container is the **smallest capacity `c` with `c >= r`** (a ceiling lookup). If no such capacity exists for some order, the set is invalid. The waste of a set is the sum of `c - r` over all orders; we want the valid set with the minimum total waste, breaking ties by the smallest index, and `-1` if no set is valid.
The ceiling lookup is a classic **binary search** on a sorted list of capacities (`lower_bound` / `bisect_left`). So per set, sort its distinct capacities once, then binary-search each requirement.
### Complexity
Let `S` be the number of sets, `T` the total number of capacities across all sets, and `n` the number of orders.
- Sorting every set's capacities costs `O(T log T)` overall (each capacity sorted exactly once within its own set).
- For each set we binary-search all `n` requirements: `O(n log K)` per set, where `K` is that set's size. Across all sets this is `O(S * n * log(max K))`.
- Total: roughly `O(T log T + S * n * log K)`. This comfortably handles the stated bounds and avoids the `O(n * T)` naive blow-up because each lookup is logarithmic, not linear.
Use 64-bit integers for the running total: with `n` up to `10^5` orders and per-order waste up to ~`10^9`, total waste can reach ~`10^14`, which overflows 32-bit.
### Reference implementation (Python)
```python
import bisect
def chooseContainers(requirements, numContainerSets, containers):
# Group capacities by set index (Form A). For Form B, sets are already grouped.
sets = [[] for _ in range(numContainerSets)]
for set_idx, cap in containers:
sets[set_idx].append(cap)
best_idx = -1
best_waste = None # use None so any valid set beats "no set yet"
for i, caps in enumerate(sets):
if not caps:
continue
caps = sorted(set(caps)) # dedupe + sort once
max_cap = caps[-1]
total = 0
valid = True
for r in requirements:
if r > max_cap: # early reject: no container big enough
valid = False
break
pos = bisect.bisect_left(caps, r) # smallest cap >= r
total += caps[pos] - r
if valid and (best_waste is None or total < best_waste):
best_waste = total
best_idx = i # strict `<` keeps the smallest index on ties
return best_idx
```
### Correctness notes
- **Ceiling, not floor:** `bisect_left(caps, r)` returns the first index whose capacity is `>= r`, which is exactly the smallest container that can hold the order (and gives waste 0 when an exact match exists).
- **Tie-break:** iterate sets in increasing index order and update only on a strict improvement (`total < best_waste`), so the earliest set wins a tie.
- **Invalidity:** if any requirement exceeds the set's largest capacity, the set is rejected and contributes no candidate.
- **No valid set:** `best_idx` stays `-1`.
### Optional optimization
If there are many orders and many sets, precompute the **distinct sorted requirements with their counts** (frequency map). Then per set you binary-search each *distinct* requirement once and multiply its waste by its count, reducing the inner loop from `n` to the number of distinct requirement values.
Explanation
Per order, the optimal container is the smallest capacity >= the requirement (a ceiling found by binary search). Sum waste over all orders for each set, reject any set missing a large-enough container for some order, and return the valid set with minimum total waste (smallest index on ties, -1 if none). Key gotchas: binary search for the ceiling (not floor), 64-bit accumulator to avoid overflow, and strict-less-than comparison to preserve the smallest-index tie-break.