Implement a Batch Image Processor
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's competency in image processing operations and parallel/concurrent programming, covering basic transformations (grayscale, scale, resize) and considerations for CPU-bound batch processing.
Part 1: Process a Small Batch of Image Tasks
Constraints
- 0 <= len(tasks) <= 10^4
- 1 <= width, height <= 10^6
- Each task has 0 to 50 operations
- In this part, all operations are valid: scale factors are positive and resize dimensions are positive integers
Examples
Input: ([{'input_path': 'imgs/cat.png', 'width': 100, 'height': 80, 'mode': 'RGB', 'operations': [('grayscale',), ('scale', 0.5)]}, {'input_path': 'dog.jpg', 'width': 20, 'height': 30, 'mode': 'RGB', 'operations': [('resize', 15, 15)]}], 'out')
Expected Output: [('out/cat_out.png', 50, 40, 'L'), ('out/dog_out.jpg', 15, 15, 'RGB')]
Explanation: The first image becomes grayscale and is scaled down. The second is resized exactly to 15 by 15.
Input: ([{'input_path': 'bird', 'width': 7, 'height': 9, 'mode': 'L', 'operations': []}], 'results')
Expected Output: [('results/bird_out', 7, 9, 'L')]
Explanation: With no operations, the metadata is unchanged. Since there is no extension, '_out' is appended directly.
Input: ([], 'out')
Expected Output: []
Explanation: Edge case: an empty batch produces an empty result list.
Input: ([{'input_path': 'raw/a.bmp', 'width': 3, 'height': 3, 'mode': 'RGB', 'operations': [('scale', 0.2), ('grayscale',), ('resize', 2, 5)]}], 'done')
Expected Output: [('done/a_out.bmp', 2, 5, 'L')]
Explanation: Scaling 3 by 0.2 gives 0.6, floor makes it 0, and clamping makes it 1. After grayscale, resize sets the final size to 2 by 5.
Hints
- Process each image independently and simulate the operations one by one.
- For scale, apply floor after multiplication, then clamp each dimension with max(1, value).
Part 2: Simulate an Order-Preserving Parallel Image Processor
Constraints
- 0 <= len(tasks) <= 2 * 10^5
- 1 <= workers <= 10^5
- 1 <= width, height <= 10^6
- Each task has 0 to 50 operations
- In this part, operations may be invalid and must be reported per task instead of stopping the whole batch
Examples
Input: ([{'input_path': 'a.png', 'width': 10, 'height': 10, 'mode': 'RGB', 'operations': [('grayscale',), ('scale', 0.5)]}, {'input_path': 'b.jpg', 'width': 4, 'height': 5, 'mode': 'RGB', 'operations': [('resize', 2, 3)]}, {'input_path': 'c.bmp', 'width': 6, 'height': 6, 'mode': 'L', 'operations': []}], 'out', 2)
Expected Output: [('ok', 200, 'out/a_out.png', 5, 5, 'L', None), ('ok', 6, 'out/b_out.jpg', 2, 3, 'RGB', None), ('ok', 6, 'out/c_out.bmp', 6, 6, 'L', None)]
Explanation: Task 1 takes 100 + 100 = 200 time units. Task 2 takes 2 * 3 = 6. Task 3 has no operations, so once worker 1 becomes free at time 6, it finishes immediately at time 6.
Input: ([{'input_path': 'x.png', 'width': 8, 'height': 8, 'mode': 'RGB', 'operations': [('scale', 0.25)]}, {'input_path': 'y.png', 'width': 5, 'height': 5, 'mode': 'RGB', 'operations': [('resize', 0, 4)]}, {'input_path': 'z.png', 'width': 3, 'height': 7, 'mode': 'RGB', 'operations': [('grayscale',)]}], 'res', 2)
Expected Output: [('ok', 64, 'res/x_out.png', 2, 2, 'RGB', None), ('error', 0, None, None, None, None, 'invalid resize'), ('ok', 21, 'res/z_out.png', 3, 7, 'L', None)]
Explanation: The second task fails immediately, so its worker is free again at time 0 and can start the third task without affecting the first task.
Input: ([{'input_path': 'm.png', 'width': 2, 'height': 2, 'mode': 'RGB', 'operations': [('resize', 4, 4), ('scale', 0.5)]}, {'input_path': 'n', 'width': 1, 'height': 9, 'mode': 'RGB', 'operations': [('grayscale',), ('scale', 0.1)]}], 'out', 1)
Expected Output: [('ok', 32, 'out/m_out.png', 2, 2, 'RGB', None), ('ok', 50, 'out/n_out', 1, 1, 'L', None)]
Explanation: With one worker, tasks run sequentially. The first costs 16 + 16 = 32, so the second starts at time 32 and finishes at time 50.
Input: ([{'input_path': 'p.png', 'width': 5, 'height': 5, 'mode': 'RGB', 'operations': [('grayscale',), ('scale', -1.0)]}, {'input_path': 'q.png', 'width': 10, 'height': 1, 'mode': 'RGB', 'operations': [('resize', 1, 1)]}], 'tmp', 1)
Expected Output: [('error', 25, None, None, None, None, 'invalid scale'), ('ok', 26, 'tmp/q_out.png', 1, 1, 'RGB', None)]
Explanation: The first task spends 25 time units on grayscale, then fails before the invalid scale. The next task starts only after time 25 because there is just one worker.
Input: ([], 'out', 4)
Expected Output: []
Explanation: Edge case: no tasks means no results.
Hints
- First write a helper that simulates one task: compute its duration, final metadata, or failure.
- Then use a min-heap of (available_time, worker_id) to assign each next task to the earliest available worker in O(log workers).