Implement Parallel Image Processing
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates competency in concurrent and parallel processing, practical image manipulation with Python libraries (Pillow), resource management for memory and file I/O, robust error handling, and performance measurement.
Part 1: Sequential Grayscale and Resize Utility
Constraints
- 0 <= len(images) <= 100
- 1 <= target_h, target_w <= 100
- For valid images, each pixel is a 3-element tuple/list of integers in [0, 255]
- The total number of source pixels across all images is at most 200000
Examples
Input: ([('a', [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]])], (2, 2))
Expected Output: [('a_processed', [[76, 149], [29, 255]])]
Explanation: The target size equals the source size, so each pixel is only converted to grayscale and stays at its original position.
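The expected values in this example are consistent with the common 0.299/0.587/0.114 luminance weights applied with truncation rather than rounding (0.587 * 255 = 149.685 gives 149, not 150). The statement does not spell out the formula, so the following is only a minimal sketch under that assumption; the helper name `to_gray` and the integer-weight form are illustrative choices, not part of the original problem.

```python
def to_gray(pixel):
    """Convert an (r, g, b) pixel to a single gray level.

    Assumption: integer weights 299/587/114 with floor division, which
    reproduces the expected outputs above and avoids float rounding issues.
    """
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
gray = [[to_gray(p) for p in row] for row in image]
print(gray)  # [[76, 149], [29, 255]]
```

Integer arithmetic is used here so that pure white maps to exactly 255 without depending on how the float weights happen to sum.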
Input: ([('wide', [[(10, 20, 30), (30, 20, 10)]])], (2, 4))
Expected Output: [('wide_processed', [[18, 18, 21, 21], [18, 18, 21, 21]])]
Explanation: The 1x2 image is enlarged to 2x4 with nearest-neighbor duplication.
Input: ([('grid', [[(0, 0, 0), (50, 50, 50), (100, 100, 100)], [(150, 150, 150), (200, 200, 200), (250, 250, 250)], [(30, 30, 30), (60, 60, 60), (90, 90, 90)]])], (2, 2))
Expected Output: [('grid_processed', [[0, 50], [150, 200]])]
Explanation: Shrinking from 3x3 to 2x2 with nearest-neighbor selection keeps source rows and columns 0 and 1, so the third row and third column are dropped.
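The enlarging and shrinking examples above both follow from one index mapping. As a small sketch, assuming the floor-based mapping `dst * src_size // dst_size` (the function name `source_index` is hypothetical), the source index chosen for each destination index can be listed directly:

```python
def source_index(dst, src_size, dst_size):
    """Map a destination row or column index back to a source index."""
    return dst * src_size // dst_size


# Shrinking 3 -> 2 keeps source indices 0 and 1 and drops index 2.
print([source_index(i, 3, 2) for i in range(2)])  # [0, 1]

# Enlarging 2 -> 4 uses each source index twice.
print([source_index(j, 2, 4) for j in range(4)])  # [0, 0, 1, 1]
```

The same function is applied independently to rows (with the heights) and to columns (with the widths).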
Input: ([('one', [[(10, 10, 10)]]), ('ragged', [[(0, 0, 0)], [(255, 255, 255), (0, 0, 0)]])], (2, 2))
Expected Output: [('one_processed', [[10, 10], [10, 10]]), ('ragged_processed', None)]
Explanation: The first image is valid and expands correctly. The second is ragged (its rows have different lengths), so its result is None.
Input: ([], (3, 3))
Expected Output: []
Explanation: Edge case: no images to process.
Hints
- First validate that the image is non-empty and rectangular before processing it.
- For nearest-neighbor resizing, map each destination cell back to a source cell with integer division.
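The two hints above can be combined into one sequential utility. This is a sketch rather than a reference solution: the function names (`process_images`, `is_valid`, `resize_gray`, `to_gray`) are hypothetical, the grayscale weights are an assumption inferred from the examples, and validation only checks that the image is non-empty and rectangular.

```python
def to_gray(pixel):
    # Assumed integer luminance weights; truncation matches the examples.
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def is_valid(pixels):
    """Return True for a non-empty, rectangular image."""
    if not pixels or not pixels[0]:
        return False
    width = len(pixels[0])
    return all(len(row) == width for row in pixels)


def resize_gray(pixels, target_h, target_w):
    """Grayscale plus nearest-neighbor resize; None for invalid images."""
    if not is_valid(pixels):
        return None
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for i in range(target_h):
        # Map each destination row and column back to a source index.
        src_row = pixels[i * src_h // target_h]
        out.append([to_gray(src_row[j * src_w // target_w])
                    for j in range(target_w)])
    return out


def process_images(images, target_size):
    """Process (name, pixels) pairs in order; target_size is (height, width)."""
    target_h, target_w = target_size
    return [(name + "_processed", resize_gray(pixels, target_h, target_w))
            for name, pixels in images]


images = [
    ("one", [[(10, 10, 10)]]),
    ("ragged", [[(0, 0, 0)], [(255, 255, 255), (0, 0, 0)]]),
]
print(process_images(images, (2, 2)))
# [('one_processed', [[10, 10], [10, 10]]), ('ragged_processed', None)]
```

Returning None for a bad image instead of raising keeps one invalid input from aborting the whole batch, which is the behavior the ragged example expects.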
Part 2: Parallel Processing Planner with Checksums
Constraints
- 0 <= len(images) <= 100000
- 1 <= target_h, target_w <= 100
- 1 <= max_workers <= 100000
- For valid images, each pixel is a 3-element tuple/list of integers in [0, 255]
- The total number of source pixels across all images is at most 200000
Examples
Input: ([('a', [[(255, 0, 0)]]), ('b', [[(0, 255, 0)]])], (1, 1), 2)
Expected Output: {'results': [('a_processed', 76), ('b_processed', 149)], 'estimated_time': 2, 'peak_inflight_pixels': 2}
Explanation: Both 1x1 images start immediately on separate workers. Each takes 1 (source pixel) + 1 (target pixel) = 2 time units, and both 1-pixel sources are in flight at the same time.
Input: ([('a', [[(10, 10, 10)]]), ('b', [[(20, 20, 20), (30, 30, 30)]]), ('c', [])], (2, 2), 1)
Expected Output: {'results': [('a_processed', 40), ('b_processed', 100), ('c_processed', None)], 'estimated_time': 11, 'peak_inflight_pixels': 2}
Explanation: With one worker, the valid images run sequentially and take (1 + 4) + (2 + 4) = 11 time units, with at most 2 source pixels in flight. The invalid image contributes no work.
Input: ([('x', [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]), ('y', [[(100, 100, 100), (100, 100, 100), (100, 100, 100)]]), ('z', [[(255, 255, 255)]])], (1, 1), 2)
Expected Output: {'results': [('x_processed', 0), ('y_processed', 100), ('z_processed', 255)], 'estimated_time': 6, 'peak_inflight_pixels': 7}
Explanation: The first two jobs start at time 0 and take 5 and 4 time units, with 4 + 3 = 7 source pixels in flight. The third starts at time 4, when the shorter of those two finishes, and ends at time 6.
Input: ([('bad', [[(0, 0, 0)], []]), ('also_bad', [])], (1, 1), 3)
Expected Output: {'results': [('bad_processed', None), ('also_bad_processed', None)], 'estimated_time': 0, 'peak_inflight_pixels': 0}
Explanation: Edge case: all images are invalid, so no work is scheduled.
Input: ([], (1, 1), 4)
Expected Output: {'results': [], 'estimated_time': 0, 'peak_inflight_pixels': 0}
Explanation: Edge case: empty input.
Hints
- Reuse the grayscale and nearest-neighbor logic from the sequential version, but sum the resized values instead of storing the final image.
- To estimate parallel time, assign each job to the earliest available worker with a min-heap, then sweep start/end events to get peak in-flight memory.
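The hints above can be sketched as follows. Several details are assumptions inferred from the worked examples rather than stated rules: a job costs (source pixels + target pixels) time units, only the source pixels of running jobs count as in flight, jobs are dispatched in input order, and a job that finishes at time t releases its pixels before a job that starts at t. All function names are hypothetical.

```python
import heapq


def to_gray(pixel):
    # Assumed integer luminance weights; truncation matches the examples.
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def is_valid(pixels):
    if not pixels or not pixels[0]:
        return False
    width = len(pixels[0])
    return all(len(row) == width for row in pixels)


def checksum(pixels, target_h, target_w):
    """Sum the resized gray values without storing the resized image."""
    src_h, src_w = len(pixels), len(pixels[0])
    total = 0
    for i in range(target_h):
        row = pixels[i * src_h // target_h]
        for j in range(target_w):
            total += to_gray(row[j * src_w // target_w])
    return total


def plan_parallel(images, target_size, max_workers):
    target_h, target_w = target_size
    results, events = [], []
    free_at = []  # min-heap of times at which busy workers become free
    finish = 0
    for name, pixels in images:
        if not is_valid(pixels):
            results.append((name + "_processed", None))
            continue
        results.append((name + "_processed",
                        checksum(pixels, target_h, target_w)))
        src = len(pixels) * len(pixels[0])
        cost = src + target_h * target_w  # assumed cost model
        # An unused worker is free at time 0; otherwise reuse the earliest one.
        start = 0 if len(free_at) < max_workers else heapq.heappop(free_at)
        end = start + cost
        heapq.heappush(free_at, end)
        finish = max(finish, end)
        events.append((end, 0, -src))    # kind 0 sorts ends before starts
        events.append((start, 1, src))
    peak = current = 0
    for _, _, delta in sorted(events):   # sweep events in time order
        current += delta
        peak = max(peak, current)
    return {"results": results, "estimated_time": finish,
            "peak_inflight_pixels": peak}


jobs = [
    ("x", [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]),
    ("y", [[(100, 100, 100), (100, 100, 100), (100, 100, 100)]]),
    ("z", [[(255, 255, 255)]]),
]
print(plan_parallel(jobs, (1, 1), 2))
```

Growing the heap lazily, instead of pre-filling it with `max_workers` zeros, keeps the planner cheap when `max_workers` is far larger than the number of valid images.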