PracHub

Quick Overview

This question evaluates competency in concurrent and parallel processing, practical image manipulation with Python libraries (Pillow), resource management for memory and file I/O, robust error handling, and performance measurement.

  • medium
  • Anthropic
  • Coding & Algorithms
  • Software Engineer

Implement Parallel Image Processing

Company: Anthropic

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

Build an image-processing utility in Python using the Pillow library. You are given a collection of image file paths and an output directory. For each image, apply the following transformations:

  1. Convert the image to grayscale.
  2. Scale or resize the image to a target size while preserving correctness.
  3. Save the processed image to the output directory.

Start with a solution that works for a small number of small images, then extend it so it can efficiently process many large images. Discuss and implement an approach using parallelism or concurrency, such as `ProcessPoolExecutor`, and explain why it does or does not help for this workload. Consider error handling, memory usage, file I/O, and how you would measure performance.
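
Below is a minimal sketch of the Pillow utility. Several details are assumptions: output is saved as PNG under `<stem>_processed.png`, the resampling filter is LANCZOS, and the helper names `process_one`, `process_sequential`, and `process_parallel` are hypothetical. Errors are returned per file instead of being raised, so one corrupt input cannot abort the batch. Workers receive only file paths, which keeps pickling across the process boundary cheap.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor, as_completed

from PIL import Image


def process_one(src_path, out_dir, size):
    """Grayscale and resize one file.

    Returns (src_path, out_path, error); exactly one of out_path/error is None.
    Runs inside a worker, so it takes only small picklable arguments.
    """
    try:
        with Image.open(src_path) as img:
            gray = img.convert("L")  # forces a full decode, then 8-bit grayscale
        # Assumed filter: LANCZOS gives good quality when shrinking large images.
        resized = gray.resize(size, Image.LANCZOS)  # size is (width, height)
        stem = os.path.splitext(os.path.basename(src_path))[0]
        # Assumed naming scheme; inputs sharing a stem would overwrite each other.
        out_path = os.path.join(out_dir, stem + "_processed.png")
        resized.save(out_path)
        return src_path, out_path, None
    except Exception as exc:  # unreadable/corrupt input: report it, keep going
        return src_path, None, repr(exc)


def process_sequential(paths, out_dir, size):
    """Baseline for small inputs and for timing comparisons."""
    os.makedirs(out_dir, exist_ok=True)
    return [process_one(p, out_dir, size) for p in paths]


def process_parallel(paths, out_dir, size, max_workers=None,
                     executor_cls=ProcessPoolExecutor):
    """Fan files out to a pool; returns (results, elapsed_seconds).

    Only paths cross the process boundary. Each worker decodes its own file,
    so roughly max_workers decoded images are held in memory at once.
    executor_cls is a hypothetical hook for comparing pool types.
    """
    os.makedirs(out_dir, exist_ok=True)
    start = time.perf_counter()
    results = []
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(process_one, p, out_dir, size) for p in paths]
        for fut in as_completed(futures):  # results arrive in completion order
            results.append(fut.result())
    return results, time.perf_counter() - start
```

On platforms that start workers with spawn (Windows, macOS), call `process_parallel` from under an `if __name__ == "__main__":` guard.

A process pool tends to help when decoding and resizing dominate, because each worker has its own interpreter and GIL. It tends not to help in two cases:

  • The work is I/O-bound, for example on slow disks or network storage.
  • The images are tiny, so process start-up and result transfer dominate.

To check which case applies, time `process_sequential` against `process_parallel` on the same inputs, across several `max_workers` values.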

Part 1: Sequential Grayscale and Resize Utility

You are given a list of images. Each image is represented as a pair (name, pixels), where pixels is a 2D list of RGB tuples (r, g, b). Process the images sequentially. For every valid image:

  1. Convert each pixel to grayscale using gray = (299*r + 587*g + 114*b) // 1000.
  2. Resize the grayscale image to the target size (target_h, target_w) using nearest-neighbor mapping. Destination cell (i, j) copies source cell (src_i, src_j), where src_i = i * src_h // target_h and src_j = j * src_w // target_w.
  3. Return the processed image under the name name + "_processed".

If an image is invalid, return None for that image instead of a matrix. An image is invalid if any of the following holds:

  • it is empty;
  • it has empty rows;
  • it is ragged (rows with different lengths);
  • it contains malformed pixels (anything other than a 3-element tuple or list of integers);
  • it contains a channel outside 0..255.

Preserve input order.

Constraints

  • 0 <= len(images) <= 100
  • 1 <= target_h, target_w <= 100
  • For valid images, each pixel is a 3-element tuple/list of integers in [0, 255]
  • The total number of source pixels across all images is at most 200000

Examples

Input: ([('a', [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]])], (2, 2))

Expected Output: [('a_processed', [[76, 149], [29, 255]])]

Explanation: A basic same-size transformation: grayscale values are computed directly and no scaling changes positions.

Input: ([('wide', [[(10, 20, 30), (30, 20, 10)]])], (2, 4))

Expected Output: [('wide_processed', [[18, 18, 21, 21], [18, 18, 21, 21]])]

Explanation: The 1x2 image is enlarged to 2x4 with nearest-neighbor duplication.

Input: ([('grid', [[(0, 0, 0), (50, 50, 50), (100, 100, 100)], [(150, 150, 150), (200, 200, 200), (250, 250, 250)], [(30, 30, 30), (60, 60, 60), (90, 90, 90)]])], (2, 2))

Expected Output: [('grid_processed', [[0, 50], [150, 200]])]

Explanation: This checks shrinking from 3x3 to 2x2 using nearest-neighbor source selection.

Input: ([('one', [[(10, 10, 10)]]), ('ragged', [[(0, 0, 0)], [(255, 255, 255), (0, 0, 0)]])], (2, 2))

Expected Output: [('one_processed', [[10, 10], [10, 10]]), ('ragged_processed', None)]

Explanation: The first image is valid and expands correctly. The second is ragged, so it returns None.

Input: ([], (3, 3))

Expected Output: []

Explanation: Edge case: no images to process.

Hints

  1. First validate that the image is non-empty and rectangular before processing it.
  2. For nearest-neighbor resizing, map each destination cell back to a source cell with integer division.
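
Below is a minimal pure-Python sketch of Part 1 that follows the two hints. It validates and converts each image in one pass, then maps every destination cell back to its source cell with integer division. The function names are hypothetical.

```python
def gray_or_none(pixel):
    """Grayscale value of one pixel, or None if the pixel is malformed."""
    if not isinstance(pixel, (tuple, list)) or len(pixel) != 3:
        return None
    if not all(isinstance(c, int) and 0 <= c <= 255 for c in pixel):
        return None
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def to_gray(pixels):
    """Validate an image and convert it; returns a 2D list, or None if invalid."""
    if not isinstance(pixels, (list, tuple)) or not pixels:
        return None  # empty image
    width = len(pixels[0]) if isinstance(pixels[0], (list, tuple)) else 0
    if width == 0:
        return None  # empty (or non-list) first row
    gray = []
    for row in pixels:
        if not isinstance(row, (list, tuple)) or len(row) != width:
            return None  # ragged, empty, or non-list row
        values = [gray_or_none(px) for px in row]
        if None in values:
            return None  # malformed pixel or channel outside 0..255
        gray.append(values)
    return gray


def resize_nearest(gray, target_h, target_w):
    """Destination (i, j) reads source (i * src_h // target_h, j * src_w // target_w)."""
    src_h, src_w = len(gray), len(gray[0])
    src_cols = [j * src_w // target_w for j in range(target_w)]
    return [[gray[i * src_h // target_h][c] for c in src_cols]
            for i in range(target_h)]


def process_images(images, target):
    """Process (name, pixels) pairs sequentially, preserving input order."""
    target_h, target_w = target
    out = []
    for name, pixels in images:
        gray = to_gray(pixels)
        resized = None if gray is None else resize_nearest(gray, target_h, target_w)
        out.append((name + "_processed", resized))
    return out
```

Validation runs before any resizing, so an invalid image costs at most one pass over its pixels.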

Part 2: Parallel Processing Planner with Checksums

You are given the same image representation as in Part 1, plus a target size and a worker count max_workers. For each valid image, compute the checksum of its processed version instead of returning the full matrix. The checksum is the sum of all values in the resized grayscale image. Use the same rules as Part 1:

  1. Grayscale formula: gray = (299*r + 587*g + 114*b) // 1000.
  2. Resize with nearest-neighbor mapping: src_i = i * src_h // target_h and src_j = j * src_w // target_w.

Invalid images return None as their checksum. Then estimate a ProcessPoolExecutor-style schedule without launching real processes:

  • Only valid images are scheduled.
  • Images are submitted in input order.
  • There are max_workers identical workers.
  • Each valid image takes duration = src_h * src_w + target_h * target_w time units.
  • When a worker becomes free, it immediately starts the next waiting valid image.
  • Each running valid image uses memory equal to its source pixel count, src_h * src_w.

Return a dictionary with the following keys:

  • results: a list of (name + "_processed", checksum_or_None) in input order.
  • estimated_time: the total completion time of the simulated schedule (0 if nothing is scheduled).
  • peak_inflight_pixels: the maximum sum of source pixels of valid images running at the same time. A job that finishes at time t and a job that starts at time t are not counted as running together.

This models why process-based parallelism can help CPU-heavy image workloads while keeping the judge deterministic and self-contained.
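
The checksum does not require building the target_h x target_w grid and summing it. One possible shortcut is to first count how many destination rows read each source row, and how many destination columns read each source column. Each source gray value is then weighted by the product of its two counts: checksum = Σ over (r, c) of gray[r][c] · rows[r] · cols[c]. The sketch below assumes the Part 1 validation rules. The name `checksum` is hypothetical.

```python
from collections import Counter


def checksum(pixels, target_h, target_w):
    """Sum of the nearest-neighbor-resized grayscale image, or None if invalid."""
    if not isinstance(pixels, (list, tuple)) or not pixels:
        return None  # empty image
    first = pixels[0]
    src_w = len(first) if isinstance(first, (list, tuple)) else 0
    if src_w == 0:
        return None  # empty (or non-list) first row
    src_h = len(pixels)
    # rows[r] = number of destination rows that read source row r; cols likewise.
    # Counter returns 0 for source rows/columns that no destination cell reads.
    rows = Counter(i * src_h // target_h for i in range(target_h))
    cols = Counter(j * src_w // target_w for j in range(target_w))
    total = 0
    for r, row in enumerate(pixels):
        if not isinstance(row, (list, tuple)) or len(row) != src_w:
            return None  # ragged, empty, or non-list row
        row_total = 0
        for c, px in enumerate(row):
            # Every pixel is validated, even ones the resize never reads.
            if not (isinstance(px, (list, tuple)) and len(px) == 3
                    and all(isinstance(v, int) and 0 <= v <= 255 for v in px)):
                return None
            red, green, blue = px
            gray = (299 * red + 587 * green + 114 * blue) // 1000
            row_total += gray * cols[c]
        total += row_total * rows[r]
    return total
```

The Part 2 constraints allow up to 100000 images with targets up to 100 x 100. Under those limits, summing every resized grid directly could touch on the order of 10^9 cells. This version does O(src_h * src_w + target_h + target_w) work per image instead.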

Constraints

  • 0 <= len(images) <= 100000
  • 1 <= target_h, target_w <= 100
  • 1 <= max_workers <= 100000
  • For valid images, each pixel is a 3-element tuple/list of integers in [0, 255]
  • The total number of source pixels across all images is at most 200000

Examples

Input: ([('a', [[(255, 0, 0)]]), ('b', [[(0, 255, 0)]])], (1, 1), 2)

Expected Output: {'results': [('a_processed', 76), ('b_processed', 149)], 'estimated_time': 2, 'peak_inflight_pixels': 2}

Explanation: Both 1x1 images can run immediately on separate workers. Each takes 1 + 1 = 2 time units.

Input: ([('a', [[(10, 10, 10)]]), ('b', [[(20, 20, 20), (30, 30, 30)]]), ('c', [])], (2, 2), 1)

Expected Output: {'results': [('a_processed', 40), ('b_processed', 100), ('c_processed', None)], 'estimated_time': 11, 'peak_inflight_pixels': 2}

Explanation: With one worker, valid images run sequentially. Image a takes 1 + 4 = 5 time units and b takes 2 + 4 = 6, for a total of 11. The invalid image contributes no work.

Input: ([('x', [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]), ('y', [[(100, 100, 100), (100, 100, 100), (100, 100, 100)]]), ('z', [[(255, 255, 255)]])], (1, 1), 2)

Expected Output: {'results': [('x_processed', 0), ('y_processed', 100), ('z_processed', 255)], 'estimated_time': 6, 'peak_inflight_pixels': 7}

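Explanation: The first two jobs start at time 0, giving a peak of 4 + 3 = 7 in-flight source pixels. The third job starts when the shorter of those two (y, which takes 3 + 1 = 4 units) finishes, and it ends at 4 + 2 = 6.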

Input: ([('bad', [[(0, 0, 0)], []]), ('also_bad', [])], (1, 1), 3)

Expected Output: {'results': [('bad_processed', None), ('also_bad_processed', None)], 'estimated_time': 0, 'peak_inflight_pixels': 0}

Explanation: Edge case: all images are invalid, so no work is scheduled.

Input: ([], (1, 1), 4)

Expected Output: {'results': [], 'estimated_time': 0, 'peak_inflight_pixels': 0}

Explanation: Edge case: empty input.

Hints

  1. Reuse the grayscale and nearest-neighbor logic from the sequential version, but sum the resized values instead of storing the final image.
  2. To estimate parallel time, assign each job to the earliest available worker with a min-heap, then sweep start/end events to get peak in-flight memory.
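
Below is a minimal sketch of the two steps in Hint 2. It assumes the caller has already turned each valid image, in input order, into a (duration, source_pixels) pair, where duration = src_h * src_w + target_h * target_w. The name `simulate_schedule` is hypothetical. At equal timestamps, finish events sort before start events, which matches the expected peak of 2 in the single-worker example.

```python
import heapq


def simulate_schedule(jobs, max_workers):
    """jobs: list of (duration, source_pixels) in submission order.

    Returns (estimated_time, peak_inflight_pixels).
    """
    if not jobs:
        return 0, 0
    # Min-heap of the times at which workers become free; all are free at t=0.
    # A list of equal values is already a valid heap.
    free_at = [0] * min(max_workers, len(jobs))
    events = []  # (time, kind, pixels) with kind 0 = finish, 1 = start
    finish_time = 0
    for duration, pixels in jobs:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + duration
        heapq.heappush(free_at, end)
        events.append((start, 1, pixels))
        events.append((end, 0, pixels))
        finish_time = max(finish_time, end)
    # Sweep events in time order; kind 0 sorts first, so a job ending at t
    # is released before a job starting at t is counted.
    inflight = peak = 0
    for _, kind, pixels in sorted(events):
        if kind == 1:
            inflight += pixels
            peak = max(peak, inflight)
        else:
            inflight -= pixels
    return finish_time, peak
```

Because each job pops the smallest free time and pushes a later one, start times never decrease. This reproduces the "next waiting image goes to the first free worker" rule.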
Last updated: May 5, 2026

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Convert Samples into Event Intervals - Anthropic (medium)
  • Convert State Stream to Events - Anthropic (medium)
  • Build a concurrent web crawler - Anthropic (medium)
  • Implement a Parallel Image Processor - Anthropic (medium)
  • Implement a Batch Image Processor - Anthropic (medium)