What difficulty level is this coding question?

This is a medium-difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Anthropic.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Anthropic during technical interviews.

Implement a Parallel Image Processor | Anthropic Coding Question

Quick Overview

This question evaluates understanding of parallel and process-based concurrency, batch image-processing pipelines, transform correctness, and per-job fault isolation within the Coding & Algorithms domain.

Implement a Parallel Image Processor

Company: Anthropic

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

You are building a batch image-processing utility. Implement a function that receives a list of jobs. Each job contains:

- `input_path`: the source image file
- `output_path`: where to write the processed image
- `operations`: an ordered list of image transformations

Support these operations:

1. `grayscale`: convert the image to grayscale.
2. `scale(f)`: multiply both width and height by a positive scaling factor `f`.
3. `resize(w, h)`: resize the image to exactly `w x h` pixels.

For each job, load the image, apply the operations in order, and save the final result to `output_path`.

Follow-up:

1. First, implement a correct solution for a small number of images.
2. Then improve the implementation for a very large batch. Assume image transformations are CPU-intensive and that jobs are independent. Use process-based parallelism to increase throughput, while still returning one result record per input job and handling per-file failures without stopping the entire batch.

Explain any important design choices, such as worker-count selection, memory considerations, and how you would preserve correctness while improving performance.

Part 1: Sequential Virtual Image Processor

You are given a virtual file system of images. Each image is stored only as metadata `(width, height, mode)`, where `mode` is either `'RGB'` or `'GRAY'`. Implement a correct sequential batch processor for a small number of jobs.

Each job contains:

- `input_path`: the path to read
- `output_path`: the path to write
- `operations`: an ordered list of transformations

Supported operations:

- `('grayscale',)` -> change the mode to `'GRAY'`
- `('scale', f)` -> set `width = int(width * f)` and `height = int(height * f)`
- `('resize', w, h)` -> set the size to exactly `(w, h)`

Jobs run strictly in order. A successful job writes its output into the file system, so later jobs may use earlier outputs as inputs. If a job fails, it must not modify the file system, and processing continues with the next job.

A job fails with:

- `'missing_input'` if `input_path` does not exist at the time the job starts
- `'invalid_operation'` if an operation is unknown or malformed
- `'invalid_dimensions'` if scaling/resizing would produce a width or height less than 1, or if the scale factor is not positive
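The rules above can be sketched as a small sequential processor. This is one possible reading, not an official solution: the names `apply_operations` and `process_sequential` are hypothetical, and two details the statement leaves open are assumptions here. Operations are checked in order so the first failure decides the error code, and non-numeric arguments or wrong arity count as malformed.

```python
def _is_real(x):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def _is_int(x):
    return isinstance(x, int) and not isinstance(x, bool)


def apply_operations(image, operations):
    """Return ('ok', new_image) or ('error', code); never mutates its inputs."""
    width, height, mode = image
    for op in operations:
        if not isinstance(op, (tuple, list)) or not op:
            return ("error", "invalid_operation")
        name, args = op[0], tuple(op[1:])
        if name == "grayscale" and not args:
            mode = "GRAY"
        elif name == "scale" and len(args) == 1 and _is_real(args[0]):
            factor = args[0]
            if not factor > 0:  # also rejects NaN
                return ("error", "invalid_dimensions")
            try:
                width, height = int(width * factor), int(height * factor)
            except OverflowError:  # infinite factor
                return ("error", "invalid_dimensions")
        elif name == "resize" and len(args) == 2 and all(_is_int(a) for a in args):
            width, height = args
        else:
            # Assumption: unknown names and bad arity/types are all "malformed".
            return ("error", "invalid_operation")
        if width < 1 or height < 1:
            return ("error", "invalid_dimensions")
    return ("ok", (width, height, mode))


def process_sequential(files, jobs):
    """Run jobs strictly in order; later jobs can read earlier outputs."""
    files = dict(files)  # work on a copy so the caller's map is untouched
    results = []
    for job in jobs:
        source = files.get(job["input_path"])
        if source is None:
            results.append(("error", "missing_input"))
            continue
        status, payload = apply_operations(source, job["operations"])
        if status == "ok":
            # Write only after every operation has succeeded.
            files[job["output_path"]] = payload
            results.append(("ok", job["output_path"], payload))
        else:
            results.append(("error", payload))
    return results, files
```

Because the transformed image is computed on local variables and written only on success, a failing job cannot leave a partial result in the file map.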

Constraints

`0 <= len(jobs) <= 1000`
`0 <= len(operations) <= 20` per job
All input images use only `'RGB'` or `'GRAY'` modes

Examples

Input: ({'a': (10, 20, 'RGB')}, [{'input_path': 'a', 'output_path': 'b', 'operations': [('grayscale',), ('scale', 0.5)]}, {'input_path': 'b', 'output_path': 'c', 'operations': [('resize', 4, 6)]}])

Expected Output: ([('ok', 'b', (5, 10, 'GRAY')), ('ok', 'c', (4, 6, 'GRAY'))], {'a': (10, 20, 'RGB'), 'b': (5, 10, 'GRAY'), 'c': (4, 6, 'GRAY')})

Explanation: The second job reads the file produced by the first job.

Input: ({}, [{'input_path': 'missing', 'output_path': 'out', 'operations': [('grayscale',)]}])

Expected Output: ([('error', 'missing_input')], {})

Explanation: The input file does not exist.

Input: ({'x': (3, 3, 'RGB')}, [{'input_path': 'x', 'output_path': 'y', 'operations': [('scale', 2), ('rotate', 90)]}])

Expected Output: ([('error', 'invalid_operation')], {'x': (3, 3, 'RGB')})

Explanation: An unknown operation causes the whole job to fail, and `y` is not written.

Input: ({'p': (1, 1, 'RGB')}, [{'input_path': 'p', 'output_path': 'q', 'operations': [('scale', 0.4)]}, {'input_path': 'p', 'output_path': 'r', 'operations': [('resize', 2, 2)]}])

Expected Output: ([('error', 'invalid_dimensions'), ('ok', 'r', (2, 2, 'RGB'))], {'p': (1, 1, 'RGB'), 'r': (2, 2, 'RGB')})

Explanation: Scaling by `0.4` would make the image `0 x 0`, so the first job fails. The second job still succeeds.

Input: ({'solo': (7, 8, 'GRAY')}, [])

Expected Output: ([], {'solo': (7, 8, 'GRAY')})

Explanation: No jobs means the file system is unchanged.

Part 2: Parallel Virtual Image Processor with Failure Isolation

You now need to handle a very large batch of independent image-processing jobs. Each image is still represented only by metadata `(width, height, mode)`, where `mode` is `'RGB'` or `'GRAY'`. Unlike Part 1, every job must read from the original `files` snapshot only. No job may depend on the output of another job.

Implement a process-based batch processor that:

- uses process parallelism for throughput
- returns one result record per input job
- preserves the original job order in the returned results
- isolates failures so one bad job does not stop the rest
- returns a deterministic final file map

Transformations are the same:

- `('grayscale',)`
- `('scale', f)` -> `width = int(width * f)`, `height = int(height * f)`
- `('resize', w, h)`

Failure rules are also the same:

- `'missing_input'`
- `'invalid_operation'`
- `'invalid_dimensions'`

To keep the final file map deterministic, successful outputs are committed in input order after processing completes. If multiple successful jobs write the same `output_path`, the later job in the input list wins.

Constraints

`0 <= len(jobs) <= 100000`
`0 <= len(operations) <= 20` per job
Jobs are independent and must read only from the original `files` map
If `max_workers` is `None`, a typical choice is `min(cpu_count, number_of_jobs)`, floored at 1 so that an empty batch never requests zero workers
If two successful jobs write the same `output_path`, the later job in input order wins in `final_files`
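With up to 100000 small jobs, the per-task cost of sending work to another process can dominate, so it may help to size the pool and batch tasks together. The helper below is a hypothetical sketch: `pick_pool_params` and its `chunks_per_worker=4` default are illustrative assumptions, not values given by the problem. The resulting chunk size is the kind of value that could be passed as the `chunksize` argument of `Executor.map`.

```python
import os


def pick_pool_params(n_jobs, max_workers=None, cpu_count=None, chunks_per_worker=4):
    """Return (workers, chunksize) for a batch of n_jobs independent tasks."""
    cpus = cpu_count or os.cpu_count() or 1
    # Default follows the constraint above: min(cpu_count, number_of_jobs).
    workers = max_workers if max_workers else min(cpus, n_jobs)
    # Never more workers than jobs, and never fewer than one.
    workers = max(1, min(workers, max(n_jobs, 1)))
    # Several chunks per worker keeps load balanced if some jobs run longer.
    chunksize = max(1, n_jobs // (workers * chunks_per_worker))
    return workers, chunksize


print(pick_pool_params(100000, cpu_count=8))  # (8, 3125)
print(pick_pool_params(3, max_workers=16, cpu_count=8))  # (3, 1)
```

Larger chunks reduce inter-process traffic but make load balancing coarser; with real image data, memory per worker is a further reason to keep the worker count at or below the core count.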

Examples

Input: ({'a': (10, 20, 'RGB')}, [{'input_path': 'a', 'output_path': 'b', 'operations': [('grayscale',)]}, {'input_path': 'b', 'output_path': 'c', 'operations': [('resize', 4, 4)]}], 2)

Expected Output: ([('ok', 'b', (10, 20, 'GRAY')), ('error', 'missing_input')], {'a': (10, 20, 'RGB'), 'b': (10, 20, 'GRAY')})

Explanation: Jobs are independent, so the second job cannot read `b` from the first job's output.

Input: ({'x': (4, 4, 'RGB'), 'y': (8, 2, 'RGB')}, [{'input_path': 'x', 'output_path': 'z', 'operations': [('scale', 2)]}, {'input_path': 'y', 'output_path': 'z', 'operations': [('grayscale',)]}], 2)

Expected Output: ([('ok', 'z', (8, 8, 'RGB')), ('ok', 'z', (8, 2, 'GRAY'))], {'x': (4, 4, 'RGB'), 'y': (8, 2, 'RGB'), 'z': (8, 2, 'GRAY')})

Explanation: Both jobs succeed, but the later one wins for `z` when outputs are committed in input order.

Input: ({'p': (1, 1, 'RGB'), 'q': (2, 3, 'RGB')}, [{'input_path': 'p', 'output_path': 'bad', 'operations': [('scale', 0.4)]}, {'input_path': 'q', 'output_path': 'good', 'operations': [('resize', 5, 1), ('grayscale',)]}], 3)

Expected Output: ([('error', 'invalid_dimensions'), ('ok', 'good', (5, 1, 'GRAY'))], {'p': (1, 1, 'RGB'), 'q': (2, 3, 'RGB'), 'good': (5, 1, 'GRAY')})

Explanation: One job fails, but the other still succeeds.

Input: ({'img': (2, 2, 'GRAY')}, [], 4)

Expected Output: ([], {'img': (2, 2, 'GRAY')})

Explanation: Empty batch edge case.

Hints

Carry each job's original index so you can rebuild results in input order even if workers finish out of order.
Avoid shared mutable state across processes; let workers compute results independently, then merge successful writes in the parent process.
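The first hint can be illustrated with `concurrent.futures.as_completed`, which yields futures in completion order rather than submission order. In this sketch `run_indexed` and `slow_square` are hypothetical names, and a `ThreadPoolExecutor` is used purely as a stand-in so the snippet runs anywhere; it shares the `Executor` interface with `ProcessPoolExecutor`, which is what a CPU-bound batch would use.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_indexed(worker, tasks, executor):
    """Submit every task, then store each outcome at its original index."""
    results = [None] * len(tasks)
    future_to_index = {executor.submit(worker, task): i for i, task in enumerate(tasks)}
    for future in as_completed(future_to_index):
        index = future_to_index[future]
        try:
            results[index] = ("ok", future.result())
        except Exception as exc:
            # One failing task fills only its own slot; the rest continue.
            results[index] = ("error", type(exc).__name__)
    return results


def slow_square(n):
    time.sleep(random.uniform(0, 0.01))  # make completion order unpredictable
    if n < 0:
        raise ValueError("negative input")
    return n * n


with ThreadPoolExecutor(max_workers=4) as pool:
    print(run_indexed(slow_square, [3, -1, 2, 5], pool))
    # [('ok', 9), ('error', 'ValueError'), ('ok', 4), ('ok', 25)]
```

Only the parent touches `results`, which matches the second hint: workers return values, and all merging happens in one place. For very large batches, one future per job adds overhead, so batching several jobs per submission may be preferable.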