Find duplicate files and apply image operations

Q: Find duplicate files and apply image operations

This is a Coding & Algorithms interview question from Anthropic for Software Engineer roles. View the full question and solution on PracHub.

Question

Part A — Find duplicate files by content

You are given a list of directory records. Each record is a string describing a directory path followed by one or more files in that directory, where each file is described as name(content).

Example record:

"root/a 1.txt(abcd) 2.txt(efgh)"

Task

Return all groups of duplicate files, where two files are duplicates if they have exactly the same content. Each group should contain the full paths of all files that share that content (only include groups with at least 2 files).

Input

paths: an array of strings, each formatted as:
- dir file1(content1) file2(content2) ...

Output

A list of groups (each group is a list of strings), where each string is a full file path like "root/a/1.txt".
Order of groups and order within a group do not matter.

Constraints (reasonable interview defaults)

1 <= paths.length <= 2*10^4
Total number of files across all records can be large; aim for near-linear time in total input size.
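
One way to meet the near-linear target is to group full paths in a hash map keyed by file content. The sketch below is illustrative rather than a reference solution. It assumes tokens are separated by whitespace, that file names and contents contain no spaces, and that a file name contains no "(". The function name find_duplicates is made up for this example.

```python
from collections import defaultdict


def find_duplicates(paths):
    """Group full file paths by identical content (hypothetical helper)."""
    by_content = defaultdict(list)  # content -> list of full paths
    for record in paths:
        # The first token is the directory; the rest are name(content) entries.
        directory, *files = record.split()
        for entry in files:
            # Split at the first "(" and drop the trailing ")".
            name, _, rest = entry.partition("(")
            content = rest[:-1]
            by_content[content].append(f"{directory}/{name}")
    # Keep only contents shared by at least two files.
    return [group for group in by_content.values() if len(group) >= 2]
```

Each character of the input is visited a constant number of times, so the running time is roughly linear in the total input size, with extra space proportional to the stored contents and paths. A single record such as "root/a 1.txt(abcd) 2.txt(efgh)" yields an empty list, because no content appears twice.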

Part B — Image processing operations (flip & blur)

You are given a grayscale image represented as a 2D matrix img of integers (e.g., 0..255).

Task

Implement the following operations:

- Horizontal flip: reverse each row.
- Box blur with radius 1: each pixel becomes the average of itself and all valid neighbors in the 3×3 window centered at that pixel (use only in-bounds pixels). Use integer (floor) division for the average.
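
As a baseline, both operations can be written to return a new matrix. This is a minimal sketch that assumes img is a non-empty list of equal-length rows of integers. The names flip_horizontal and box_blur are illustrative and do not come from the question.

```python
def flip_horizontal(img):
    """Return a new image with every row reversed."""
    return [row[::-1] for row in img]


def box_blur(img):
    """Return a new image; each pixel is the floor average of its in-bounds 3x3 window."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = count = 0
            # Clamp the window to the image so only in-bounds pixels are counted.
            for rr in range(max(0, r - 1), min(h, r + 2)):
                for cc in range(max(0, c - 1), min(w, c + 2)):
                    total += img[rr][cc]
                    count += 1
            out[r][c] = total // count  # floor division
    return out
```

For the 3×3 image [[1, 2, 3], [4, 5, 6], [7, 8, 9]], the top-left pixel has four in-bounds pixels in its window (1, 2, 4, 5), so it becomes 12 // 4 = 3, while the center pixel averages all nine values and stays 5.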

Input

img: H x W integer matrix
An operation sequence (e.g., ["FLIP", "BLUR"]) indicating the order in which to apply the operations.

Output

The resulting image matrix after applying all operations in order.

Constraints (reasonable interview defaults)

1 <= H, W <= 2000
Discuss time and space tradeoffs; avoid unnecessary extra full-size copies when possible.
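
One possible way to avoid a second full-size matrix is to flip rows in place and blur with a rolling copy of the rows that are about to be overwritten. The sketch below assumes the caller allows img to be mutated. The function names and the OPS table are hypothetical; only the "FLIP" and "BLUR" labels come from the example above.

```python
def flip_in_place(img):
    """Reverse every row without allocating a new matrix."""
    for row in img:
        row.reverse()


def blur_in_place(img):
    """Radius-1 box blur that keeps only O(W) extra memory."""
    h, w = len(img), len(img[0])
    prev = None  # original (pre-blur) values of row r - 1
    for r in range(h):
        cur = img[r][:]  # save the original row r before overwriting it
        nxt = img[r + 1] if r + 1 < h else None  # row r + 1 is still unmodified
        rows = [x for x in (prev, cur, nxt) if x is not None]
        for c in range(w):
            lo, hi = max(0, c - 1), min(w, c + 2)  # in-bounds column range
            total = sum(sum(x[lo:hi]) for x in rows)
            img[r][c] = total // (len(rows) * (hi - lo))
        prev = cur


# Hypothetical dispatch table mapping operation names to functions.
OPS = {"FLIP": flip_in_place, "BLUR": blur_in_place}


def apply_operations(img, ops):
    """Apply the operations in order, mutating and returning img."""
    for op in ops:
        OPS[op](img)  # an unknown name raises KeyError
    return img
```

Both operations take O(H·W) time per call. The flip needs no extra storage, and the blur holds at most two saved rows at a time instead of a full H x W copy. The cost of that saving is that the input is destroyed, which is worth confirming with the interviewer.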
