Find duplicate files and apply image operations
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates skills in string parsing and content-based grouping for duplicate file detection, along with matrix manipulation for image operations like horizontal flips and box blur, covering competencies in algorithm design, data-structure selection, and numerical array transformations.
Part 1: Find Duplicate Files by Content
Constraints
- `1 <= len(paths) <= 2 * 10^4`
- Each record contains a directory followed by one or more file descriptions.
- File names and file contents contain no spaces.
- Aim for near-linear processing in the total size of the input.
Examples
Input: (['root/a 1.txt(abcd) 2.txt(efgh)', 'root/c 3.txt(abcd)', 'root/c/d 4.txt(efgh)', 'root 4.txt(efgh)'],)
Expected Output: [['root/4.txt', 'root/a/2.txt', 'root/c/d/4.txt'], ['root/a/1.txt', 'root/c/3.txt']]
Explanation: There are two duplicated contents: `efgh` and `abcd`.
Input: (['root/a 1.txt(abcd)'],)
Expected Output: []
Explanation: Edge case: only one file, so there are no duplicates.
Input: (['dir a.txt(x) b.txt(x) c.txt(y)', 'dir2 d.txt(y)'],)
Expected Output: [['dir/a.txt', 'dir/b.txt'], ['dir/c.txt', 'dir2/d.txt']]
Explanation: Two different contents each produce one duplicate group.
Input: (['root a.txt() b.txt()', 'other c.txt(z)'],)
Expected Output: [['root/a.txt', 'root/b.txt']]
Explanation: Edge case: empty content is still valid content, so the two files are duplicates.
Solution
def solution(paths):
from collections import defaultdict
content_to_paths = defaultdict(list)
for record in paths:
parts = record.split()
if not parts:
continue
directory = parts[0]
for file_info in parts[1:]:
left = file_info.find('(')
name = file_info[:left]
content = file_info[left + 1:-1]
full_path = directory + '/' + name
content_to_paths[content].append(full_path)
result = []
for group in content_to_paths.values():
if len(group) >= 2:
group.sort()
result.append(group)
result.sort()
return result
Time complexity: O(S + N log N), where S is the total input size and N is the total number of file paths that end up in duplicate groups. Building the map is near-linear; sorting is for deterministic output.. Space complexity: O(S), for storing parsed file paths grouped by content..
Hints
- Use a hash map from file content to a list of full paths.
- For each file token, split at the first `(` to get the file name, and remove the final `)` to get the content.
Part 2: Apply Flip and Blur Operations to a Grayscale Image
Constraints
- `1 <= H, W <= 2000`
- `0 <= len(operations) <= 100`
- Each pixel value is an integer.
- Each operation is either `FLIP` or `BLUR`.
Examples
Input: ([[1, 2, 3], [4, 5, 6]], ['FLIP'])
Expected Output: [[3, 2, 1], [6, 5, 4]]
Explanation: Each row is reversed.
Input: ([[1, 2, 3], [4, 5, 6], [7, 8, 9]], ['BLUR'])
Expected Output: [[3, 3, 4], [4, 5, 5], [6, 6, 7]]
Explanation: Each cell becomes the floor of the average of itself and all valid neighbors.
Input: ([[10, 20, 30], [40, 50, 60]], ['FLIP', 'BLUR'])
Expected Output: [[40, 35, 30], [40, 35, 30]]
Explanation: First reverse each row, then blur the transformed image.
Input: ([[7]], ['BLUR', 'FLIP', 'BLUR'])
Expected Output: [[7]]
Explanation: Edge case: a 1x1 image is unchanged by both operations.
Input: ([[0, 255], [128, 64]], [])
Expected Output: [[0, 255], [128, 64]]
Explanation: Edge case: no operations means the image stays the same.
Solution
def solution(img, operations):
if not img:
return []
mat = [row[:] for row in img]
def blur(matrix):
h = len(matrix)
w = len(matrix[0])
prefix = [[0] * (w + 1) for _ in range(h + 1)]
for i in range(h):
running = 0
for j in range(w):
running += matrix[i][j]
prefix[i + 1][j + 1] = prefix[i][j + 1] + running
out = [[0] * w for _ in range(h)]
for i in range(h):
r1 = max(0, i - 1)
r2 = min(h - 1, i + 1)
for j in range(w):
c1 = max(0, j - 1)
c2 = min(w - 1, j + 1)
total = (
prefix[r2 + 1][c2 + 1]
- prefix[r1][c2 + 1]
- prefix[r2 + 1][c1]
+ prefix[r1][c1]
)
count = (r2 - r1 + 1) * (c2 - c1 + 1)
out[i][j] = total // count
return out
for op in operations:
if op == 'FLIP':
for row in mat:
row.reverse()
elif op == 'BLUR':
mat = blur(mat)
return mat
Time complexity: O(K * H * W), where K is the number of operations. Each `FLIP` is O(H * W), and each `BLUR` is also O(H * W) using prefix sums.. Space complexity: O(H * W), mainly for the blurred image and the prefix-sum table during a blur step..
Hints
- A horizontal flip can be done in-place by reversing each row.
- For `BLUR`, a 2D prefix-sum table lets you compute each 3x3 neighborhood sum in O(1), making each blur pass O(H * W).