PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCareers

Quick Overview

This question evaluates skills in string parsing and content-based grouping for duplicate file detection, along with matrix manipulation for image operations like horizontal flips and box blur, covering competencies in algorithm design, data-structure selection, and numerical array transformations.

  • hard
  • Anthropic
  • Coding & Algorithms
  • Software Engineer

Find duplicate files and apply image operations

Company: Anthropic

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

## Part A — Find duplicate files by content You are given a list of directory records. Each record is a string describing a directory path followed by one or more files in that directory, where each file is described as `name(content)`. Example record: - `"root/a 1.txt(abcd) 2.txt(efgh)"` ### Task Return all groups of **duplicate files**, where two files are duplicates if they have **exactly the same content**. Each group should contain the **full paths** of all files that share that content (only include groups with at least 2 files). ### Input - `paths`: an array of strings, each formatted as: - `dir file1(content1) file2(content2) ...` ### Output - A list of groups (each group is a list of strings), where each string is a full file path like `"root/a/1.txt"`. - Order of groups and order within a group do not matter. ### Constraints (reasonable interview defaults) - `1 <= paths.length <= 2*10^4` - Total number of files across all records can be large; aim for near-linear time in total input size. --- ## Part B — Image processing operations (flip & blur) You are given a grayscale image represented as a 2D matrix `img` of integers (e.g., `0..255`). ### Task Implement the following operations: 1. **Horizontal flip**: reverse each row. 2. **Box blur** with radius 1: each pixel becomes the average of itself and all valid neighbors in the 3×3 window centered at that pixel (use only in-bounds pixels). Use integer division/floor for the average. ### Input - `img`: `H x W` integer matrix - An operation sequence (e.g., `["FLIP", "BLUR"]`) indicating the order to apply operations. ### Output - The resulting image matrix after applying all operations in order. ### Constraints (reasonable interview defaults) - `1 <= H, W <= 2000` - Discuss time and space tradeoffs; avoid unnecessary extra full-size copies when possible.

Quick Answer: This question evaluates skills in string parsing and content-based grouping for duplicate file detection, along with matrix manipulation for image operations like horizontal flips and box blur, covering competencies in algorithm design, data-structure selection, and numerical array transformations.

Part 1: Find Duplicate Files by Content

You are given a list of directory records. Each record starts with a directory path, followed by one or more file descriptions in the form `name(content)`. Two files are considered duplicates if their contents are exactly the same. Return every group of duplicate files as full file paths. For deterministic output in this problem, sort each group lexicographically, then sort the list of groups lexicographically.

Constraints

  • `1 <= len(paths) <= 2 * 10^4`
  • Each record contains a directory followed by one or more file descriptions.
  • File names and file contents contain no spaces.
  • Aim for near-linear processing in the total size of the input.

Examples

Input: (['root/a 1.txt(abcd) 2.txt(efgh)', 'root/c 3.txt(abcd)', 'root/c/d 4.txt(efgh)', 'root 4.txt(efgh)'],)

Expected Output: [['root/4.txt', 'root/a/2.txt', 'root/c/d/4.txt'], ['root/a/1.txt', 'root/c/3.txt']]

Explanation: There are two duplicated contents: `efgh` and `abcd`.

Input: (['root/a 1.txt(abcd)'],)

Expected Output: []

Explanation: Edge case: only one file, so there are no duplicates.

Input: (['dir a.txt(x) b.txt(x) c.txt(y)', 'dir2 d.txt(y)'],)

Expected Output: [['dir/a.txt', 'dir/b.txt'], ['dir/c.txt', 'dir2/d.txt']]

Explanation: Two different contents each produce one duplicate group.

Input: (['root a.txt() b.txt()', 'other c.txt(z)'],)

Expected Output: [['root/a.txt', 'root/b.txt']]

Explanation: Edge case: empty content is still valid content, so the two files are duplicates.

Solution

def solution(paths):
    from collections import defaultdict

    content_to_paths = defaultdict(list)

    for record in paths:
        parts = record.split()
        if not parts:
            continue
        directory = parts[0]

        for file_info in parts[1:]:
            left = file_info.find('(')
            name = file_info[:left]
            content = file_info[left + 1:-1]
            full_path = directory + '/' + name
            content_to_paths[content].append(full_path)

    result = []
    for group in content_to_paths.values():
        if len(group) >= 2:
            group.sort()
            result.append(group)

    result.sort()
    return result

Time complexity: O(S + N log N), where S is the total input size and N is the total number of file paths that end up in duplicate groups. Building the map is near-linear; sorting is for deterministic output.. Space complexity: O(S), for storing parsed file paths grouped by content..

Hints

  1. Use a hash map from file content to a list of full paths.
  2. For each file token, split at the first `(` to get the file name, and remove the final `)` to get the content.

Part 2: Apply Flip and Blur Operations to a Grayscale Image

You are given a grayscale image as a 2D integer matrix `img` and a sequence of operations. Support two operations: `FLIP`, which reverses every row horizontally, and `BLUR`, which replaces each pixel with the floor of the average of all valid cells in its 3x3 neighborhood centered at that pixel. Apply the operations in the given order and return the final image.

Constraints

  • `1 <= H, W <= 2000`
  • `0 <= len(operations) <= 100`
  • Each pixel value is an integer.
  • Each operation is either `FLIP` or `BLUR`.

Examples

Input: ([[1, 2, 3], [4, 5, 6]], ['FLIP'])

Expected Output: [[3, 2, 1], [6, 5, 4]]

Explanation: Each row is reversed.

Input: ([[1, 2, 3], [4, 5, 6], [7, 8, 9]], ['BLUR'])

Expected Output: [[3, 3, 4], [4, 5, 5], [6, 6, 7]]

Explanation: Each cell becomes the floor of the average of itself and all valid neighbors.

Input: ([[10, 20, 30], [40, 50, 60]], ['FLIP', 'BLUR'])

Expected Output: [[40, 35, 30], [40, 35, 30]]

Explanation: First reverse each row, then blur the transformed image.

Input: ([[7]], ['BLUR', 'FLIP', 'BLUR'])

Expected Output: [[7]]

Explanation: Edge case: a 1x1 image is unchanged by both operations.

Input: ([[0, 255], [128, 64]], [])

Expected Output: [[0, 255], [128, 64]]

Explanation: Edge case: no operations means the image stays the same.

Solution

def solution(img, operations):
    if not img:
        return []

    mat = [row[:] for row in img]

    def blur(matrix):
        h = len(matrix)
        w = len(matrix[0])

        prefix = [[0] * (w + 1) for _ in range(h + 1)]
        for i in range(h):
            running = 0
            for j in range(w):
                running += matrix[i][j]
                prefix[i + 1][j + 1] = prefix[i][j + 1] + running

        out = [[0] * w for _ in range(h)]
        for i in range(h):
            r1 = max(0, i - 1)
            r2 = min(h - 1, i + 1)
            for j in range(w):
                c1 = max(0, j - 1)
                c2 = min(w - 1, j + 1)
                total = (
                    prefix[r2 + 1][c2 + 1]
                    - prefix[r1][c2 + 1]
                    - prefix[r2 + 1][c1]
                    + prefix[r1][c1]
                )
                count = (r2 - r1 + 1) * (c2 - c1 + 1)
                out[i][j] = total // count

        return out

    for op in operations:
        if op == 'FLIP':
            for row in mat:
                row.reverse()
        elif op == 'BLUR':
            mat = blur(mat)

    return mat

Time complexity: O(K * H * W), where K is the number of operations. Each `FLIP` is O(H * W), and each `BLUR` is also O(H * W) using prefix sums.. Space complexity: O(H * W), mainly for the blurred image and the prefix-sum table during a blur step..

Hints

  1. A horizontal flip can be done in-place by reversing each row.
  2. For `BLUR`, a 2D prefix-sum table lets you compute each 3x3 neighborhood sum in O(1), making each blur pass O(H * W).
Last updated: May 4, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • Careers
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Convert Samples into Event Intervals - Anthropic (medium)
  • Convert State Stream to Events - Anthropic (medium)
  • Build a concurrent web crawler - Anthropic (medium)
  • Implement a Parallel Image Processor - Anthropic (medium)
  • Implement a Batch Image Processor - Anthropic (medium)