Find duplicate files by size
Company: Applied Intuition
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates string parsing, grouping with a hash map, and reasoning about space-time trade-offs when identifying duplicate files by size rather than by content.
Constraints
- 0 <= len(paths) <= 20000
- Each directory description has length between 1 and 2000
- The total number of files across all strings is at most 100000
- 0 <= size <= 10^9
- Directory names and file names contain no spaces, and file names contain no parentheses other than the trailing size marker
Examples
Input: (["root/a 1.txt(100) 2.txt(200) 3.txt(100)", "root/c 4.txt(300)", "root/c/d 4.txt(200)", "root 4.txt(300)"],)
Expected Output: [["root/a/1.txt", "root/a/3.txt"], ["root/a/2.txt", "root/c/d/4.txt"], ["root/c/4.txt", "root/4.txt"]]
Explanation: Size 100 appears in root/a/1.txt and root/a/3.txt, size 200 appears in root/a/2.txt and root/c/d/4.txt, and size 300 appears in root/c/4.txt and root/4.txt.
Input: (["home 1.txt(10) 2.txt(20)", "var 3.log(30)"],)
Expected Output: []
Explanation: Every file size is unique, so there are no duplicate groups.
Input: ([],)
Expected Output: []
Explanation: An empty input has no files and therefore no duplicates.
Input: (["data 0.bin(0)", "backup 1.bin(0) 2.bin(5)", "tmp 3.bin(5)"],)
Expected Output: [["data/0.bin", "backup/1.bin"], ["backup/2.bin", "tmp/3.bin"]]
Explanation: Files of size 0 form one duplicate group, and files of size 5 form another.
Input: (["docs a.txt(7) b.txt(7) c.txt(8)"],)
Expected Output: [["docs/a.txt", "docs/b.txt"]]
Explanation: Within the same directory, a.txt and b.txt share size 7, while c.txt is unique.
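Each directory string in the examples above has a fixed shape: a directory path followed by space-separated "name(size)" tokens. A small parsing sketch (the helper name parse_entry is hypothetical, not part of the expected solution) shows how one string decomposes:

```python
def parse_entry(entry):
    """Split one directory string into (directory, [(name, size), ...])."""
    parts = entry.split()
    directory, files = parts[0], []
    for token in parts[1:]:
        # A token looks like "1.txt(100)": name before '(', size inside.
        name, _, rest = token.partition('(')
        files.append((name, int(rest.rstrip(')'))))
    return directory, files

# parse_entry("root/a 1.txt(100) 2.txt(200)")
# → ("root/a", [("1.txt", 100), ("2.txt", 200)])
```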
Solution
def solution(paths):
    # Map each file size to the list of full paths with that size.
    size_to_paths = {}
    for entry in paths:
        parts = entry.split()
        if not parts:
            continue
        directory = parts[0]
        for file_info in parts[1:]:
            # Each token looks like "name.ext(size)".
            left = file_info.rfind('(')
            right = file_info.rfind(')')
            name = file_info[:left]
            size = int(file_info[left + 1:right])
            full_path = directory + '/' + name
            if size not in size_to_paths:
                size_to_paths[size] = []
            size_to_paths[size].append(full_path)
    # Only sizes shared by two or more files form duplicate groups.
    return [group for group in size_to_paths.values() if len(group) > 1]

Time complexity: O(T), where T is the total number of characters across all input strings. Space complexity: O(F), where F is the total number of files.
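As a quick check, the solution can be exercised against the first example. Since the problem does not fix the order of groups or of paths within a group, the comparison below sorts both sides (a sketch; the function body repeats the solution above so the snippet runs on its own):

```python
def solution(paths):
    size_to_paths = {}
    for entry in paths:
        parts = entry.split()
        if not parts:
            continue
        directory = parts[0]
        for file_info in parts[1:]:
            left = file_info.rfind('(')
            right = file_info.rfind(')')
            name = file_info[:left]
            size = int(file_info[left + 1:right])
            size_to_paths.setdefault(size, []).append(directory + '/' + name)
    return [g for g in size_to_paths.values() if len(g) > 1]

result = solution(["root/a 1.txt(100) 2.txt(200) 3.txt(100)",
                   "root/c 4.txt(300)", "root/c/d 4.txt(200)",
                   "root 4.txt(300)"])
# Groups may come back in any order; sort for a stable comparison.
assert sorted(map(sorted, result)) == sorted(map(sorted, [
    ["root/a/1.txt", "root/a/3.txt"],
    ["root/a/2.txt", "root/c/d/4.txt"],
    ["root/c/4.txt", "root/4.txt"],
]))
```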
Hints
- Use a hash map where the key is the file size and the value is the list of full paths with that size.
- You do not need to compare every pair of files. Parse each file once, build its full path, and append it to the correct group.
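The hash-map grouping in the hints can also be written with collections.defaultdict, which removes the explicit membership check. A compact sketch (the function name duplicates_by_size is illustrative, not required by the problem):

```python
from collections import defaultdict

def duplicates_by_size(paths):
    groups = defaultdict(list)  # size -> list of full paths with that size
    for entry in paths:
        directory, *files = entry.split()
        for token in files:
            # "name.ext(size)": size sits between the last '(' and the final ')'.
            left = token.rfind('(')
            name, size = token[:left], int(token[left + 1:-1])
            groups[size].append(directory + '/' + name)
    # Each file is parsed exactly once; no pairwise comparisons are needed.
    return [group for group in groups.values() if len(group) > 1]

# duplicates_by_size(["docs a.txt(7) b.txt(7) c.txt(8)"])
# → [["docs/a.txt", "docs/b.txt"]]
```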