PracHub
Quick Overview

This question evaluates file parsing, data deduplication and grouping skills along with algorithmic efficiency and space-time trade-offs when identifying duplicate files by size rather than by content.

Find duplicate files by size

Company: Applied Intuition

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Question

LeetCode 609. Find Duplicate File in System: find duplicate files using file size instead of content comparison.
https://leetcode.com/problems/find-duplicate-file-in-system/description/

You are given a list of directory descriptions similar to LeetCode 609, but instead of grouping files by identical content, you must group them by identical file size. Each string in the input has the form:

"directory file1(size1) file2(size2) ... filek(sizek)"

For every file, build its full path as "directory/filename". Two files are considered duplicates if their sizes are equal. Return all groups of duplicate files. Only include sizes that appear at least twice.

To make the output deterministic:

  1. File paths inside each group must appear in the same order they are encountered while scanning the input from left to right.
  2. Groups must appear in the order that their size first appears in the input.
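The input format can be illustrated with a small parsing sketch. The helper name `parse_entry` is hypothetical (it does not appear in the original problem), and the code assumes well-formed tokens of the form name(size), as the constraints promise.

```python
def parse_entry(entry):
    """Split one directory description into (full_path, size) pairs.

    Illustrative helper only; assumes every file token looks like name(size).
    """
    parts = entry.split()
    if not parts:
        return []

    directory = parts[0]
    pairs = []
    for token in parts[1:]:
        # File names contain no parentheses, so the last '(' starts the size.
        left = token.rfind('(')
        name = token[:left]
        size = int(token[left + 1:-1])  # strip the trailing ')'
        pairs.append((directory + '/' + name, size))
    return pairs


print(parse_entry("root/a 1.txt(100) 2.txt(200) 3.txt(100)"))
# [('root/a/1.txt', 100), ('root/a/2.txt', 200), ('root/a/3.txt', 100)]
```

A description that contains only a directory name, such as "root", yields an empty list because there are no file tokens to parse.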

Constraints

  • 0 <= len(paths) <= 20000
  • Each directory description has length between 1 and 2000
  • The total number of files across all strings is at most 100000
  • 0 <= size <= 10^9
  • Directory names and file names contain no spaces, and file names do not contain parentheses

Examples

Input: (["root/a 1.txt(100) 2.txt(200) 3.txt(100)", "root/c 4.txt(300)", "root/c/d 4.txt(200)", "root 4.txt(300)"],)

Expected Output: [["root/a/1.txt", "root/a/3.txt"], ["root/a/2.txt", "root/c/d/4.txt"], ["root/c/4.txt", "root/4.txt"]]

Explanation: Size 100 appears in root/a/1.txt and root/a/3.txt, size 200 appears in root/a/2.txt and root/c/d/4.txt, and size 300 appears in root/c/4.txt and root/4.txt.

Input: (["home 1.txt(10) 2.txt(20)", "var 3.log(30)"],)

Expected Output: []

Explanation: Every file size is unique, so there are no duplicate groups.

Input: ([],)

Expected Output: []

Explanation: An empty input has no files and therefore no duplicates.

Input: (["data 0.bin(0)", "backup 1.bin(0) 2.bin(5)", "tmp 3.bin(5)"],)

Expected Output: [["data/0.bin", "backup/1.bin"], ["backup/2.bin", "tmp/3.bin"]]

Explanation: Files of size 0 form one duplicate group, and files of size 5 form another.

Input: (["docs a.txt(7) b.txt(7) c.txt(8)"],)

Expected Output: [["docs/a.txt", "docs/b.txt"]]

Explanation: Within the same directory, a.txt and b.txt share size 7, while c.txt is unique.

Solution

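def solution(paths):
    # Maps file size -> list of full paths with that size.
    # Python dicts preserve insertion order (3.7+), so groups come out
    # in the order each size is first seen in the input.
    size_to_paths = {}

    for entry in paths:
        parts = entry.split()
        if not parts:
            continue

        # First token is the directory; the rest are name(size) tokens.
        directory = parts[0]
        for file_info in parts[1:]:
            # File names contain no parentheses, so the last '(' and ')'
            # delimit the size.
            left = file_info.rfind('(')
            right = file_info.rfind(')')
            name = file_info[:left]
            size = int(file_info[left + 1:right])
            full_path = directory + '/' + name

            if size not in size_to_paths:
                size_to_paths[size] = []
            size_to_paths[size].append(full_path)

    # Keep only sizes shared by at least two files.
    return [group for group in size_to_paths.values() if len(group) > 1]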

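Time complexity: O(T + P), where T is the total number of characters across all input strings and P is the combined length of all full paths built, since the directory name is copied once per file. With each description capped at 2000 characters, P stays within a constant factor of T.

Space complexity: O(F) stored paths (O(P) characters), where F is the total number of files.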
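Both ordering rules from the problem statement fall out of the map itself, because Python dictionaries iterate in insertion order. As a hedged sketch, the variant below uses `collections.defaultdict` (a dict subclass, so the same ordering applies) and checks the result against two of the examples above. The function name `group_by_size` is an assumption introduced here for illustration.

```python
from collections import defaultdict


def group_by_size(paths):
    # size -> full paths, in the order sizes and files are first encountered.
    size_to_paths = defaultdict(list)

    for entry in paths:
        parts = entry.split()
        if not parts:
            continue
        directory = parts[0]
        for token in parts[1:]:
            # "1.txt(100)" -> name "1.txt", rest "100)"
            name, _, rest = token.rpartition('(')
            size_to_paths[int(rest[:-1])].append(directory + '/' + name)

    # Drop sizes that occur only once.
    return [group for group in size_to_paths.values() if len(group) > 1]


example = [
    "root/a 1.txt(100) 2.txt(200) 3.txt(100)",
    "root/c 4.txt(300)",
    "root/c/d 4.txt(200)",
    "root 4.txt(300)",
]
print(group_by_size(example))
# [['root/a/1.txt', 'root/a/3.txt'], ['root/a/2.txt', 'root/c/d/4.txt'], ['root/c/4.txt', 'root/4.txt']]
```

Size 200 is completed by a later directory than size 300 is first seen in, yet its group still comes second, because groups are ordered by when each size first appears rather than when it first repeats.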

Hints

  1. Use a hash map where the key is the file size and the value is the list of full paths with that size.
  2. You do not need to compare every pair of files. Parse each file once, build its full path, and append it to the correct group.

Last updated: May 9, 2026

