PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches

Quick Overview

This question evaluates a candidate's skills in file-system traversal using DFS/backtracking, duplicate detection across hierarchical directories, and algorithmic techniques such as hashing and efficient data-structure use.

  • Medium
  • Applied Intuition
  • Coding & Algorithms
  • Software Engineer

Delete duplicate files via DFS

Company: Applied Intuition

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

##### Question LeetCode 609. Find Duplicate File in System – Implement an algorithm (using DFS/backtracking) to delete files with duplicate content in a file directory tree, and discuss how you would optimize the solution to run faster. https://leetcode.com/problems/find-duplicate-file-in-system/description/

Quick Answer: This question evaluates a candidate's skills in file-system traversal using DFS/backtracking, duplicate detection across hierarchical directories, and algorithmic techniques such as hashing and efficient data-structure use.

You are given a directory tree represented as nested dictionaries. Each directory node has keys 'name' (string), 'dirs' (list of directory nodes), and 'files' (list of file nodes). Each file node has keys 'name' (string) and 'content' (string). Two files are duplicates if their 'content' strings are equal. Build absolute paths starting at the root as '/{root_name}/.../{file_name}'. Return a list of absolute file paths that should be deleted so that only one file per unique content remains. Keep exactly one file per content: the file with the lexicographically smallest absolute path. Return the list of paths to delete sorted lexicographically.

Constraints

  • 1 <= total number of directories + files <= 20000
  • All 'name' fields are non-empty and do not contain '/'
  • File 'content' is a string; two files are duplicates if their 'content' strings are equal
  • Directory depth can be large; use iterative DFS to avoid recursion-depth issues
  • Return the deletion list sorted lexicographically
  • Absolute path format: '/' + root name + '/' + ... + '/' + file name

Solution

def delete_duplicate_files(tree: dict) -> list:
    """Return absolute paths of files to delete to keep one per content.
    The kept file per content is the one with the lexicographically smallest absolute path.
    Input tree format:
    {
      "name": str,
      "dirs": [dir_node, ...],
      "files": [{"name": str, "content": str}, ...]
    }
    """
    if not tree or "name" not in tree:
        return []

    stack = [(tree, "/" + tree["name"])]
    files = []  # list of (content, path)
    min_path = {}  # content -> smallest path

    while stack:
        node, cur = stack.pop()
        for f in node.get("files", []):
            path = cur + "/" + f["name"]
            content = f.get("content", "")
            files.append((content, path))
            if content not in min_path or path < min_path[content]:
                min_path[content] = path
        for d in node.get("dirs", []):
            stack.append((d, cur + "/" + d["name"]))

    deletions = [path for content, path in files if path != min_path[content]]
    deletions.sort()
    return deletions
Explanation
Perform an iterative DFS over the directory tree, carrying the absolute path for each directory on the stack. For every file encountered, build its absolute path and record the pair (content, path). Maintain a dictionary mapping each content to the lexicographically smallest path seen so far. After traversal, any file whose path is not the minimal path for its content must be deleted. Sort these paths lexicographically before returning them.

Time complexity: O(n log n), where n is the number of files (sorting the deletion list dominates).. Space complexity: O(n) for storing file paths and the content-to-min-path map..

Hints

  1. Traverse the directory tree with DFS using an explicit stack to build absolute paths.
  2. Use a hash map from content to the smallest (lexicographic) path seen so far.
  3. Collect all (content, path) pairs, then delete all paths that are not the minimal path for their content.
  4. Sort the resulting deletion list for deterministic output.
Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Design a nested transaction store - Applied Intuition (Medium)
  • Design a coupon pricing engine - Applied Intuition (Medium)
  • Implement transactional key–value store - Applied Intuition (Medium)
  • Find grid cell minimizing sum distances - Applied Intuition (Medium)
  • Design a transactional in-memory key–value store - Applied Intuition (Medium)