Design file deduplication across nested directories

Q: Design file deduplication across nested directories

This question evaluates a candidate's competency in file system traversal, robust I/O handling including symbolic links and cycles, content-based deduplication, and algorithmic efficiency within the Coding & Algorithms domain.


Question

Design and implement a file deduplication tool that, given a root directory, identifies groups of duplicate files. Requirements: traverse nested directories safely; handle symbolic links and potential cycles; minimize I/O with a multi-stage strategy (e.g., compare file sizes, then partial hashes, then full hashes); support very large files; and output duplicate groups with file paths. Explain your data structures, step-by-step algorithm, time and space complexity, and key test cases.
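One possible shape for the multi-stage strategy is sketched below in Python. It is a minimal illustration under stated assumptions, not a reference solution. The function names (`walk_files`, `digest`, `refine`, `find_duplicates`), the 64 KiB partial-hash window, the 1 MiB read size, and the choice of SHA-256 are all assumptions made for this example. The sketch follows symbolic links and breaks cycles by remembering each directory's (device, inode) pair, which assumes a platform where `os.stat` reports meaningful values for both. Paths that resolve to the same inode (hard links, or a symlink and its target) are reported once, because they are the same file rather than duplicate content. Large files are read in fixed-size chunks so memory use stays bounded.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 1 << 20       # bytes per read (assumed value)
PARTIAL = 64 * 1024   # bytes hashed in the partial stage (assumed value)


def walk_files(root):
    """Yield (path, size) for each distinct regular file under root."""
    seen_dirs, seen_files = set(), set()
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)  # follows symlinks
            key = (st.st_dev, st.st_ino)
            if key in seen_dirs:
                continue  # already visited: a cycle or a second link
            seen_dirs.add(key)
            with os.scandir(path) as it:
                entries = list(it)
        except OSError:
            continue  # broken link, permission error, or not a directory
        for entry in entries:
            try:
                if entry.is_dir():
                    stack.append(entry.path)
                elif entry.is_file():
                    fst = os.stat(entry.path)
                    fkey = (fst.st_dev, fst.st_ino)
                    if fkey not in seen_files:
                        seen_files.add(fkey)
                        yield entry.path, fst.st_size
            except OSError:
                continue


def digest(path, limit=None):
    """SHA-256 of the first `limit` bytes, or of the whole file if None."""
    h = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            n = CHUNK if remaining is None else min(CHUNK, remaining)
            block = f.read(n)
            if not block:
                break
            h.update(block)
            if remaining is not None:
                remaining -= len(block)
    return h.hexdigest()


def refine(groups, key_fn):
    """Split each candidate group by key_fn; keep groups of two or more."""
    out = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            try:
                buckets[key_fn(path)].append(path)
            except OSError:
                continue  # file vanished or became unreadable
        out.extend(g for g in buckets.values() if len(g) > 1)
    return out


def find_duplicates(root):
    """Return lists of paths whose contents hash identically."""
    by_size = defaultdict(list)
    for path, size in walk_files(root):
        by_size[size].append(path)
    groups = [g for g in by_size.values() if len(g) > 1]   # stage 1: size
    groups = refine(groups, lambda p: digest(p, PARTIAL))  # stage 2: prefix
    groups = refine(groups, digest)                        # stage 3: full
    return [sorted(g) for g in groups]
```

With n files and B total bytes, the size stage costs O(n) metadata calls and no content reads. The partial stage reads at most 64 KiB per surviving file, and the full stage reads each remaining candidate once, so the worst case is O(B) when every file is a duplicate. Extra space is O(n) for the visited sets and the size map. The full stage re-reads the prefix already hashed in stage 2, and the sketch treats equal SHA-256 digests as equal content; a final byte-by-byte comparison could be added if that assumption is not acceptable.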


