This question evaluates the ability to design scalable file deduplication algorithms, specifically testing knowledge of hashing strategies, collision handling, memory and I/O optimization, handling very large files, incremental/resumable operation, and complexity and trade-off analysis.

Given a directory tree that may not fit in memory, detect and optionally remove duplicate files. Define the algorithm, covering your hashing strategy (e.g., size grouping, partial hashes, full hashes, or a chunked/rolling hash), collision handling, how you handle very large files, memory and I/O optimization, and how you would make the process incremental and resumable. Provide a complexity analysis and discuss the trade-offs.
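
A minimal sketch of one common staged approach (group by size, then by a partial hash of the first block, then confirm with a full streaming hash) is shown below. Function names such as find_duplicates, partial_hash, and full_hash are illustrative, not part of the question; a complete answer would also address byte-for-byte confirmation, deletion policy, and persisting state for incremental and resumable runs.

```python
import hashlib
import os
from collections import defaultdict

def partial_hash(path, block=4096):
    """Hash only the first block of the file as a cheap second-pass filter."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        h.update(f.read(block))
    return h.hexdigest()

def full_hash(path, chunk=1 << 20):
    """Stream the whole file in fixed-size chunks so very large files never load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            h.update(data)
    return h.hexdigest()

def find_duplicates(root):
    # Stage 1: group by file size -- files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            if os.path.isfile(p):
                by_size[os.path.getsize(p)].append(p)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Stage 2: within a size group, split by partial hash of the first 4 KiB.
        by_partial = defaultdict(list)
        for p in paths:
            by_partial[partial_hash(p)].append(p)
        # Stage 3: confirm candidates with a full streaming hash.
        for group in by_partial.values():
            if len(group) < 2:
                continue
            by_full = defaultdict(list)
            for p in group:
                by_full[full_hash(p)].append(p)
            duplicates.extend(g for g in by_full.values() if len(g) > 1)
    return duplicates

if __name__ == "__main__":
    for group in find_duplicates("."):
        print(group)
```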