This question evaluates the ability to design scalable file deduplication algorithms, specifically testing knowledge of hashing strategies, collision handling, memory and I/O optimization, handling very large files, incremental/resumable operation, and complexity and trade-off analysis.

Given a directory tree that may not fit in memory, detect and optionally remove duplicate files. Define the algorithm, covering your hashing strategy (e.g., size grouping, partial hashes, full hashes, or a chunked/rolling hash), collision handling, how you handle very large files, memory and I/O optimization, and how you would make the process incremental and resumable. Provide a complexity analysis and discuss the trade-offs.
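
A minimal sketch of one common staged approach (group by size, then by a partial hash of the first block, then confirm with a full streaming hash) is shown below. Function names such as find_duplicates, partial_hash, and full_hash are illustrative, not part of the question; a complete answer would also address byte-for-byte confirmation, deletion policy, and persisting state for incremental and resumable runs.

```python
import hashlib
import os
from collections import defaultdict

def partial_hash(path, block=4096):
    """Hash only the first block of the file as a cheap second-pass filter."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        h.update(f.read(block))
    return h.hexdigest()

def full_hash(path, chunk=1 << 20):
    """Stream the whole file in fixed-size chunks so very large files never load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            h.update(data)
    return h.hexdigest()

def find_duplicates(root):
    # Stage 1: group by file size -- files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            if os.path.isfile(p):
                by_size[os.path.getsize(p)].append(p)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Stage 2: within a size group, split by partial hash of the first 4 KiB.
        by_partial = defaultdict(list)
        for p in paths:
            by_partial[partial_hash(p)].append(p)
        # Stage 3: confirm candidates with a full streaming hash.
        for group in by_partial.values():
            if len(group) < 2:
                continue
            by_full = defaultdict(list)
            for p in group:
                by_full[full_hash(p)].append(p)
            duplicates.extend(g for g in by_full.values() if len(g) > 1)
    return duplicates

if __name__ == "__main__":
    for group in find_duplicates("."):
        print(group)
```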