
Write a program to deduplicate files in a very large directory tree, identifying groups of identical files without loading entire files into memory. Outline your approach to hashing (e.g., size filter, partial hash, full hash), chunked reads for files too big to hash in one pass, and handling hash collisions. Support a mode that replaces duplicates with hard links (when safe) and a dry-run report of duplicate sets. Explain the time and space complexity, how you batch disk I/O, and how you would parallelize the work across CPU cores or machines.
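
As a starting point, here is a minimal Python sketch of the staged filter: group by size, then by a hash of the first few KiB, then by a full streaming hash, so no file is ever read into memory whole. The SHA-256 choice, the 4 KiB partial-hash size, and helper names such as `walk_files` and `duplicate_sets` are illustrative assumptions, not requirements.

```python
import os
import hashlib
from collections import defaultdict

PARTIAL = 4096      # bytes hashed in the cheap second stage (assumed size)
CHUNK = 1 << 20     # 1 MiB read size for the full streaming hash

def walk_files(root):
    """Yield regular-file paths under root, skipping symlinks."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path

def digest(path, limit=None):
    """SHA-256 of the first `limit` bytes (or the whole file), read in chunks."""
    h = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while True:
            n = CHUNK if remaining is None else min(CHUNK, remaining)
            block = f.read(n)
            if not block:
                break
            h.update(block)
            if remaining is not None:
                remaining -= len(block)
                if remaining == 0:
                    break
    return h.hexdigest()

def duplicate_sets(root):
    """Return lists of paths whose contents are (very likely) identical."""
    by_size = defaultdict(list)
    for path in walk_files(root):
        by_size[os.path.getsize(path)].append(path)

    results = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue                          # unique size => unique file
        by_partial = defaultdict(list)
        for p in paths:                       # stage 2: hash only a prefix
            by_partial[digest(p, PARTIAL)].append(p)
        for group in by_partial.values():
            if len(group) < 2:
                continue
            by_full = defaultdict(list)
            for p in group:                   # stage 3: full streaming hash
                by_full[digest(p)].append(p)
            results.extend(g for g in by_full.values() if len(g) > 1)
    return results
```

Each stage only runs on files that survived the previous one, so the vast majority of files are never fully read; space usage is dominated by the path-to-group maps, not file contents.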
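Because even a full-hash match is only probabilistic, one hedge against collisions is a final byte-by-byte confirmation within each hash-equal group. A short sketch using the standard library's `filecmp` (which compares both files in fixed-size chunks) might look like the following; the `confirm_group` name is hypothetical.

```python
import filecmp

def confirm_group(paths):
    """Split a hash-equal group into byte-identical subgroups."""
    confirmed = []
    for p in paths:
        for sub in confirmed:
            # shallow=False forces a chunked byte-by-byte content comparison
            if filecmp.cmp(sub[0], p, shallow=False):
                sub.append(p)
                break
        else:
            confirmed.append([p])
    return [s for s in confirmed if len(s) > 1]
```

In practice many tools skip this step and accept the negligible SHA-256 collision risk; the trade-off is worth stating explicitly either way.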
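For the hard-link mode, a cautious sketch might check that a duplicate lives on the same device as the kept file (hard links cannot cross filesystems), skip files that already share an inode, and stage the new link under a temporary name so the final swap is atomic. The `.dedup-tmp` suffix and the `link_duplicates` name are assumptions for illustration.

```python
import os

def link_duplicates(group, dry_run=True):
    """Replace duplicates with hard links to the first path in the group."""
    keep = group[0]
    keep_stat = os.stat(keep)
    for dup in group[1:]:
        st = os.stat(dup)
        if st.st_dev != keep_stat.st_dev:
            continue                      # hard links cannot cross filesystems
        if st.st_ino == keep_stat.st_ino:
            continue                      # already the same inode; nothing to do
        if dry_run:
            print(f"would link {dup} -> {keep}")
            continue
        tmp = dup + ".dedup-tmp"          # hypothetical suffix for the staging link
        os.link(keep, tmp)                # create the new link first...
        os.replace(tmp, dup)              # ...then atomically swap it into place
```

Note that hard-linked paths share one inode, so ownership, permissions, and timestamps merge after linking; a production tool would want to compare that metadata before deciding a replacement is "safe".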
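Hashing parallelizes naturally because files are independent. A sketch with a process pool (one worker per core by default) could reuse the `digest` helper from the first sketch, assuming it is defined at module top level so it can be pickled.

```python
from concurrent.futures import ProcessPoolExecutor

def parallel_digests(paths, workers=None):
    """Map path -> full-file digest, hashing files concurrently.

    Assumes digest() is the streaming SHA-256 helper sketched earlier,
    defined at module top level so worker processes can import it.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches task dispatch to cut inter-process overhead
        return dict(zip(paths, pool.map(digest, paths, chunksize=64)))
```

Since CPython's `hashlib` releases the GIL while digesting large buffers, a `ThreadPoolExecutor` is a reasonable alternative that avoids pickling and suits I/O-bound workloads; across machines, the same map shape works with paths sharded by size bucket so each worker sees whole candidate groups.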