This question evaluates a candidate's ability to design and implement a scalable file deduplication system, covering tiered hashing strategies, efficient batched disk I/O, parallelization across cores or machines, chunking of very large files, and safe filesystem operations such as hard-linking.

Write a program to deduplicate files in a very large directory tree. Identify groups of identical files without loading entire files into memory. Outline your approach to hashing (e.g., size filter, partial hash, full hash), chunking for very large files, and handling hash collisions. Support a mode that replaces duplicates with hard links (when safe) and a dry-run report of duplicate sets. Explain time and space complexity, how you batch disk I/O, and how you would parallelize across CPU cores or machines.