This question evaluates understanding of file integrity verification, hashing strategies, collision risk mitigation, streaming I/O, and deduplication in distributed environments, testing a candidate's ability to reason about performance and correctness trade-offs.
You need a robust, scalable method to decide whether two files (potentially very large and located in different directories or on different machines) are bitwise identical. The solution should minimize CPU, memory, I/O, and network costs while keeping collision risk negligible, and it should support a deduplication workflow.
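For orientation, here is a minimal Python sketch of one way these requirements could fit together. It is not a reference answer. It assumes SHA-256 as the hash and a 1 MiB read size. The function names (`sha256_of`, `same_bytes`, `duplicate_groups`) are hypothetical and chosen for illustration. The idea is to reject on file size first, stream the content in fixed-size chunks so memory stays constant, and hash only files that share a size.

```python
import hashlib
import os
from collections import defaultdict

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an assumed, tunable value


def sha256_of(path, chunk_size=CHUNK_SIZE):
    """Stream the file through SHA-256 so memory use stays constant."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Local check: compare sizes first, then bytes, stopping at the first difference."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False
            if not a:  # both files reached end-of-file together
                return True


def duplicate_groups(paths):
    """Group by size (cheap), then hash only the files that share a size."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    by_digest = defaultdict(list)
    for size, group in by_size.items():
        if len(group) > 1:  # a unique size cannot have a duplicate
            for p in group:
                by_digest[(size, sha256_of(p))].append(p)
    return [g for g in by_digest.values() if len(g) > 1]
```

When the files sit on different machines, each side could run `sha256_of` locally and exchange only the size and digest, so no file content crosses the network. When both files are on the same host, `same_bytes` avoids hashing altogether and removes collision risk entirely, at the cost of reading both files.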
Describe a step-by-step approach that covers: