This question evaluates understanding of file deduplication, efficient I/O and hashing strategies, streaming large-file processing, and scalable algorithm design for identifying files with identical byte-for-byte contents.
You are given a list of files in a filesystem. Each file has a path, a size, and content.
Two files are duplicates if their contents are identical (byte-for-byte), regardless of file name or directory.
Task: Return groups of duplicate files. Each group should contain the full paths of files that have identical content, and groups of size 1 should be omitted.
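One possible approach, sketched below in Python, is to bucket files by size first (files of different sizes cannot be identical) and then hash only the files that share a size, reading each in chunks so large files never need to fit in memory. The function names `hash_file` and `find_duplicates`, the choice of SHA-256, and the 1 MiB chunk size are illustrative assumptions, not part of the problem statement. The sketch also assumes the files exist on disk and are readable.

```python
import hashlib
import os
from collections import defaultdict


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream the file so memory use stays bounded by chunk_size.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths with identical content; groups of size 1 are omitted."""
    # Step 1: bucket by size, a cheap filter that needs no content reads.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only files that share a size with at least one other file.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[hash_file(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)

    # Sort for a deterministic result; the task itself does not require an order.
    return sorted(groups)
```

This sketch treats equal SHA-256 digests as equal content. If a strict byte-for-byte guarantee is required, a final pairwise comparison within each group (for example with `filecmp.cmp(a, b, shallow=False)`) could be added as a last step.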
Input files (path, size, content):
Output: