This question evaluates algorithm design and systems engineering skills: chunking strategies, hash-function selection and collision mitigation, large-file and streaming processing, parallelization and I/O optimization, complexity analysis, and the choice of lookup and index data structures such as hash tables, Bloom filters, or LSM-based indexes. Common in the Coding & Algorithms domain, it examines trade-offs among scalability, performance, and correctness under resource constraints, and tests both conceptual understanding and practical application to system-level problems.

Design an algorithm to deduplicate files in a storage system. Compare fixed-size versus content-defined chunking and explain how you would choose hash functions (e.g., cryptographic hashes versus rolling hashes). Describe collision detection/mitigation, handling of very large files that do not fit in memory, streaming ingestion, and opportunities for parallelization and I/O optimization. Analyze time and space complexity and discuss data structures for fast lookups (e.g., hash tables, Bloom filters, LSM-based indexes).
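As a starting point for discussion, the pieces named in the question can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not a reference answer: it uses a Gear-style rolling hash to pick content-defined chunk boundaries, SHA-256 as the chunk identity, a byte-for-byte comparison on every index hit as collision detection, and a plain in-memory `dict` standing in for the chunk index. The names `chunk_stream` and `DedupStore`, and all size constants, are hypothetical and chosen only for illustration.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterator, List

# Illustrative constants (assumptions, not tuned values).
MIN_CHUNK = 2 * 1024               # never cut before this many bytes
MAX_CHUNK = 64 * 1024              # always cut at this many bytes
CUT_MASK = ((1 << 13) - 1) << 51   # top 13 bits: about one cut per 8 KiB
MASK64 = (1 << 64) - 1

# Gear table: one pseudo-random 64-bit value per byte value, derived
# deterministically so that every run produces the same boundaries.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]


def chunk_stream(stream: BinaryIO, read_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield content-defined chunks while reading the stream block by block.

    Only one read block plus the current chunk is held in memory, so the
    input may be far larger than RAM.
    """
    buf = bytearray()
    h = 0
    while True:
        block = stream.read(read_size)
        if not block:
            break
        for b in block:
            buf.append(b)
            # Rolling update: bytes older than 64 positions shift out of h.
            h = ((h << 1) + GEAR[b]) & MASK64
            n = len(buf)
            if (n >= MIN_CHUNK and (h & CUT_MASK) == 0) or n >= MAX_CHUNK:
                yield bytes(buf)
                buf.clear()
                h = 0
    if buf:
        yield bytes(buf)  # trailing partial chunk


class DedupStore:
    """In-memory stand-in for a chunk store plus its lookup index."""

    def __init__(self) -> None:
        self.chunks: Dict[bytes, bytes] = {}        # digest -> chunk bytes
        self.recipes: Dict[str, List[bytes]] = {}   # file name -> digests

    def put(self, name: str, stream: BinaryIO) -> None:
        recipe: List[bytes] = []
        for chunk in chunk_stream(stream):
            digest = hashlib.sha256(chunk).digest()
            existing = self.chunks.get(digest)
            if existing is None:
                self.chunks[digest] = chunk          # first time seen
            elif existing != chunk:
                # Collision detection: same digest, different bytes.
                raise RuntimeError("hash collision detected")
            recipe.append(digest)
        self.recipes[name] = recipe

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    payload = bytes(range(256)) * 400
    store.put("a.bin", io.BytesIO(payload))
    store.put("b.bin", io.BytesIO(b"!" + payload))  # one byte inserted in front
    print(len(payload) * 2 + 1, "bytes ingested,", store.stored_bytes(), "stored")
```

The sketch makes one pass over each input, so chunking and hashing cost time linear in the number of bytes, and the index holds one entry per unique chunk. With fixed-size chunking, the one-byte insertion in the usage example would shift every later boundary and defeat deduplication; content-defined boundaries usually realign after the first chunk. In a real system the `dict` would typically be replaced by an on-disk index (for example an LSM-based store fronted by a Bloom filter), and chunk hashing could be handed to a worker pool, since chunks are independent once their boundaries are known.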