
Design an algorithm to deduplicate files in a storage system. Compare fixed-size versus content-defined chunking and explain how you would choose hash functions (e.g., cryptographic hashes versus rolling hashes). Describe collision detection/mitigation, handling of very large files that do not fit in memory, streaming ingestion, and opportunities for parallelization and I/O optimization. Analyze time and space complexity and discuss data structures for fast lookups (e.g., hash tables, Bloom filters, LSM-based indexes).
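
As a starting point for the chunking comparison, here is a minimal Python sketch, not a definitive design. It assumes a Gear-style rolling hash for content-defined boundaries, SHA-256 as the chunk identifier, and an in-memory dict standing in for a persistent index. All names (`fixed_chunks`, `cdc_chunks`, `DedupStore`, `shared_chunks`) and the chunk-size constants are hypothetical and untuned. Files are read block by block, so a whole file never needs to fit in memory, although this toy store does keep every unique chunk in RAM.

```python
import hashlib
import io
import random

# Illustrative parameters (assumptions, not tuned values).
MIN_CHUNK = 256                # never cut before this many bytes
MAX_CHUNK = 8192               # always cut at this many bytes
BOUNDARY_MASK = (1 << 10) - 1  # cut when the low 10 hash bits are zero

# Gear table: one fixed pseudo-random 32-bit value per byte value.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def fixed_chunks(stream, size=1024):
    """Yield fixed-size chunks; boundaries depend only on byte offsets."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


def cdc_chunks(stream, read_size=65536):
    """Yield content-defined chunks while reading the stream block by block."""
    buf = bytearray()
    h = 0
    while True:
        block = stream.read(read_size)
        if not block:
            break
        for byte in block:
            buf.append(byte)
            # Rolling update: old bytes shift out of the 32-bit state.
            h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
            at_boundary = len(buf) >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0
            if at_boundary or len(buf) >= MAX_CHUNK:
                yield bytes(buf)
                buf.clear()
                h = 0
    if buf:
        yield bytes(buf)


class DedupStore:
    """In-memory chunk store: digest -> chunk, plus a per-file chunk recipe."""

    def __init__(self):
        self.chunks = {}
        self.recipes = {}

    def put(self, name, stream, chunker=cdc_chunks):
        recipe = []
        for chunk in chunker(stream):
            digest = hashlib.sha256(chunk).digest()
            stored = self.chunks.setdefault(digest, chunk)
            # Mitigation: compare bytes on a digest hit instead of trusting the hash.
            if stored != chunk:
                raise RuntimeError("digest collision detected")
            recipe.append(digest)
        self.recipes[name] = recipe

    def get(self, name):
        return b"".join(self.chunks[d] for d in self.recipes[name])


def shared_chunks(store, a, b):
    """Count distinct chunk digests that two stored files have in common."""
    return len(set(store.recipes[a]) & set(store.recipes[b]))


# Demo: insert one byte at the front of a file and see what still deduplicates.
data_rng = random.Random(1)
base = bytes(data_rng.getrandbits(8) for _ in range(100_000))
edited = b"!" + base

store = DedupStore()
store.put("base-cdc", io.BytesIO(base))
store.put("edited-cdc", io.BytesIO(edited))
store.put("base-fixed", io.BytesIO(base), chunker=fixed_chunks)
store.put("edited-fixed", io.BytesIO(edited), chunker=fixed_chunks)

print("shared with CDC:  ", shared_chunks(store, "base-cdc", "edited-cdc"))
print("shared with fixed:", shared_chunks(store, "base-fixed", "edited-fixed"))
```

The demo targets the main trade-off. With fixed-size chunks, a one-byte insertion shifts every later boundary, so on random data essentially nothing should deduplicate. With content-defined chunks, boundaries follow the bytes themselves and are expected to realign after the first cut. The byte-for-byte comparison on a digest hit is one simple collision mitigation. A production index would more plausibly put a Bloom filter in front of an on-disk hash table or LSM-based store than use a Python dict.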