Design file deduplication algorithm

Q: Design file deduplication algorithm

This is a Coding & Algorithms interview question from Anthropic for Software Engineer roles.

Question

Design an algorithm to deduplicate files in a storage system. Compare fixed-size versus content-defined chunking and explain how you would choose hash functions (e.g., cryptographic hashes versus rolling hashes). Describe collision detection/mitigation, handling of very large files that do not fit in memory, streaming ingestion, and opportunities for parallelization and I/O optimization. Analyze time and space complexity and discuss data structures for fast lookups (e.g., hash tables, Bloom filters, LSM-based indexes).
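
One possible starting point for an answer is sketched below in Python. It is an illustration under stated assumptions, not a reference solution. It streams a file in fixed-size reads, cuts it into content-defined chunks with a Gear-style rolling hash, and identifies each chunk by its SHA-256 digest. The names cdc_chunks and DedupStore, the chunk-size parameters, and the in-memory dict index are all hypothetical choices made for the example. A real system would more likely keep the index on disk (for example in an LSM-based store with a Bloom filter in front of it).

```python
import hashlib
from typing import BinaryIO, Dict, Iterator, List

# Hypothetical tuning parameters, chosen only for illustration.
MIN_CHUNK = 2 * 1024    # never cut before this many bytes
MAX_CHUNK = 64 * 1024   # force a cut at this size
CUT_BITS = 13           # cut when the top 13 hash bits are zero (about 1 in 8192 positions)
MASK64 = (1 << 64) - 1

# Gear table: 256 fixed pseudo-random 64-bit values, derived deterministically.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]


def cdc_chunks(stream: BinaryIO, read_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield content-defined chunks while holding at most one read block
    and one partial chunk in memory."""
    buf = bytearray()
    h = 0
    while True:
        block = stream.read(read_size)
        if not block:
            break
        for b in block:
            buf.append(b)
            # Rolling update: bytes older than 64 positions shift out of the hash,
            # so the cut decision depends only on recent content, not on offsets.
            h = ((h << 1) + GEAR[b]) & MASK64
            n = len(buf)
            if (n >= MIN_CHUNK and (h >> (64 - CUT_BITS)) == 0) or n >= MAX_CHUNK:
                yield bytes(buf)
                buf.clear()
                h = 0
    if buf:
        yield bytes(buf)  # final partial chunk


class DedupStore:
    """Toy chunk store: maps SHA-256 digest to chunk bytes, all in memory."""

    def __init__(self) -> None:
        self.chunks: Dict[bytes, bytes] = {}

    def put_file(self, stream: BinaryIO) -> List[bytes]:
        """Ingest a stream and return its recipe (ordered list of chunk digests)."""
        recipe: List[bytes] = []
        for chunk in cdc_chunks(stream):
            digest = hashlib.sha256(chunk).digest()
            existing = self.chunks.get(digest)
            if existing is None:
                self.chunks[digest] = chunk
            elif existing != chunk:
                # Optional byte-for-byte check; guards against a digest collision.
                raise RuntimeError("digest collision detected")
            recipe.append(digest)
        return recipe

    def get_file(self, recipe: List[bytes]) -> bytes:
        """Rebuild a file from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)
```

In this sketch, chunking and hashing are linear in the file size, and the index grows with the number of unique chunks rather than with the total bytes ingested. Because cut points depend on content rather than on byte offsets, inserting a byte near the start of a file should change only the first chunk or two, whereas fixed-size chunking would shift every later boundary. The per-byte Python loop is slow and is meant only to show the control flow.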
