Detect duplicate files efficiently
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates understanding of scalable file-deduplication and system-design concepts, including content hashing, I/O minimization, hash-collision handling, memory constraints, parallelization, incremental updates, and cross-machine deduplication.
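The content-hashing, I/O-minimization, and hash-collision ideas listed above are often combined into a staged filter: compare cheap attributes first, hash only the survivors, then confirm matches byte for byte. Below is a minimal Python sketch of that pattern, not the expected solution. It assumes the files are already in memory as a hypothetical `files` dict mapping path to bytes, and `find_duplicates` is an illustrative name. On a real file system the size would come from a stat call and the bytes would be streamed in chunks, which is where the I/O savings come from.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group paths whose contents are byte-identical (illustrative sketch).

    `files` is a hypothetical in-memory stand-in for a file system:
    it maps each path to that file's full contents.
    """
    # Stage 1: group by size. Files of different sizes can never be equal,
    # and on disk the size is available without reading the file at all.
    by_size: dict[int, list[str]] = defaultdict(list)
    for path, data in files.items():
        by_size[len(data)].append(path)

    groups: list[list[str]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size means a unique file, so skip hashing it
        # Stage 2: split same-size candidates by SHA-256 digest.
        by_digest: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            by_digest[hashlib.sha256(files[path]).hexdigest()].append(path)
        for candidates in by_digest.values():
            if len(candidates) < 2:
                continue
            # Stage 3: confirm on the raw bytes. A dict lookup on bytes keys
            # compares full contents, so a digest collision cannot merge
            # two different files into one group.
            by_content: dict[bytes, list[str]] = defaultdict(list)
            for path in candidates:
                by_content[files[path]].append(path)
            groups.extend(sorted(g) for g in by_content.values() if len(g) >= 2)
    return sorted(groups)
```

For example, `find_duplicates({"/a": b"hello", "/b": b"world", "/c": b"hello", "/d": b""})` returns `[['/a', '/c']]`: `/d` is eliminated by size alone, and `/b` shares a size with the others but is separated by its digest.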
Constraints
- 1 <= len(operations) <= 2 * 10^5
- 0 <= len(content) <= 10^5 for a single ADD/MODIFY operation
- The sum of len(content) over all ADD/MODIFY operations is <= 10^6
- 1 <= len(path) <= 200
- ADD/MODIFY replace any existing content at that path; DELETE on a missing path has no effect
Examples
Input: ([('ADD', '/a.txt', 'hello'), ('ADD', '/b.txt', 'world'), ('ADD', '/c.txt', 'hello'), ('QUERY',), ('MODIFY', '/b.txt', 'hello'), ('QUERY',)],)
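An operation stream like the one in this example can be handled with an incremental index: a map from each path to its content digest, plus a reverse map from each digest to the set of paths that share it. Each ADD/MODIFY/DELETE then costs time proportional to its content length, so hashing stays within the 10^6 total-content bound. The expected output for the example is not given, so this Python sketch assumes (hypothetically) that each QUERY returns the current groups of two or more identical paths, with each group sorted and the list of groups sorted. `DuplicateIndex` and `process_operations` are illustrative names.

```python
import hashlib
from collections import defaultdict


class DuplicateIndex:
    """Incrementally tracks which paths currently hold identical content."""

    def __init__(self) -> None:
        self.path_to_digest: dict[str, str] = {}
        self.digest_to_paths: dict[str, set[str]] = defaultdict(set)

    def delete(self, path: str) -> None:
        digest = self.path_to_digest.pop(path, None)
        if digest is None:
            return  # DELETE on a missing path has no effect
        bucket = self.digest_to_paths[digest]
        bucket.discard(path)
        if not bucket:
            del self.digest_to_paths[digest]  # drop empty buckets

    def put(self, path: str, content: str) -> None:
        # ADD and MODIFY both replace whatever is currently stored at `path`.
        self.delete(path)
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        self.path_to_digest[path] = digest
        self.digest_to_paths[digest].add(path)

    def query(self) -> list[list[str]]:
        # Assumed output format: sorted groups of 2+ paths with equal digests.
        return sorted(sorted(p) for p in self.digest_to_paths.values() if len(p) >= 2)


def process_operations(operations):
    """Replays the operations and collects one result per QUERY."""
    index = DuplicateIndex()
    results = []
    for op in operations:
        kind = op[0]
        if kind in ("ADD", "MODIFY"):
            index.put(op[1], op[2])
        elif kind == "DELETE":
            index.delete(op[1])
        elif kind == "QUERY":
            results.append(index.query())
    return results
```

Under that assumed output format, the example input yields `[[['/a.txt', '/c.txt']], [['/a.txt', '/b.txt', '/c.txt']]]`: the first QUERY sees `/a.txt` and `/c.txt` sharing "hello", and after `/b.txt` is modified to "hello" all three paths form one group. The sketch treats equal SHA-256 digests as equal content. Because total content is at most 10^6 characters, keying the reverse map by the content string itself is also feasible and removes any collision concern. `query` rebuilds its result on every call, so if QUERY is frequent and the real expected answer is smaller (such as a count of duplicate groups), that figure could be maintained incrementally instead.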