PracHub

Find and remove duplicate files

Last updated: Mar 29, 2026

Quick Overview

This question evaluates the ability to design a scalable file deduplication algorithm. It tests knowledge of hashing strategies, collision handling, memory and I/O optimization, handling of very large files, incremental and resumable operation, and complexity and trade-off analysis.


Company: Anthropic

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

Given a directory tree that may not fit in memory, detect and optionally remove duplicate files. Define the algorithm, including how you handle very large files, hashing strategy (e.g., size grouping, partial hash, full hash or chunked rolling hash), collision handling, memory and I/O optimization, and how you would make it incremental and resumable. Provide complexity analysis and discuss trade-offs.


Related Interview Questions

  • Convert Samples into Event Intervals - Anthropic (medium)
  • Convert State Stream to Events - Anthropic (medium)
  • Build a concurrent web crawler - Anthropic (medium)
  • Implement a Parallel Image Processor - Anthropic (medium)
  • Implement a Batch Image Processor - Anthropic (medium)
Jul 26, 2025, 12:00 AM

