Design a duplicate-file removal algorithm

Q: Design a duplicate-file removal algorithm

This is a Coding & Algorithms interview question from Abnormal Security for Software Engineer roles. View the full question and solution on PracHub.

Question

Your filesystem contains millions of photos. Duplicates are strictly byte-identical files (no ML/CV similarity). Design an algorithm to detect and delete duplicates efficiently on a single machine. Specify: how you compute and store per-file signatures (e.g., full hash vs size+partial+full, streaming I/O); how the in-memory key–value store maps signatures to canonical file paths; how you handle hash collisions and verification before deletion; how you treat files with identical names but different content, permissions, or timestamps; and the big-O time and space complexity and I/O considerations. Also provide pseudocode for a function that returns the set of file paths that are safe to delete.
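
One possible shape for the requested function, written as runnable Python rather than pseudocode. This is a sketch under stated assumptions, not the reference solution: the names `find_deletable`, `_group`, `_hash_file`, and `_regular_files` are hypothetical, as are the 64 KiB partial-hash window, the 1 MiB read size, the choice of SHA-256, and the rule that the lexicographically first path in a group is kept as canonical. It narrows candidates by size, then by a hash of the first bytes, then by a full streamed hash, and confirms with a byte-for-byte comparison so that a hash collision can never cause a deletion. File names, permissions, and timestamps are ignored on purpose, since the question defines duplicates by bytes alone.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

CHUNK = 1 << 20    # streaming read size (assumed value; tune for the disk)
HEAD = 64 * 1024   # bytes covered by the cheap partial hash (assumed value)


def _hash_file(path, limit=None):
    """Stream a file through SHA-256, stopping after `limit` bytes if given."""
    digest = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as fh:
        while remaining is None or remaining > 0:
            want = CHUNK if remaining is None else min(CHUNK, remaining)
            block = fh.read(want)
            if not block:
                break
            digest.update(block)
            if remaining is not None:
                remaining -= len(block)
    return digest.digest()


def _regular_files(root):
    """Yield regular, non-symlink files under root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def _group(paths, key):
    """Bucket paths by key(path) and keep only buckets with 2+ members."""
    buckets = defaultdict(list)  # the in-memory signature -> paths map
    for path in paths:
        try:
            buckets[key(path)].append(path)
        except OSError:
            continue  # unreadable or vanished file: never a deletion candidate
    return [group for group in buckets.values() if len(group) > 1]


def find_deletable(root):
    """Return paths that are byte-identical to a canonical file that is kept."""
    deletable = set()
    for same_size in _group(_regular_files(root), os.path.getsize):
        for same_head in _group(same_size, lambda p: _hash_file(p, HEAD)):
            for same_hash in _group(same_head, _hash_file):
                # Assumed policy: keep the lexicographically first path.
                canonical, *rest = sorted(same_hash)
                for path in rest:
                    try:
                        if os.path.samefile(canonical, path):
                            continue  # hard link to the same inode
                        # Byte-for-byte check guards against hash collisions.
                        if filecmp.cmp(canonical, path, shallow=False):
                            deletable.add(path)
                    except OSError:
                        continue  # when in doubt, do not delete
    return deletable
```

In this sketch every file is stat'ed once, but file contents are read only for files whose size collides with another file's. In the worst case (all files identical) each byte is read for the partial hash, the full hash, and the final comparison, so I/O is linear in total bytes with a small constant factor, and memory is linear in the number of paths held in the map. The function only reports candidates; actually deleting them is left to the caller.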
