PracHub

Design a duplicate-file removal algorithm

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's proficiency with file deduplication algorithms, hashing and signature strategies, I/O-efficient streaming, in-memory indexing, collision detection, and handling file metadata such as permissions and timestamps.

Company: Abnormal Security

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Your filesystem contains millions of photos. Duplicates are strictly byte-identical files (no ML/CV similarity). Design an algorithm to detect and delete duplicates efficiently on a single machine. Specify:

  • how you compute and store per-file signatures (e.g., full hash vs. size+partial+full, streaming I/O);
  • how the in-memory key–value store maps signatures to canonical file paths;
  • how you handle hash collisions and verification before deletion;
  • how you treat files with identical names but different content, permissions, or timestamps;
  • big-O time and space complexity and I/O considerations.

Finally, provide pseudocode for a function that returns the set of file paths safe to delete.
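One possible shape for the function the question closes with is sketched below in Python rather than pseudocode. It is an illustration under stated assumptions, not a reference answer: the names `find_deletable`, `_digest`, and `_split`, the 4 KiB partial-hash window, the 1 MiB read size, the choice of SHA-256, and the rule that the lexicographically smallest path in each group is kept as canonical are all hypothetical choices made for this example. Files are narrowed in stages (size, then a hash of the first bytes, then a streamed full hash), and a path is reported only after a byte-by-byte comparison against a kept file, so a hash collision cannot cause a wrong deletion.

```python
import filecmp
import hashlib
import os
import stat
from collections import defaultdict

CHUNK = 1 << 20  # read size for streaming, 1 MiB (assumed; tune per disk)
HEAD = 4096      # bytes covered by the cheap partial hash (assumed)


def _digest(path, limit=None):
    """Stream a file through SHA-256, reading at most `limit` bytes if given."""
    h = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            n = CHUNK if remaining is None else min(CHUNK, remaining)
            block = f.read(n)
            if not block:
                break
            h.update(block)
            if remaining is not None:
                remaining -= len(block)
    return h.digest()


def _split(paths, key):
    """Group paths by key(path); drop groups that cannot contain duplicates."""
    groups = defaultdict(list)  # in-memory map: signature -> candidate paths
    for p in paths:
        groups[key(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]


def find_deletable(root):
    """Return paths under `root` that are byte-identical to a file that is kept."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode):  # skip symlinks and special files
                by_size[st.st_size].append(path)

    deletable = set()
    for same_size in (g for g in by_size.values() if len(g) > 1):
        for same_head in _split(same_size, lambda p: _digest(p, HEAD)):
            for same_hash in _split(same_head, _digest):
                canonicals = []  # verified-distinct files kept for this digest
                for path in sorted(same_hash):
                    # Byte-by-byte check guards against hash collisions.
                    if any(filecmp.cmp(c, path, shallow=False) for c in canonicals):
                        deletable.add(path)
                    else:
                        canonicals.append(path)
    return deletable
```

In this sketch, file names, permissions, and timestamps play no part in equality: two files with the same name but different bytes land in different groups, and two files with different names but the same bytes are treated as duplicates. A caller who must preserve differing permissions or timestamps would need to add those fields to the grouping key, which this example does not do. With N files and B total bytes, memory grows with the number of candidate paths, O(N), and I/O is bounded by a small constant number of passes over the bytes, O(B), in the worst case where every file shares a size and a prefix; files with a unique size are never opened.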

Jul 16, 2025, 12:00 AM
