What difficulty level is this interview question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at HubSpot.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at HubSpot during technical interviews.

Design file deduplication at scale

Last updated: Mar 29, 2026

Quick Overview

This question tests file-system algorithms, hashing and deduplication techniques, scalable I/O and memory management, and correct handling of hash collisions. It belongs to the Coding & Algorithms domain.

HubSpot

Sep 6, 2025, 12:00 AM

Software Engineer

Onsite

Coding & Algorithms

Design an algorithm to identify duplicate files in a large directory tree. You are given an iterator over files providing (path, size) and a function read_chunks(path) -> Iterator[bytes]. Requirements: minimize I/O by comparing sizes and using rolling or cryptographic hashes; handle hash collisions safely; support datasets that do not fit in memory; and output groups of paths that are byte-for-byte identical. Explain time/space trade-offs and how you would parallelize the solution.
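One possible shape for an answer is a funnel that gets more expensive at each stage: group by size (no I/O), then by a hash of a short prefix, then by a full-file hash, and finally confirm with a byte-for-byte comparison so a hash collision can never produce a false duplicate. The sketch below is a minimal in-memory version under stated assumptions, not a reference solution: the helper names (`prefix_digest`, `full_digest`, `same_bytes`, `refine`, `split_by_content`, `find_duplicates`), the 4096-byte prefix limit, and the choice of SHA-256 are all illustrative and not part of the question.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

ReadChunks = Callable[[str], Iterator[bytes]]


def prefix_digest(path: str, read_chunks: ReadChunks, limit: int = 4096) -> bytes:
    """SHA-256 of at most the first `limit` bytes (limit is an assumed value)."""
    h = hashlib.sha256()
    remaining = limit
    for chunk in read_chunks(path):
        h.update(chunk[:remaining])
        remaining -= len(chunk)
        if remaining <= 0:
            break  # stop reading early: this is the I/O saving
    return h.digest()


def full_digest(path: str, read_chunks: ReadChunks) -> bytes:
    """SHA-256 of the whole file, streamed one chunk at a time."""
    h = hashlib.sha256()
    for chunk in read_chunks(path):
        h.update(chunk)
    return h.digest()


def _next_nonempty(it: Iterator[bytes]) -> bytes:
    """Next non-empty chunk, or b'' when the iterator is exhausted."""
    for chunk in it:
        if chunk:
            return chunk
    return b""


def same_bytes(a: str, b: str, read_chunks: ReadChunks) -> bool:
    """Byte-for-byte comparison that tolerates differing chunk boundaries."""
    it_a, it_b = read_chunks(a), read_chunks(b)
    buf_a = buf_b = b""
    while True:
        buf_a = buf_a or _next_nonempty(it_a)
        buf_b = buf_b or _next_nonempty(it_b)
        if not buf_a or not buf_b:
            return not buf_a and not buf_b  # equal only if both ended together
        n = min(len(buf_a), len(buf_b))
        if buf_a[:n] != buf_b[:n]:
            return False
        buf_a, buf_b = buf_a[n:], buf_b[n:]


def refine(groups: List[List[str]], key: Callable[[str], bytes]) -> List[List[str]]:
    """Split each candidate group by `key`, dropping groups of one."""
    out: List[List[str]] = []
    for group in groups:
        buckets: Dict[bytes, List[str]] = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        out.extend(b for b in buckets.values() if len(b) > 1)
    return out


def split_by_content(paths: List[str], read_chunks: ReadChunks) -> List[List[str]]:
    """Final collision check: partition paths into truly identical classes."""
    classes: List[List[str]] = []
    for path in paths:
        for cls in classes:
            if same_bytes(cls[0], path, read_chunks):
                cls.append(path)
                break
        else:
            classes.append([path])
    return [c for c in classes if len(c) > 1]


def find_duplicates(
    files: Iterable[Tuple[str, int]], read_chunks: ReadChunks
) -> List[List[str]]:
    """Return groups of paths whose contents are byte-for-byte identical."""
    by_size: Dict[int, List[str]] = defaultdict(list)
    for path, size in files:
        by_size[size].append(path)
    groups = [g for g in by_size.values() if len(g) > 1]  # unique sizes need no I/O
    groups = refine(groups, lambda p: prefix_digest(p, read_chunks))
    groups = refine(groups, lambda p: full_digest(p, read_chunks))
    result: List[List[str]] = []
    for group in groups:
        result.extend(split_by_content(group, read_chunks))
    return result
```

In the worst case (many large files that really are identical) each such file is read three times: prefix, full hash, and verification. Dropping the verification pass saves one read but means trusting the hash. The sketch also keeps the whole size index in memory. For datasets that do not fit, one plausible extension is to write (size, path) records to disk and sort them externally so each size group can be streamed. Because size groups are independent, they could also be handed to a pool of workers. Both are suggestions for discussion rather than part of the code above.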
