Problem
You are given a list of files in a filesystem. Each file has:
- a full path (string)
- a file size in bytes (integer)
- file contents (conceptually; assume you can stream/read the file when needed)
Two files are duplicates if their contents are identical (byte-for-byte), regardless of file name or directory.
Task: Return groups of duplicate files. Each group should contain the full paths of files that have identical content, and groups of size 1 should be omitted.
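Before turning to the constraints, a naive baseline is useful as a reference point: hash every file's full contents and group paths by digest. A minimal sketch in Python, assuming `files` is an iterable of `(path, size)` tuples (the function name is illustrative):

```python
import hashlib
from collections import defaultdict

def find_duplicates_baseline(files):
    """files: iterable of (path, size) tuples.

    Hashes every file in full and groups paths by digest.
    Correct but expensive: every file is read, even unique ones.
    """
    groups = defaultdict(list)
    for path, _size in files:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).digest()
        groups[digest].append(path)
    # Omit groups of size 1: unique files are not duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

The constraints and follow-ups below are about avoiding exactly this pattern of reading everything.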
Constraints / Expectations
- There can be millions of files.
- Reading file contents is expensive.
- Aim to minimize unnecessary content reads and hashing (see the size-grouping sketch after this list).
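One way to meet these constraints is a metadata-only prefilter: group candidates by file size before reading any contents, since a file with a unique size cannot have a duplicate. A sketch under the same assumptions as above:

```python
from collections import defaultdict

def group_by_size(files):
    """files: iterable of (path, size). Returns only the size groups
    with two or more members; every other file is provably unique
    and its contents are never read."""
    by_size = defaultdict(list)
    for path, size in files:
        by_size[size].append(path)
    return [paths for paths in by_size.values() if len(paths) > 1]
```

With millions of mostly-unique files, this single pass over metadata typically discards the bulk of candidates before any content I/O.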
Follow-ups
- Describe an approach that avoids hashing every file if most files are unique (a single sketch addressing all three follow-ups appears after this list).
- How would you handle extremely large files (e.g., >1 GB) without loading them fully into memory?
- If hash collisions are a concern, how do you confirm duplicates?
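A sketch that covers all three follow-ups at once, consuming the size groups produced by `group_by_size` in the earlier sketch: within each size group, files are bucketed by a hash of their first few KiB (cheap, and separates most non-duplicates), then by a streamed full-content hash that uses constant memory even for multi-GB files; an optional byte-by-byte comparison confirms matches when collisions are a concern. Chunk sizes and helper names are illustrative choices, not part of the problem statement.

```python
import hashlib
from collections import defaultdict

PARTIAL = 4096     # bytes hashed in the cheap first pass
CHUNK = 1 << 20    # 1 MiB read size for the streaming passes

def partial_hash(path):
    """Hash only the first PARTIAL bytes: a cheap discriminator."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(PARTIAL)).digest()

def streamed_hash(path):
    """Hash full contents in CHUNK-sized pieces; memory stays
    O(CHUNK) no matter how large the file is."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.digest()

def same_bytes(a, b):
    """Byte-for-byte comparison, guarding against hash collisions."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ba, bb = fa.read(CHUNK), fb.read(CHUNK)
            if ba != bb:
                return False
            if not ba:          # both streams exhausted together
                return True

def refine(paths, key):
    """Re-bucket a candidate group by key(path); keep groups >= 2."""
    buckets = defaultdict(list)
    for p in paths:
        buckets[key(p)].append(p)
    return [g for g in buckets.values() if len(g) > 1]

def find_duplicates(size_groups, verify=False):
    """size_groups: the output of group_by_size from the earlier sketch."""
    result = []
    for size_group in size_groups:
        for pg in refine(size_group, partial_hash):
            for hg in refine(pg, streamed_hash):
                if verify:
                    # Confirm each member against the group's first file;
                    # colliding files are simply dropped in this sketch.
                    hg = [hg[0]] + [p for p in hg[1:] if same_bytes(hg[0], p)]
                    if len(hg) < 2:
                        continue
                result.append(hg)
    return result
```

Because most unique files are eliminated by size alone, and most remaining non-duplicates diverge within their first few KiB, the expensive full-content pass runs only on files that are already very likely duplicates.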
Example
Input files (path, size, content):
- /a/x.txt, 4, "ABCD"
- /b/y.txt, 4, "ABCD"
- /c/z.txt, 4, "WXYZ"
- /d/w.txt, 7, "1234567"
Output:
- ["/a/x.txt", "/b/y.txt"]

/a/x.txt and /b/y.txt both contain "ABCD", so they form one duplicate group; /c/z.txt and /d/w.txt are unique, so their singleton groups are omitted.