How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a hard difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Cursor.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Cursor during technical interviews.

Implement a Merkle-tree repo diff | Cursor Coding Question

Implement a Merkle-tree repo diff

Company: Cursor

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are given two snapshots of a source-code repository on disk, each represented by a root directory path. Implement a Merkle tree over the **directory structure** (folders are internal nodes, files are leaves), and use it to compute which files changed between the two snapshots. Use the following API (Python-like): ```python from enum import Enum, auto from typing import Self class ChangeType(Enum): ADDED = auto() MODIFIED = auto() REMOVED = auto() class MerkleTree: def __init__(self, root_directory: str): """Build the Merkle tree from the filesystem under root_directory.""" ... def diff(self, other: Self) -> list[tuple[str, ChangeType]]: """Return a list of (relative_path, ChangeType) for all files that differ.""" ... ``` Requirements / clarifications: - The tree structure must follow the **actual folder hierarchy** in the repo (not a binary tree). - A file node’s hash should depend on its **contents** (and optionally its name); a directory node’s hash should depend on its children (names + hashes) in a deterministic order. - `diff()` should report **files only** as: - `ADDED`: path exists only in `other`. - `REMOVED`: path exists only in `self`. - `MODIFIED`: path exists in both but file contents differ. - Paths in the output should be **relative to the root** (e.g., `"src/main.py"`). - If a path changes type (file ↔ directory), you may treat it as removed + added for the affected file paths. - Use only the standard library to read directories/files. Design and implement the data structures and logic so it can run against real repositories on disk.

Quick Answer: This question evaluates a candidate's understanding of Merkle trees, filesystem-aware hashing, directory traversal, and diff algorithms for detecting file-level changes.

Implement a repository diff using a Merkle tree that mirrors the real directory hierarchy. Files are leaves, and directories are internal nodes with any number of children. A file node's hash must depend on its contents. A directory node's hash must depend on its children's names, types, and hashes in a deterministic sorted order. If two directory hashes are equal, you should be able to skip that entire subtree during diffing. Original interview API: - MerkleTree(root_directory) - diff(other) -> list[(relative_path, ChangeType)] To keep the judge self-contained, implement `solution(snapshot_a, snapshot_b)` instead. Each snapshot is a list of `(relative_path, contents)` pairs describing the files in that repository snapshot. Paths use `/` as the separator, and directories are implied by the file paths. Your implementation should still follow the real on-disk logic from the interview question. The reference solution materializes the snapshots into temporary directories, builds an on-disk Merkle tree using only the standard library, and diffs the two trees. Return a lexicographically sorted list of `(relative_path, change_type)` for files only, where `change_type` is one of `'ADDED'`, `'MODIFIED'`, or `'REMOVED'`. - `ADDED`: path exists only in `snapshot_b` - `REMOVED`: path exists only in `snapshot_a` - `MODIFIED`: path exists in both snapshots but file contents differ If a path changes type between snapshots (file <-> directory), treat it as removed plus added for the affected file paths.

Constraints

0 <= len(snapshot_a), len(snapshot_b) <= 2000
The total size of all file contents across both snapshots is at most 2 * 10^6 bytes
Within a single snapshot, all relative paths are unique
Within a single snapshot, no file path is a prefix of another file path
Paths are relative, use '/' as the separator, and refer only to regular files and directories implied by those paths

Examples

Input: ([('README.md', 'repo'), ('src/main.py', 'main\n'), ('src/util.py', 'add\n')], [('README.md', 'repo'), ('src/main.py', 'main\n'), ('src/util.py', 'add\n')])

Expected Output: []

Explanation: Both snapshots contain the same files with the same contents, so no file differs.

Input: ([('README.md', 'repo'), ('src/main.py', 'v1\n'), ('src/utils/helpers.py', 'helper\n'), ('docs/old.md', 'old\n')], [('README.md', 'repo'), ('src/main.py', 'v2\n'), ('src/utils/helpers.py', 'helper\n'), ('src/utils/new.py', 'new\n'), ('tests/test_main.py', 'ok\n')])

Expected Output: [('docs/old.md', 'REMOVED'), ('src/main.py', 'MODIFIED'), ('src/utils/new.py', 'ADDED'), ('tests/test_main.py', 'ADDED')]

Explanation: One file was removed, one existing file changed contents, and two new files were added.

Input: ([('config', 'port=8080\n'), ('keep.txt', 'same')], [('config/app.yml', 'port: 8080\n'), ('keep.txt', 'same')])

Expected Output: [('config', 'REMOVED'), ('config/app.yml', 'ADDED')]

Explanation: The path `config` changed from a file to a directory containing a file. Treat that as removing the old file and adding the new file path.

Input: ([], [('README.md', ''), ('src/app.py', 'run\n')])

Expected Output: [('README.md', 'ADDED'), ('src/app.py', 'ADDED')]

Explanation: The first snapshot is empty, so every file in the second snapshot is added.

Input: ([('a/b/c.txt', '1'), ('a/d.txt', '2'), ('x.txt', 'same')], [('x.txt', 'same')])

Expected Output: [('a/b/c.txt', 'REMOVED'), ('a/d.txt', 'REMOVED')]

Explanation: The entire `a` subtree exists only in the first snapshot, so all files inside it are reported as removed.

Input: ([('bin/data.bin', b'\x00\x01'), ('same.txt', 'ok')], [('bin/data.bin', b'\x00\x02'), ('same.txt', 'ok')])

Expected Output: [('bin/data.bin', 'MODIFIED')]

Explanation: The binary file contents differ, so that file is modified while the other file is unchanged.

Hints

Hash each file from its contents, and hash each directory from its sorted children as tuples like (name, kind, child_hash). Deterministic ordering is essential.
When diffing, if two directory hashes are equal, you can skip the whole subtree. Only recurse when hashes differ or when a child exists on just one side.

Quick Overview

This question evaluates a candidate's understanding of Merkle trees, filesystem-aware hashing, directory traversal, and diff algorithms for detecting file-level changes.