How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Databricks.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Databricks during technical interviews.

Implement a snapshotable set with iterators | Databricks Coding Question

Implement a snapshotable set with iterators

Company: Databricks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

Implement a SnapshotSet data structure with the following API: add(x), remove(x), contains(x), snapshot() -> sid, and iterate(sid) -> iterator over the elements as of snapshot sid. Support many concurrent snapshots while allowing the live set to change. Optimize for O(log n) updates and O( 1) iterator next on average. Discuss design choices (copy-on-write, persistent balanced tree, or versioned hash buckets), memory usage, and snapshot reclamation. Provide unit tests for empty set, deletions before/after snapshots, and concurrent mutation during iteration.

Quick Answer: This question evaluates understanding of persistent/versioned data structures, iterator semantics, concurrency control, and complexity-aware design for mutable sets, and is categorized under Coding & Algorithms.

Implement a **snapshotable integer set**: a mutable set of integers from which you can take read-only snapshots, then iterate over any snapshot's contents in sorted order — even while the live set keeps changing afterward. ## What to implement Write a function `solution(operations)` that replays a sequence of operations against a single live set (initially empty) and returns a list of results. `operations` is a **list of tuples**. The first element of each tuple is the command name (a string); the remaining element, when present, is the argument. ## Operations Some operations mutate the live set and produce **no output**: - `('add', x)` — insert integer `x` into the live set. If `x` is already present, do nothing. - `('remove', x)` — remove `x` from the live set if it is present. If `x` is absent, do nothing. The remaining operations each **append one value** to the result list: - `('contains', x)` — return `True` if `x` is in the **current live set**, else `False`. This always checks the live set, never a snapshot. - `('snapshot',)` — record the current contents of the live set and return the new **snapshot id** `sid`. Snapshot ids are assigned `0, 1, 2, …` in the order snapshots are taken. A snapshot may be taken of an empty live set. - `('iterator', sid)` — create a new in-order iterator over the elements that existed in snapshot `sid`, and return the new **iterator id** `iid`. Iterator ids are assigned `0, 1, 2, …` in the order iterators are created. - `('next', iid)` — advance iterator `iid` and return its next element in **strictly increasing order**, or `None` if the iterator has already yielded every element of its snapshot. ## Snapshot and iterator semantics - A **snapshot** captures the live set exactly as it was at the moment `('snapshot',)` ran. Later `add`/`remove` operations on the live set do **not** change any existing snapshot. - An **iterator** is bound to its snapshot and is fully **isolated from later mutations**: once created, it yields exactly the elements of that snapshot in increasing order, regardless of any `add`/`remove` calls (or new snapshots) that happen afterward. - Each iterator tracks its own position. Calling `('next', iid)` repeatedly walks that snapshot's elements from smallest to largest; after the last element, every further `next` on that iterator returns `None`. - Multiple iterators (over the same or different snapshots) can be active at once, each advancing independently. ## Output Return a list containing the result of every operation that produces output, in the order those operations appear: - `contains` → `bool` - `snapshot` → `int` (the snapshot id) - `iterator` → `int` (the iterator id) - `next` → `int`, or `None` when the iterator is exhausted ## Example Given `operations`: ``` ('add', 1), ('add', 2), ('snapshot',), ('remove', 1), ('snapshot',), ('iterator', 0), ('next', 0), ('next', 0), ('next', 0), ('iterator', 1), ('next', 1), ('next', 1), ('contains', 1) ``` the result is: ``` [0, 1, 0, 1, 2, None, 1, 2, None, False] ``` - After the two adds, snapshot 0 captures `{1, 2}` (returns `0`). - `remove(1)` leaves the live set as `{2}`; snapshot 1 captures `{2}` (returns `1`). - Iterator 0 (over snapshot 0) yields `1`, then `2`, then `None`. - Iterator 1 (over snapshot 1) yields `2`, then `None`. - `contains(1)` is `False` because `1` was removed from the live set. ## Constraints - `1 <= len(operations) <= 200000` - Values are integers in the range `[-10^9, 10^9]`. - Every snapshot id and iterator id used as an argument is valid (it refers to a snapshot/iterator that was already created). - **Target complexity:** `add` / `remove` / `contains` in `O(log n)` expected, `snapshot` in `O(1)`, iterator creation in `O(log n)`, and `next` in `O(1)` amortized. > **Interview discussion note:** a full design discussion would compare copy-on-write (simple but expensive updates), versioned hash buckets (good lookups but awkward ordered iteration), and persistent balanced trees. For this coding task, aim for the persistent balanced-tree style solution. Snapshot reclamation can be handled by dropping unused snapshot roots/iterators so unreachable nodes are garbage-collected.

Constraints

1 <= len(operations) <= 200000
Values are integers in the range [-10^9, 10^9]
Snapshot ids and iterator ids used in the input are valid
Target complexity: add/remove/contains in O(log n) expected, snapshot in O(1), iterator creation in O(log n), and next in O(1) amortized

Examples

Input: ([('snapshot',), ('iterator', 0), ('next', 0), ('contains', 5)],)

Expected Output: [0, 0, None, False]

Explanation: Snapshot 0 is an empty set. Its iterator is immediately exhausted, and 5 is not in the live set.

Input: ([('add', 1), ('add', 2), ('snapshot',), ('remove', 1), ('snapshot',), ('iterator', 0), ('next', 0), ('next', 0), ('next', 0), ('iterator', 1), ('next', 1), ('next', 1), ('contains', 1)],)

Expected Output: [0, 1, 0, 1, 2, None, 1, 2, None, False]

Explanation: Snapshot 0 contains {1,2}. After removing 1, snapshot 1 contains {2}. The old snapshot still iterates 1,2, while the live set no longer contains 1.

Input: ([('add', 3), ('add', 1), ('snapshot',), ('iterator', 0), ('next', 0), ('add', 2), ('remove', 3), ('next', 0), ('next', 0), ('snapshot',), ('iterator', 1), ('next', 1), ('next', 1), ('next', 1)],)

Expected Output: [0, 0, 1, 3, None, 1, 1, 1, 2, None]

Explanation: Iterator 0 is created from snapshot 0 = {1,3}. Even after the live set changes to {1,2}, iterator 0 still yields 3. Snapshot 1 reflects the later live set.

Input: ([('add', -1), ('add', -1), ('remove', 5), ('contains', -1), ('snapshot',), ('remove', -1), ('contains', -1), ('iterator', 0), ('next', 0), ('next', 0)],)

Expected Output: [True, 0, False, 0, -1, None]

Explanation: Duplicate add has no effect, removing a missing value has no effect, and the snapshot preserves -1 even after it is removed from the live set.

Solution

def solution(operations):
    import random

    rng = random.Random(0)

    class Node:
        __slots__ = ('key', 'prio', 'left', 'right')
        def __init__(self, key, prio, left=None, right=None):
            self.key = key
            self.prio = prio
            self.left = left
            self.right = right

    def contains(root, x):
        while root is not None:
            if x < root.key:
                root = root.left
            elif x > root.key:
                root = root.right
            else:
                return True
        return False

    def merge(a, b):
        if a is None:
            return b
        if b is None:
            return a
        if a.prio < b.prio:
            return Node(a.key, a.prio, a.left, merge(a.right, b))
        else:
            return Node(b.key, b.prio, merge(a, b.left), b.right)

    def split(root, key):
        # returns (< key, >= key)
        if root is None:
            return None, None
        if key <= root.key:
            l, r_left = split(root.left, key)
            return l, Node(root.key, root.prio, r_left, root.right)
        else:
            l_right, r = split(root.right, key)
            return Node(root.key, root.prio, root.left, l_right), r

    def add(root, x):
        if contains(root, x):
            return root
        left, right = split(root, x)
        node = Node(x, rng.getrandbits(64))
        return merge(merge(left, node), right)

    def remove(root, x):
        # Remove x if present (safe even if absent)
        left, mid_right = split(root, x)
        mid, right = split(mid_right, x + 1)
        return merge(left, right)

    def push_left(node, stack):
        while node is not None:
            stack.append(node)
            node = node.left

    live_root = None
    snapshots = []  # list of roots
    iterators = []  # list of stacks for in-order traversal

    out = []
    for op in operations:
        cmd = op[0]
        if cmd == 'add':
            x = op[1]
            live_root = add(live_root, x)
        elif cmd == 'remove':
            x = op[1]
            live_root = remove(live_root, x)
        elif cmd == 'contains':
            x = op[1]
            out.append(contains(live_root, x))
        elif cmd == 'snapshot':
            snapshots.append(live_root)
            out.append(len(snapshots) - 1)
        elif cmd == 'iterator':
            sid = op[1]
            root = snapshots[sid] if 0 <= sid < len(snapshots) else None
            stack = []
            push_left(root, stack)
            iterators.append(stack)
            out.append(len(iterators) - 1)
        elif cmd == 'next':
            iid = op[1]
            if 0 <= iid < len(iterators):
                stack = iterators[iid]
                if stack:
                    node = stack.pop()
                    push_left(node.right, stack)
                    out.append(node.key)
                else:
                    out.append(None)
            else:
                out.append(None)
        else:
            # Unknown command: ignore or raise; per problem, not expected
            pass

    return out

Explanation

The solution implements a **persistent treap** (a BST keyed on values, with a heap order on random 64-bit priorities) so that many snapshots can coexist while the live set keeps mutating. **Why persistence is free here.** The two core treap primitives, `split` and `merge`, never mutate existing nodes — every time they descend through a node they allocate a *new* `Node` and reuse the untouched subtree by reference (path copying). So after `add`/`remove` only the O(log n) nodes on the affected root-to-leaf path are fresh; the previous root still describes the old set perfectly. `snapshot()` therefore just appends the current `live_root` to a list — O(1) — and that saved root is permanently isolated from later changes. **Operations.** - `contains`: ordinary iterative BST descent on the live root. - `add(x)`: if absent, `split` the tree into `< x` and `>= x`, create a node with a random priority, and `merge(merge(left, node), right)`. `merge` keeps BST order while respecting heap priority, giving expected O(log n) height. - `remove(x)`: split out the single-key slice `[x, x+1)` and merge the two outer pieces, dropping `x`. - `iterator(sid)`: build an explicit stack holding the **leftmost spine** of that snapshot's root via `push_left`. Because the snapshot root is immutable, the iterator is naturally isolated. - `next(iid)`: pop the top (smallest unseen), then `push_left` its right child — classic stack-based in-order traversal yielding increasing order, O(1) amortized. Outputs from `contains`, `snapshot`, `iterator`, and `next` are appended to `out` in order. Correctness rests on the immutability of saved roots and the BST+heap invariants `split`/`merge` preserve.

Time complexity: add/remove: O(log n) expected; contains: O(log n) expected; snapshot: O(1); iterator creation: O(log n); next: O(1) amortized (each node pushed/popped once over a full traversal).. Space complexity: O(log n) new nodes allocated per add/remove (path copying), O(log n) per active iterator stack, plus O(number of snapshots) saved root references. Total live structure is O(n + total path-copied nodes across versions)..

Hints

A snapshot can be just a pointer/reference to an immutable version of the set instead of a full copy.
If the set is stored in a persistent balanced BST, an iterator can keep a stack of the current in-order path and get amortized O(1) next() calls.