PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of persistent/versioned data structures, iterator semantics, concurrency control, and complexity-aware design for mutable sets, and is categorized under Coding & Algorithms.

  • Medium
  • Databricks
  • Coding & Algorithms
  • Software Engineer

Implement a snapshotable set with iterators

Company: Databricks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

Implement a SnapshotSet data structure with the following API: add(x), remove(x), contains(x), snapshot() -> sid, and iterate(sid) -> iterator over the elements as of snapshot sid. Support many concurrent snapshots while allowing the live set to change. Optimize for O(log n) updates and O( 1) iterator next on average. Discuss design choices (copy-on-write, persistent balanced tree, or versioned hash buckets), memory usage, and snapshot reclamation. Provide unit tests for empty set, deletions before/after snapshots, and concurrent mutation during iteration.

Quick Answer: This question evaluates understanding of persistent/versioned data structures, iterator semantics, concurrency control, and complexity-aware design for mutable sets, and is categorized under Coding & Algorithms.

Implement a snapshotable integer set that supports many coexisting snapshots while the live set keeps changing. The live set supports these conceptual operations: - add(x): insert x into the live set - remove(x): remove x from the live set if present - contains(x): check whether x is in the live set - snapshot(): save the current live set and return a snapshot id sid - iterate(sid): return an iterator over the elements that existed in snapshot sid For an online judge, you will process a sequence of operations instead of exposing a class directly. The iterator API is modeled with two operations: - ('iterator', sid): create an iterator for snapshot sid and return an iterator id iid - ('next', iid): return the next element from that iterator in increasing order, or None if exhausted Important rules: - Snapshot ids start at 0 and increase by 1. - Iterator ids start at 0 and increase by 1. - Adding an existing value does nothing. - Removing a missing value does nothing. - contains(x) always checks the current live set, not a snapshot. - Iterators must be isolated from later mutations to the live set. Return a list containing the results of every operation that produces output, in order: - contains -> bool - snapshot -> int - iterator -> int - next -> int or None Interview discussion note: a full design discussion would compare copy-on-write (simple but expensive updates), versioned hash buckets (good lookups but awkward ordered iteration), and persistent balanced trees. For this coding task, aim for the persistent balanced-tree style solution. Snapshot reclamation can be handled by dropping unused snapshot roots/iterators so unreachable nodes are garbage-collected.

Constraints

  • 1 <= len(operations) <= 200000
  • Values are integers in the range [-10^9, 10^9]
  • Snapshot ids and iterator ids used in the input are valid
  • Target complexity: add/remove/contains in O(log n) expected, snapshot in O(1), iterator creation in O(log n), and next in O(1) amortized

Examples

Input: ([('snapshot',), ('iterator', 0), ('next', 0), ('contains', 5)],)

Expected Output: [0, 0, None, False]

Explanation: Snapshot 0 is an empty set. Its iterator is immediately exhausted, and 5 is not in the live set.

Input: ([('add', 1), ('add', 2), ('snapshot',), ('remove', 1), ('snapshot',), ('iterator', 0), ('next', 0), ('next', 0), ('next', 0), ('iterator', 1), ('next', 1), ('next', 1), ('contains', 1)],)

Expected Output: [0, 1, 0, 1, 2, None, 1, 2, None, False]

Explanation: Snapshot 0 contains {1,2}. After removing 1, snapshot 1 contains {2}. The old snapshot still iterates 1,2, while the live set no longer contains 1.

Input: ([('add', 3), ('add', 1), ('snapshot',), ('iterator', 0), ('next', 0), ('add', 2), ('remove', 3), ('next', 0), ('next', 0), ('snapshot',), ('iterator', 1), ('next', 1), ('next', 1), ('next', 1)],)

Expected Output: [0, 0, 1, 3, None, 1, 1, 1, 2, None]

Explanation: Iterator 0 is created from snapshot 0 = {1,3}. Even after the live set changes to {1,2}, iterator 0 still yields 3. Snapshot 1 reflects the later live set.

Input: ([('add', -1), ('add', -1), ('remove', 5), ('contains', -1), ('snapshot',), ('remove', -1), ('contains', -1), ('iterator', 0), ('next', 0), ('next', 0)],)

Expected Output: [True, 0, False, 0, -1, None]

Explanation: Duplicate add has no effect, removing a missing value has no effect, and the snapshot preserves -1 even after it is removed from the live set.

Solution

def solution(operations):
    import random

    rng = random.Random(0)

    class Node:
        __slots__ = ('key', 'prio', 'left', 'right')
        def __init__(self, key, prio, left=None, right=None):
            self.key = key
            self.prio = prio
            self.left = left
            self.right = right

    def contains(root, x):
        while root is not None:
            if x < root.key:
                root = root.left
            elif x > root.key:
                root = root.right
            else:
                return True
        return False

    def merge(a, b):
        if a is None:
            return b
        if b is None:
            return a
        if a.prio < b.prio:
            return Node(a.key, a.prio, a.left, merge(a.right, b))
        else:
            return Node(b.key, b.prio, merge(a, b.left), b.right)

    def split(root, key):
        # returns (< key, >= key)
        if root is None:
            return None, None
        if key <= root.key:
            l, r_left = split(root.left, key)
            return l, Node(root.key, root.prio, r_left, root.right)
        else:
            l_right, r = split(root.right, key)
            return Node(root.key, root.prio, root.left, l_right), r

    def add(root, x):
        if contains(root, x):
            return root
        left, right = split(root, x)
        node = Node(x, rng.getrandbits(64))
        return merge(merge(left, node), right)

    def remove(root, x):
        # Remove x if present (safe even if absent)
        left, mid_right = split(root, x)
        mid, right = split(mid_right, x + 1)
        return merge(left, right)

    def push_left(node, stack):
        while node is not None:
            stack.append(node)
            node = node.left

    live_root = None
    snapshots = []  # list of roots
    iterators = []  # list of stacks for in-order traversal

    out = []
    for op in operations:
        cmd = op[0]
        if cmd == 'add':
            x = op[1]
            live_root = add(live_root, x)
        elif cmd == 'remove':
            x = op[1]
            live_root = remove(live_root, x)
        elif cmd == 'contains':
            x = op[1]
            out.append(contains(live_root, x))
        elif cmd == 'snapshot':
            snapshots.append(live_root)
            out.append(len(snapshots) - 1)
        elif cmd == 'iterator':
            sid = op[1]
            root = snapshots[sid] if 0 <= sid < len(snapshots) else None
            stack = []
            push_left(root, stack)
            iterators.append(stack)
            out.append(len(iterators) - 1)
        elif cmd == 'next':
            iid = op[1]
            if 0 <= iid < len(iterators):
                stack = iterators[iid]
                if stack:
                    node = stack.pop()
                    push_left(node.right, stack)
                    out.append(node.key)
                else:
                    out.append(None)
            else:
                out.append(None)
        else:
            # Unknown command: ignore or raise; per problem, not expected
            pass

    return out

Time complexity: add/remove/contains: O(log n) expected; snapshot: O(1); iterator creation: O(log n); next: O(1) amortized. Space complexity: O(log n) new nodes per add/remove operation, O(log n) per active iterator, plus O(number of snapshots) root references.

Hints

  1. A snapshot can be just a pointer/reference to an immutable version of the set instead of a full copy.
  2. If the set is stored in a persistent balanced BST, an iterator can keep a stack of the current in-order path and get amortized O(1) next() calls.
Last updated: Apr 23, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a Snapshot Set Iterator - Databricks (medium)
  • Find the Best Commute Mode - Databricks (medium)
  • Partition a Target String by Source Substrings - Databricks (medium)
  • Design In-Memory QPS Counter - Databricks (medium)
  • Implement Random Connectivity and Grid Routing - Databricks