Implement a snapshotable set with iterators
Company: Databricks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates understanding of persistent/versioned data structures, iterator semantics, concurrency control, and complexity-aware design for mutable sets, and is categorized under Coding & Algorithms.
Constraints
- 1 <= len(operations) <= 200000
- Values are integers in the range [-10^9, 10^9]
- Snapshot ids and iterator ids used in the input are valid
- Target complexity: add/remove/contains in O(log n) expected, snapshot in O(1), iterator creation in O(log n), and next in O(1) amortized
Examples
Input: ([('snapshot',), ('iterator', 0), ('next', 0), ('contains', 5)],)
Expected Output: [0, 0, None, False]
Explanation: Snapshot 0 is an empty set. Its iterator is immediately exhausted, and 5 is not in the live set.
Input: ([('add', 1), ('add', 2), ('snapshot',), ('remove', 1), ('snapshot',), ('iterator', 0), ('next', 0), ('next', 0), ('next', 0), ('iterator', 1), ('next', 1), ('next', 1), ('contains', 1)],)
Expected Output: [0, 1, 0, 1, 2, None, 1, 2, None, False]
Explanation: Snapshot 0 contains {1,2}. After removing 1, snapshot 1 contains {2}. The old snapshot still iterates 1,2, while the live set no longer contains 1.
Input: ([('add', 3), ('add', 1), ('snapshot',), ('iterator', 0), ('next', 0), ('add', 2), ('remove', 3), ('next', 0), ('next', 0), ('snapshot',), ('iterator', 1), ('next', 1), ('next', 1), ('next', 1)],)
Expected Output: [0, 0, 1, 3, None, 1, 1, 1, 2, None]
Explanation: Iterator 0 is created from snapshot 0 = {1,3}. Even after the live set changes to {1,2}, iterator 0 still yields 3. Snapshot 1 reflects the later live set.
Input: ([('add', -1), ('add', -1), ('remove', 5), ('contains', -1), ('snapshot',), ('remove', -1), ('contains', -1), ('iterator', 0), ('next', 0), ('next', 0)],)
Expected Output: [True, 0, False, 0, -1, None]
Explanation: Duplicate add has no effect, removing a missing value has no effect, and the snapshot preserves -1 even after it is removed from the live set.
Solution
def solution(operations):
import random
rng = random.Random(0)
class Node:
__slots__ = ('key', 'prio', 'left', 'right')
def __init__(self, key, prio, left=None, right=None):
self.key = key
self.prio = prio
self.left = left
self.right = right
def contains(root, x):
while root is not None:
if x < root.key:
root = root.left
elif x > root.key:
root = root.right
else:
return True
return False
def merge(a, b):
if a is None:
return b
if b is None:
return a
if a.prio < b.prio:
return Node(a.key, a.prio, a.left, merge(a.right, b))
else:
return Node(b.key, b.prio, merge(a, b.left), b.right)
def split(root, key):
# returns (< key, >= key)
if root is None:
return None, None
if key <= root.key:
l, r_left = split(root.left, key)
return l, Node(root.key, root.prio, r_left, root.right)
else:
l_right, r = split(root.right, key)
return Node(root.key, root.prio, root.left, l_right), r
def add(root, x):
if contains(root, x):
return root
left, right = split(root, x)
node = Node(x, rng.getrandbits(64))
return merge(merge(left, node), right)
def remove(root, x):
# Remove x if present (safe even if absent)
left, mid_right = split(root, x)
mid, right = split(mid_right, x + 1)
return merge(left, right)
def push_left(node, stack):
while node is not None:
stack.append(node)
node = node.left
live_root = None
snapshots = [] # list of roots
iterators = [] # list of stacks for in-order traversal
out = []
for op in operations:
cmd = op[0]
if cmd == 'add':
x = op[1]
live_root = add(live_root, x)
elif cmd == 'remove':
x = op[1]
live_root = remove(live_root, x)
elif cmd == 'contains':
x = op[1]
out.append(contains(live_root, x))
elif cmd == 'snapshot':
snapshots.append(live_root)
out.append(len(snapshots) - 1)
elif cmd == 'iterator':
sid = op[1]
root = snapshots[sid] if 0 <= sid < len(snapshots) else None
stack = []
push_left(root, stack)
iterators.append(stack)
out.append(len(iterators) - 1)
elif cmd == 'next':
iid = op[1]
if 0 <= iid < len(iterators):
stack = iterators[iid]
if stack:
node = stack.pop()
push_left(node.right, stack)
out.append(node.key)
else:
out.append(None)
else:
out.append(None)
else:
# Unknown command: ignore or raise; per problem, not expected
pass
return out
Time complexity: add/remove/contains: O(log n) expected; snapshot: O(1); iterator creation: O(log n); next: O(1) amortized. Space complexity: O(log n) new nodes per add/remove operation, O(log n) per active iterator, plus O(number of snapshots) root references.
Hints
- A snapshot can be just a pointer/reference to an immutable version of the set instead of a full copy.
- If the set is stored in a persistent balanced BST, an iterator can keep a stack of the current in-order path and get amortized O(1) next() calls.