Design transactional in-memory key-value store
Company: Snowflake
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
## Problem
Design and implement an **in-memory key–value store** that supports basic operations plus **transactions**.
### Core API
Implement the following operations:
- `get(key) -> value | null`
Return the current value for `key`, or `null`/`None` if it does not exist.
- `put(key, value)`
Set `key` to `value`.
- `delete(key)`
Remove `key` if it exists.
### Follow-up 1: Transactions (must support nested)
Add transactional operations:
- `begin()` — starts a new transaction scope (transactions may be nested).
- `commit() -> bool` — commits the **current** transaction.
- Returns `false` (or throws) if there is no active transaction.
- `rollback() -> bool` — rolls back the **current** transaction.
- Returns `false` (or throws) if there is no active transaction.
After `commit`, changes in the committed transaction become visible in the parent transaction (or globally if committing the outermost transaction). After `rollback`, all changes made since the last `begin()` are undone.
**Complexity requirement:** Each operation should run in **O(1)** time on average (constant-time hash operations allowed). You may assume keys and values fit in memory.
### Follow-up 2: Concurrency
Extend your design to support **multi-threaded** access:
- Multiple threads may call `get/put/delete/begin/commit/rollback` concurrently.
- Describe the thread-safety guarantees you provide (e.g., linearizability vs. weaker consistency) and what synchronization approach you would use.
### Notes
- Clarify how `delete` interacts with transactions (e.g., deleting a key inside a transaction should be reversible by rollback).
- Be prepared to discuss edge cases such as committing/rolling back with no active transaction and nested transactions.
Quick Answer: This question evaluates understanding of in-memory key–value store semantics, transactional and nested transaction behavior, and practical competencies in hash-based data structures and thread-safety for concurrent access.
Part 1: Basic in-memory key-value store
Implement a simple **in-memory key-value store** that processes a sequence of operations and returns the results of all reads.
## Task
Implement the function:
```python
def solution(operations):
...
```
The store starts **empty**. Process each operation in `operations` **in order**, then return a list containing the result of every `get` operation, in the order those `get` operations occur.
## Input
`operations` is a list of tuples. Each tuple is one of the following commands:
- `('put', key, value)` — store `value` under `key`. If `key` already exists, **overwrite** its value (the latest write wins).
- `('get', key)` — read the value currently stored under `key`. If `key` is **not present**, the result is `None`.
- `('delete', key)` — remove `key` from the store. If `key` is **not present**, do nothing (no error).
Here `key` is a string and `value` is an integer.
## Output
Return a **list** containing one entry per `get` operation, in the order the `get` operations were processed:
- For a `get` on a key that is currently in the store, append its current value.
- For a `get` on a missing key, append `None`.
Only `get` operations produce output. `put` and `delete` operations modify the store but contribute nothing to the returned list.
## Examples
- `[('put', 'a', 1), ('get', 'a'), ('put', 'a', 2), ('get', 'a'), ('delete', 'a'), ('get', 'a')]` → `[1, 2, None]`
(read `a`, then overwrite and re-read, then delete and read the now-missing key)
- `[('get', 'missing'), ('delete', 'missing'), ('get', 'missing')]` → `[None, None]`
(reading and deleting an absent key are both safe)
- `[]` → `[]`
(no operations, no output)
- `[('put', 'x', 10), ('put', 'y', 20), ('delete', 'x'), ('get', 'x'), ('get', 'y')]` → `[None, 20]`
## Constraints
- `0 <= len(operations) <= 200000`
- Keys are strings of length **1 to 100**.
- Values are integers and are never `None`.
- All operations are valid.
Constraints
- 0 <= len(operations) <= 200000
- Keys are strings of length 1 to 100
- Values are integers and are never None
- All operations are valid
Examples
Input: [('put', 'a', 1), ('get', 'a'), ('put', 'a', 2), ('get', 'a'), ('delete', 'a'), ('get', 'a')]
Expected Output: [1, 2, None]
Explanation: The value for 'a' is updated from 1 to 2, then deleted.
Input: [('get', 'missing'), ('delete', 'missing'), ('get', 'missing')]
Expected Output: [None, None]
Explanation: Getting a missing key returns None, and deleting a missing key is a no-op.
Input: []
Expected Output: []
Explanation: Edge case: no operations means no outputs.
Input: [('put', 'x', 10), ('put', 'y', 20), ('delete', 'x'), ('get', 'x'), ('get', 'y')]
Expected Output: [None, 20]
Explanation: After deleting 'x', only 'y' remains.
Solution
def solution(operations):
store = {}
result = []
for op in operations:
cmd = op[0]
if cmd == 'put':
_, key, value = op
store[key] = value
elif cmd == 'get':
_, key = op
result.append(store.get(key, None))
elif cmd == 'delete':
_, key = op
store.pop(key, None)
return resultExplanation
**Approach.** The store is a Python `dict` (`store`), which gives average **O(1)** insert, lookup, and delete via hashing. We make a single pass over `operations`, dispatching on the command tag `op[0]`, and collect outputs in `result` so the order matches the order of `get` calls.
**Per-operation logic:**
- `put`: unpack `_, key, value` and assign `store[key] = value` — this both inserts a new key and overwrites an existing one, so the *latest* value always wins.
- `get`: unpack `_, key` and append `store.get(key, None)`. Using `.get` with a default of `None` means a missing key yields `None` instead of raising `KeyError`, satisfying the spec for absent keys.
- `delete`: `store.pop(key, None)`. The `None` default is the crucial detail — `pop` removes the key if present and silently does nothing when the key is missing, exactly matching "deleting a missing key should do nothing."
**Why it's correct.** A dict is a faithful model of a mutable key-value store: each `put` reflects the most recent write, each `get` reads that current state, and each `delete` clears it. Because we record results only on `get`, in iteration order, the returned list is precisely the sequence of `get` outcomes the problem asks for. Edge cases — empty input, getting/deleting a never-seen key, and re-putting the same key — all fall out of the dict semantics without special handling, as the verified tests confirm (`[]`, repeated `get` on missing keys, overwrite-then-delete).
Time complexity: O(n) for n operations — each put/get/delete is O(1) average on the hash map (amortized, ignoring the cost of hashing a bounded-length key).. Space complexity: O(u + g) where u is the number of distinct live keys held in the dict and g is the number of get operations (the size of the returned result list)..
Hints
- A Python dict supports average O(1) get, set, and delete.
- Only 'get' operations add something to the output list.
Part 2: Transactional key-value store with nested transactions
Implement a single-threaded, in-memory **key–value store that supports nested transactions**. The store starts empty.
## What to implement
Write `solution(operations)`, where `operations` is a list of commands to apply **in order**. Each command is a sequence (tuple or list) whose **first element is the command name** (a string) and whose remaining elements are its arguments.
Apply every command to the store, and **return a list** containing the results of the `get`, `commit`, and `rollback` commands **in the order they were executed**. The `put`, `delete`, and `begin` commands produce **no entry** in the result list.
## Commands
| Command | Form | Effect | Appends to result? |
|---------|------|--------|--------------------|
| **put** | `('put', key, value)` | Set `key` to `value` (overwriting any existing value). | No |
| **delete** | `('delete', key)` | Remove `key` from the store if present; otherwise do nothing. | No |
| **get** | `('get', key)` | Look up `key`. | Yes — appends the current value, or **`None`** if `key` is not present. |
| **begin** | `('begin',)` | Open a new transaction. Transactions may be **nested**. | No |
| **commit** | `('commit',)` | Commit the **innermost** open transaction. | Yes — appends **`True`** if a transaction was open, otherwise **`False`**. |
| **rollback** | `('rollback',)` | Roll back the **innermost** open transaction. | Yes — appends **`True`** if a transaction was open, otherwise **`False`**. |
## Transaction semantics
- **Immediate visibility:** A `put` or `delete` is visible to every later operation right away, including later `get`s inside the same transaction.
- **`rollback`** undoes only the changes made **since its matching `begin`** (the innermost open transaction), restoring every key it touched to the value it had at that `begin`. Keys created during the transaction are removed; keys deleted or overwritten are restored to their begin-time state. Operations outside the transaction are unaffected. Rolling back removes that transaction; the next-outer transaction (if any) becomes the innermost open one.
- **`commit`** closes the innermost open transaction and **merges its changes into the enclosing transaction** if one exists. The changes are *not* yet made permanent in that case: a later `rollback` of the enclosing transaction will still undo them (restoring to the enclosing transaction's begin-time state). If there is no enclosing transaction, the committed changes become the persistent state of the store.
- **No active transaction:** If `commit` or `rollback` runs while no transaction is open, it makes no change and appends **`False`**.
## Input / output format
- **Input:** `operations` — a list of command sequences as described above. Command names are lowercase strings; `key` is a string; `value` is an integer.
- **Output:** a list whose elements are, in order, the results of the `get` / `commit` / `rollback` commands: each `get` contributes its looked-up value or `None`, and each `commit` / `rollback` contributes a boolean. An empty `operations` list yields an empty result list.
## Constraints
- `0 <= len(operations) <= 200000`
- Maximum nested transaction depth is at most `len(operations)`
- Keys are strings of length 1 to 100
- Values are integers and are never `None`
- All operations are valid
## Example
```
operations = [
('put', 'a', 1),
('begin',),
('put', 'a', 2),
('get', 'a'), # -> 2 (transaction write is visible immediately)
('begin',),
('delete', 'a'),
('get', 'a'), # -> None
('rollback',), # -> True (undo inner transaction)
('get', 'a'), # -> 2 (back to value set in outer transaction)
('commit',), # -> True (commit outer transaction)
('get', 'a'), # -> 2
]
# returns [2, None, True, 2, True, 2]
```
Constraints
- 0 <= len(operations) <= 200000
- Maximum nested transaction depth is at most len(operations)
- Keys are strings of length 1 to 100
- Values are integers and are never None
- All operations are valid
Examples
Input: [('put', 'a', 1), ('begin',), ('put', 'a', 2), ('get', 'a'), ('begin',), ('delete', 'a'), ('get', 'a'), ('rollback',), ('get', 'a'), ('commit',), ('get', 'a')]
Expected Output: [2, None, True, 2, True, 2]
Input: [('commit',), ('rollback',), ('get', 'x')]
Expected Output: [False, False, None]
Input: [('put', 'k', 5), ('begin',), ('delete', 'k'), ('get', 'k'), ('rollback',), ('get', 'k')]
Expected Output: [None, True, 5]
Input: [('put', 'x', 1), ('begin',), ('put', 'x', 2), ('begin',), ('put', 'x', 3), ('commit',), ('get', 'x'), ('rollback',), ('get', 'x')]
Expected Output: [True, 3, True, 1]
Input: []
Expected Output: []
Solution
def solution(operations):
store = {}
tx_stack = []
result = []
for op in operations:
cmd = op[0]
if cmd == 'begin':
tx_stack.append({})
elif cmd == 'put':
_, key, value = op
if tx_stack:
log = tx_stack[-1]
if key not in log:
log[key] = (key in store, store.get(key))
store[key] = value
else:
store[key] = value
elif cmd == 'delete':
_, key = op
if tx_stack:
log = tx_stack[-1]
if key not in log:
log[key] = (key in store, store.get(key))
store.pop(key, None)
else:
store.pop(key, None)
elif cmd == 'get':
_, key = op
result.append(store.get(key, None))
elif cmd == 'commit':
if not tx_stack:
result.append(False)
else:
log = tx_stack.pop()
if tx_stack:
parent = tx_stack[-1]
for key, old_state in log.items():
if key not in parent:
parent[key] = old_state
result.append(True)
elif cmd == 'rollback':
if not tx_stack:
result.append(False)
else:
log = tx_stack.pop()
for key, (had_key, old_value) in log.items():
if had_key:
store[key] = old_value
else:
store.pop(key, None)
result.append(True)
return resultExplanation
**Approach — one live dict plus a stack of undo-logs.**
The store keeps a single dictionary `store` that always holds the *current visible* state, so `get` is just `store.get(key)` and writes are immediately visible. Transaction isolation is achieved not by copying the store, but by recording, per open transaction, exactly enough information to *undo* its writes.
**State**
- `store`: the live key → value map.
- `tx_stack`: a stack of undo-logs; each `begin` pushes a fresh `{}`.
- Each log maps a touched `key` → `(had_key, old_value)`, the snapshot of that key *before* the transaction first changed it.
**put / delete** mutate `store` directly. If a transaction is open, before changing a key the code saves its prior state into the top log — but only on the *first* touch (`if key not in log`), so the snapshot reflects the value at `begin`, not an intermediate one.
**commit** pops the top log. If a parent transaction exists, it merges each entry into the parent — but only when the parent hasn't already snapshotted that key (`if key not in parent`). This preserves the parent's *earlier* snapshot, so a later parent rollback still restores the value as of the parent's `begin`. No active transaction → returns `False`.
**rollback** pops the top log and replays each `(had_key, old_value)`: restore the value if the key existed, else delete it. No active transaction → `False`.
**Why correct:** the first-touch guard guarantees each log holds the begin-time value; the commit merge propagates undo responsibility upward without clobbering older snapshots; so each `begin`/rollback pair undoes exactly its own changes, and nesting composes correctly.
Time complexity: O(n) total over n operations — get/put/delete/begin are O(1) amortized (dict ops); commit and rollback are O(k) in the number of distinct keys recorded in the popped transaction log, and each key is logged/merged a bounded number of times across the run.. Space complexity: O(u + p), where u is the number of keys currently in the visible store and p is the total number of key snapshots held across all active transaction logs on the stack..
Hints
- When a transaction first changes a key, record what that key looked like before the change.
- On nested commit, the current visible state should stay as-is; only the undo information needs to be merged upward.
Part 3: Concurrent transactional key-value store
Implement `solution(operations)` — a **concurrent transactional key-value store** that processes a list of operations under linearizable global-lock semantics.
The `operations` list is already in the **exact atomic order** in which the operations take effect, so you simply process them one by one in that order. Each operation is a tuple whose first element is the command name. Return a list of results (see **Output** below).
## Data model
- There is one shared **global store** holding the latest committed key→value pairs.
- Each **thread** (identified by `tid`) has its own independent stack of **nested transaction layers**.
- A thread can see its own uncommitted changes, but other threads cannot see them until the thread's **outermost** transaction commits.
- **Keys** are strings; **values** are integers (never `None`).
## Operations
Each `op` is a tuple. `tid` is a thread id.
- `('begin', tid)` — Start a new (possibly nested) transaction for that thread by pushing a fresh empty layer onto its stack.
- `('put', tid, key, value)` — Set `key` to `value`. If the thread has an active transaction, the write goes into the thread's **top** (innermost) layer only; otherwise it is applied directly to the global store.
- `('delete', tid, key)` — Remove `key`. If the thread has an active transaction, record the deletion in the thread's **top** layer (so the key reads as absent for that thread); otherwise remove it from the global store.
- `('get', tid, key)` — Read the value of `key` for that thread (see **Reads** below).
- `('commit', tid)` — Commit the thread's innermost active transaction.
- `('rollback', tid)` — Discard the thread's innermost active transaction.
## Reads
For `('get', tid, key)`, resolve the value as follows:
1. Scan that thread's transaction layers from **newest to oldest**. Return on the first layer that mentions `key`: if that layer recorded a `put`, return its value; if it recorded a `delete`, return `None`.
2. If no layer of the thread mentions `key`, fall back to the **current** value in the global store, or `None` if absent.
This is **not** snapshot isolation. A thread reads the latest committed global value at read time, so if another thread commits a new value for a key, future reads observe that new value — **unless** the reading thread has its own `put`/`delete` on that key in an active layer (in which case its own layer wins).
## Commit semantics
`('commit', tid)`:
- If the thread has **no active transaction**, do nothing and record `False`.
- Otherwise, pop the thread's **top** layer:
- If a **parent** layer still remains, merge the popped layer's changes into the parent layer (the changes stay uncommitted, just merged one level out toward the parent). Overlapping keys overwrite the parent's pending value.
- If the popped layer was the **outermost** one, apply its changes to the global store: each recorded `put` writes its value, and each recorded `delete` removes the key from the global store.
- Record `True`.
## Rollback semantics
`('rollback', tid)`:
- If the thread has **no active transaction**, record `False`.
- Otherwise, discard the thread's **top** layer (throwing away all of its pending changes) and record `True`.
## Output
Return a list built by appending, in operation order, exactly one result for each `get`, `commit`, and `rollback`:
- `get` → the resolved value (an integer) or `None`.
- `commit` → `True` if a transaction was committed, otherwise `False`.
- `rollback` → `True` if a transaction was discarded, otherwise `False`.
`begin`, `put`, and `delete` produce **no** entry in the output list.
## Example
Input:
```
[('put', 1, 'x', 1), ('begin', 2), ('get', 2, 'x'), ('put', 2, 'x', 5),
('get', 2, 'x'), ('get', 1, 'x'), ('commit', 2), ('get', 1, 'x')]
```
Output: `[1, 5, 1, True, 5]`
- Thread 1 puts `x = 1` directly into the global store (no output).
- Thread 2 begins a transaction, then reads `x` → `1` (from the global store).
- Thread 2 puts `x = 5` in its own layer; its next read → `5`, but thread 1 still reads the global `1`.
- Thread 2 commits (outermost) → `True`, writing `x = 5` to the global store; thread 1 now reads `5`.
## Constraints
- `0 <= len(operations) <= 200000`
- `1 <= number of distinct thread IDs <= 100000`
- Maximum nested transaction depth per thread is at most `len(operations)`.
- Keys are strings of length 1 to 100.
- Values are integers and are never `None`.
- All operations are valid and should be processed exactly in the given order.
Constraints
- 0 <= len(operations) <= 200000
- 1 <= number of distinct thread IDs <= 100000
- Maximum nested transaction depth per thread is at most len(operations)
- Keys are strings of length 1 to 100
- Values are integers and are never None
- All operations are valid and should be processed exactly in the given order
Examples
Input: [('put', 1, 'x', 1), ('begin', 2), ('get', 2, 'x'), ('put', 2, 'x', 5), ('get', 2, 'x'), ('get', 1, 'x'), ('commit', 2), ('get', 1, 'x')]
Expected Output: [1, 5, 1, True, 5]
Explanation: Thread 2 sees its own uncommitted write, while thread 1 does not see it until thread 2 commits.
Input: [('begin', 1), ('put', 1, 'a', 10), ('begin', 1), ('delete', 1, 'a'), ('get', 1, 'a'), ('get', 2, 'a'), ('rollback', 1), ('get', 1, 'a'), ('commit', 1), ('get', 2, 'a')]
Expected Output: [None, None, True, 10, True, 10]
Explanation: The inner delete is visible only to thread 1 and is undone by rollback. After the outer commit, thread 2 can see the value.
Input: [('commit', 3), ('rollback', 3), ('put', 1, 'k', 7), ('delete', 2, 'k'), ('get', 1, 'k')]
Expected Output: [False, False, None]
Explanation: Edge case: thread 3 has no active transaction. Thread 2 deletes 'k' globally because it is not inside a transaction.
Input: [('begin', 1), ('put', 1, 'x', 2), ('begin', 1), ('put', 1, 'x', 3), ('commit', 1), ('get', 1, 'x'), ('get', 2, 'x'), ('rollback', 1), ('get', 1, 'x')]
Expected Output: [True, 3, None, True, None]
Explanation: The child commit merges into thread 1's parent transaction, but thread 2 still cannot see the value until the outermost commit.
Input: [('put', 1, 'v', 1), ('begin', 2), ('get', 2, 'v'), ('delete', 1, 'v'), ('get', 2, 'v')]
Expected Output: [1, None]
Explanation: This shows the model is not snapshot isolation: thread 2 begins before the delete, but a later read still sees the latest global committed state.
Solution
def solution(operations):
TOMBSTONE = object()
global_store = {}
stacks = {}
result = []
def read_value(tid, key):
layers = stacks.get(tid, [])
for layer in reversed(layers):
if key in layer:
return None if layer[key] is TOMBSTONE else layer[key]
return global_store.get(key, None)
for op in operations:
cmd = op[0]
if cmd == 'begin':
_, tid = op
stacks.setdefault(tid, []).append({})
elif cmd == 'put':
_, tid, key, value = op
layers = stacks.get(tid, [])
if layers:
layers[-1][key] = value
else:
global_store[key] = value
elif cmd == 'delete':
_, tid, key = op
layers = stacks.get(tid, [])
if layers:
layers[-1][key] = TOMBSTONE
else:
global_store.pop(key, None)
elif cmd == 'get':
_, tid, key = op
result.append(read_value(tid, key))
elif cmd == 'commit':
_, tid = op
layers = stacks.get(tid, [])
if not layers:
result.append(False)
else:
current = layers.pop()
if layers:
layers[-1].update(current)
else:
for key, value in current.items():
if value is TOMBSTONE:
global_store.pop(key, None)
else:
global_store[key] = value
if not layers:
stacks.pop(tid, None)
result.append(True)
elif cmd == 'rollback':
_, tid = op
layers = stacks.get(tid, [])
if not layers:
result.append(False)
else:
layers.pop()
if not layers:
stacks.pop(tid, None)
result.append(True)
return resultExplanation
The store keeps three pieces of state: `global_store` (the latest committed values), `stacks` (a per-thread list of nested transaction layers, each a dict), and a `TOMBSTONE` sentinel that marks a key deleted *inside* a layer (distinct from "key absent").
**Writes.** `put`/`delete` look at the thread's stack. With no active transaction they mutate `global_store` directly. Inside a transaction they write only to the **top** layer — `put` stores the value, `delete` stores `TOMBSTONE` — so other threads can't see the change yet.
**Reads.** `read_value` scans that thread's layers newest→oldest and returns on the first layer that contains the key (TOMBSTONE → `None`). If no layer shadows it, it falls back to `global_store.get(key)`. This is exactly the spec: own uncommitted changes first, then the *current* committed value — not a snapshot, so a later commit by another thread becomes visible unless this thread has shadowed the key.
**commit.** Pop the top layer. If a parent layer remains, merge the popped dict into it with `update` (the change is still uncommitted, just promoted one level). If it was the outermost layer, replay each entry into `global_store`, turning TOMBSTONEs into real deletes. Empty stacks are cleaned up. Returns `True`, or `False` when the thread had no transaction.
**rollback.** Discard the top layer and return `True` (`False` if none), throwing its changes away.
**Why correct:** because operations arrive in the exact atomic order they take effect, single-threaded sequential processing under one global lock faithfully reproduces the linearizable semantics. Newest-first layer scan gives nested visibility; `update`-on-commit gives the parent-promotion rule; outermost commit is the only point that touches shared state.
Time complexity: A `get` is O(d) for the thread's current transaction depth d; a `commit` is O(k) for the number of keys in the committed layer (dict update / global replay); `begin`, `put`, `delete`, and `rollback` are O(1) average. Total over m operations is O(m·d) worst case but effectively O(m) amortized in typical inputs.. Space complexity: O(u + p), where u is the number of committed keys in `global_store` and p is the total number of pending key updates across all active thread-local transaction layers..
Hints
- Keep one transaction stack per thread ID, plus one shared committed store.
- For a read, only the calling thread's own transaction layers matter before you fall back to the shared committed store.