## Problem
Design a data structure that simulates **Undo/Redo** for a simplified spreadsheet editor.
A user performs a sequence of operations on cells. Each operation changes the value of a single cell.
### Operations
- `SET cell value`: set `cell` (e.g., "A1") to `value` (string or number).
- `UNDO k`: undo the last `k` successful `SET` operations (or as many as possible).
- `REDO k`: redo the last `k` operations that were undone (or as many as possible).
### Rules
- `UNDO` and `REDO` affect only `SET` operations.
- After an `UNDO`, if a new `SET` is executed, the redo history is cleared.
- If `UNDO k` requests more steps than exist, undo all available steps.
- If `REDO k` requests more steps than exist, redo all available steps.
### Input/Output (one possible interface)
Given a list of commands, return the final mapping of `cell -> value`.
### Example
Commands:
1. `SET A1 10`
2. `SET A1 20`
3. `SET B2 5`
4. `UNDO 2` (undo: `B2=5`, then `A1=20`)
5. `REDO 1` (redo: `A1=20`)
6. `SET C1 7` (clears redo stack)
Final state: `A1=20`, `C1=7` (and `B2` is empty).
## Constraints
- Up to `N` commands (e.g., 10^5).
- Aim for efficient per-command time and memory.
Discuss the data structures and complexity; implement the core logic.
Quick Answer: This question evaluates a candidate's ability to design stateful data structures and manage operation history, focusing on correctness of undo/redo semantics, handling edge cases, and analyzing time and space complexity.
Solution
### Approach
This is a classic **command-history / undo–redo** problem. The cleanest model is the **two-stack pattern**: every reversible action records what is needed to *invert* it. When you undo, you pop from the undo stack, apply the inverse, and push the *forward* action onto the redo stack. Redo is the mirror image.
The one subtlety here is that a `SET` does not just *create* a cell value — it *overwrites* a previous value (which may be the "empty"/unset state). So to undo a `SET`, you must remember the cell's **previous value**, not just the cell name. That single insight makes the whole thing fall out:
- To **undo** `SET A1 = new`, restore `A1` to `old` (its value just before the SET).
- To **redo** that same operation, set `A1` back to `new`.
So each history entry is the triple **(cell, old_value, new_value)**. With that, undo and redo are symmetric and O(1) each.
### Data structures
- `cells: dict[str, value]` — the live spreadsheet (cell → current value). Absence from the dict = "empty cell".
- `undo_stack: list[(cell, old, new)]` — successful `SET`s available to be undone (LIFO).
- `redo_stack: list[(cell, old, new)]` — operations that were undone and can be re-applied (LIFO).
A "sentinel" is needed to represent *empty*. The reason to avoid `None` here is not that anything else compares equal to it — in Python nothing does except `None` itself. It's that `None`/`null` is itself a storable cell value, so a `None` sentinel can't distinguish "cell unset" from "cell explicitly set to `None`". More generally, "absent" must be representable distinctly from *any* value a cell could hold. A dedicated sentinel object (or simply "key absent from dict") guarantees that. Below I use the dict's `pop`/membership so that "empty" is represented by the **absence of the key**, which removes the ambiguity entirely.
### The four invariants (the rules, restated precisely)
1. `UNDO`/`REDO` only ever touch `SET`s — they never appear in the history themselves.
2. A new `SET` after any `UNDO` **clears the redo stack** (you've forked history; the old future is unreachable).
3. `UNDO k` / `REDO k` do **as many as possible** — `min(k, len(stack))` — never error on overflow.
4. `UNDO 0` / `REDO 0` are no-ops.
### Algorithm per command
```
SET cell value:
old = current value of cell (or "absent")
apply: cells[cell] = value
push (cell, old, value) onto undo_stack
clear redo_stack # rule 2
UNDO k:
repeat min(k, len(undo_stack)) times:
(cell, old, new) = undo_stack.pop()
restore cell to `old` # absent => delete the key
push (cell, old, new) onto redo_stack
REDO k:
repeat min(k, len(redo_stack)) times:
(cell, old, new) = redo_stack.pop()
set cell to `new`
push (cell, old, new) onto undo_stack
```
Note redo does **not** clear anything and undo does **not** clear anything — only a fresh `SET` clears redo. That ordering matters: applying the value and then clearing redo (not before) keeps the logic clean.
### Reference implementation (Python)
```python
class Spreadsheet:
"""Undo/Redo for single-cell SET operations.
A history entry is (cell, old, new). 'Empty' is represented by the
sentinel _ABSENT so we can distinguish "cell unset" from "cell set to None".
"""
_ABSENT = object() # unique sentinel for "cell has no value"
def __init__(self):
self.cells = {} # cell -> value
self.undo_stack = [] # list[(cell, old, new)]
self.redo_stack = [] # list[(cell, old, new)]
def _get(self, cell):
return self.cells.get(cell, self._ABSENT)
def _restore(self, cell, value):
if value is self._ABSENT:
self.cells.pop(cell, None) # back to empty
else:
self.cells[cell] = value
def set(self, cell, value):
old = self._get(cell)
self.cells[cell] = value
self.undo_stack.append((cell, old, value))
self.redo_stack.clear() # rule 2: new SET forks history
def undo(self, k):
for _ in range(min(k, len(self.undo_stack))):
cell, old, new = self.undo_stack.pop()
self._restore(cell, old)
self.redo_stack.append((cell, old, new))
def redo(self, k):
for _ in range(min(k, len(self.redo_stack))):
cell, old, new = self.redo_stack.pop()
self.cells[cell] = new
self.undo_stack.append((cell, old, new))
def state(self):
return dict(self.cells)
def run(commands):
"""commands: list of tuples like ('SET','A1','10'), ('UNDO','2'), ('REDO','1')."""
s = Spreadsheet()
for cmd in commands:
op = cmd[0].upper()
if op == "SET":
s.set(cmd[1], cmd[2])
elif op == "UNDO":
s.undo(int(cmd[1]))
elif op == "REDO":
s.redo(int(cmd[1]))
else:
raise ValueError(f"Unknown op: {op}")
return s.state()
```
### Walking the example
```
SET A1 10 -> cells={A1:10} undo=[(A1,∅,10)] redo=[]
SET A1 20 -> cells={A1:20} undo=[(A1,∅,10),(A1,10,20)] redo=[]
SET B2 5 -> cells={A1:20,B2:5} undo=[...,(B2,∅,5)] redo=[]
UNDO 2 -> pop (B2,∅,5): B2->∅, redo=[(B2,∅,5)]
pop (A1,10,20): A1->10, redo=[(B2,∅,5),(A1,10,20)]
cells={A1:10} undo=[(A1,∅,10)]
REDO 1 -> pop (A1,10,20): A1->20, undo=[(A1,∅,10),(A1,10,20)]
cells={A1:20} redo=[(B2,∅,5)]
SET C1 7 -> cells={A1:20,C1:7} redo cleared
```
Final state: `A1=20`, `C1=7`, `B2` empty. ✔ Matches the expected output (`∅` denotes empty/absent).
### Complexity
Let $N$ be the number of commands and let $S$ be the **total number of undo/redo steps actually applied** across the whole run (i.e., the sum, over every `UNDO`/`REDO` command, of `min(k, len(stack))`).
- **Per `SET`:** O(1) amortized. The `redo_stack.clear()` is O(redo size), but each entry sitting in the redo stack was pushed there by an earlier `undo` step. Charge each clear to the `undo` step that created the entry: every entry is pushed once and removed (either by a `redo`, or by exactly one `clear`) at most once. So the *clear-work* is bounded by the *undo-work* and adds nothing asymptotically beyond it.
- **`UNDO k` / `REDO k`:** O(applied steps) = O(`min(k, len(stack))`). Each applied step changes a distinct cell value and must be applied individually, so this is tight.
- **Total time for $N$ commands:** $O(N + S)$. The $N$ term covers dispatch plus the O(1)-amortized `SET` work; the $S$ term covers all applied undo/redo steps (the charging argument above folds clear-work into it, since clear-work ≤ undo-work ≤ $S$).
**$S$ is not bounded by $N$.** A single `SET` can be undone and re-done arbitrarily many times, so the applied-step count is *not* limited by the number of `SET`s. Concretely, `SET x` repeated $m$ times, then $R$ repetitions of `[UNDO m, REDO m]`, uses $N = m + 2R$ commands but applies $S = 2Rm$ steps. With $m = R = t$ this is $N = 3t$ and $S = 2t^2 = \Theta(N^2)$. **The worst case is therefore quadratic, $\Theta(N + S)$ with $S$ up to $\Theta(N^2)$** — there is no general O(N) guarantee. It is linear only when the applied-step total is itself O(N) (e.g., bounded `k`, or each undone op redone O(1) times). This is intrinsic to the problem statement, not the data structure: any correct implementation must apply each requested step, because the requested mapping output depends on it.
- **Space:** $O(D)$ where $D$ is the maximum combined depth of the two stacks, which is O(number of `SET`s executed). Each entry stores a cell key + two values.
### Edge cases to call out in an interview
- **`UNDO`/`REDO` with k larger than history** → clamp via `min(k, len(stack))`; never throw.
- **`UNDO 0` / `REDO 0`** → no-op (the loop runs zero times).
- **Undoing a SET into a previously-empty cell** → must *delete* the key, not set it to `0`/`""`/`None`. This is the most common bug; the `_ABSENT` sentinel handles it.
- **Setting a cell to a value equal to its current value** (`SET A1 10` when A1 is already 10) → still a recorded operation; it's undoable. (If the spec wanted no-op-on-unchanged, you'd skip the push, but the problem doesn't ask for that.)
- **Interleaved `SET` after partial `UNDO`** → redo stack cleared, as exercised by the example.
- **Values are strings or numbers** → store as-is; no parsing needed for correctness of undo/redo. If the editor must compare/aggregate, normalize types at the boundary, not in the history.
### Design discussion / extensions (what a strong candidate volunteers)
- **Why two stacks vs. a single timeline + pointer?** An equivalent design keeps one `history` list plus a `cursor` index; `UNDO`/`REDO` move the cursor, and a new `SET` truncates everything past the cursor before appending. Both are O(1) per op and O(S) space; the two-stack version is slightly easier to reason about because the "future" is physically separated. The cursor version saves the cost of moving entries between stacks but pays a truncation cost on each new `SET`. (Note: neither variant changes the worst-case time analysis above — large-`k` `UNDO`/`REDO` still apply each step individually.)
- **Bounded history (memory cap).** For 10^5+ commands with large values, cap the undo stack at a fixed depth `D` using a deque/ring buffer; once full, drop the oldest entry (those become permanently un-undoable). This bounds memory to O(D) at the cost of limited undo depth — the standard trade-off real editors make.
- **Coalescing / transactions.** Real spreadsheets group rapid edits (e.g., typing) or multi-cell operations (paste, fill-down) into a single undo unit. Generalize a history entry from a single `(cell, old, new)` to a **list** of such triples representing one atomic command; `undo`/`redo` then iterate the group. The two-stack mechanics are unchanged.
- **Memento vs. command pattern.** Storing `(old, new)` is the **command** (delta) pattern — cheap because each op touches one cell. Snapshotting the entire grid per step (memento) is simpler to implement but O(grid size) per op in both time and memory — unacceptable at this scale. The delta approach is the right call here.
- **Concurrency note.** If multiple users edit the same sheet, a linear undo/redo stack is no longer sufficient — you'd move toward operational transforms or CRDTs so that undo respects causality and doesn't clobber a concurrent edit. Worth naming as out-of-scope for the single-user version asked here.