Design an In-Memory Spreadsheet Engine
Company: Harvey
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question tests a candidate's ability to design a dependency-tracking data structure with efficient cache invalidation, drawing on graph traversal and topological ordering concepts. It evaluates practical system design and algorithms skills, particularly managing directed dependency graphs, cycle detection, and O(1) read performance through proactive cache propagation.
Constraints
- All values are integers; no floating point.
- Formula operators: only '+' between terms; each term is an integer literal or a cell reference.
- A reference to a never-set cell evaluates to 0.
- get must be O(1) (return a cached value), not a re-evaluation of the formula tree.
- A set that would introduce a dependency cycle is rejected ('ERROR') and leaves the engine in its previous valid state.
- Reassigning a cell replaces its contents and drops its previous dependency edges.
Examples
Input: ([['set', 'A1', '10'], ['get', 'A1']],)
Expected Output: ['OK', 10]
Explanation: Literal set then read returns the stored value 10.
Input: ([['set', 'A1', '10'], ['set', 'A2', '20'], ['set', 'A3', '=A1+A2'], ['get', 'A3']],)
Expected Output: ['OK', 'OK', 'OK', 30]
Explanation: A3 sums its two references 10 + 20 = 30.
Input: ([['set', 'A1', '10'], ['set', 'A2', '20'], ['set', 'A3', '=A1+A2'], ['set', 'A1', '5'], ['get', 'A3']],)
Expected Output: ['OK', 'OK', 'OK', 'OK', 25]
Explanation: Updating A1 to 5 propagates to A3, which recomputes to 5 + 20 = 25.
Input: ([['set', 'A1', '1'], ['set', 'A2', '=A1+1'], ['set', 'A3', '=A2+1'], ['get', 'A3'], ['set', 'A1', '10'], ['get', 'A3']],)
Expected Output: ['OK', 'OK', 'OK', 3, 'OK', 12]
Explanation: Chained references: A3 = A2 + 1 = (A1 + 1) + 1 = 3, then A1 -> 10 propagates through A2 to A3 = (10 + 1) + 1 = 12.
Input: ([['set', 'A1', '7'], ['set', 'A3', '=A1+A1'], ['get', 'A3']],)
Expected Output: ['OK', 'OK', 14]
Explanation: A repeated reference counts twice: 7 + 7 = 14.
Input: ([['set', 'A1', '10'], ['set', 'A2', '20'], ['set', 'A3', '=A1+A2'], ['set', 'A3', '20'], ['set', 'A1', '99'], ['get', 'A3']],)
Expected Output: ['OK', 'OK', 'OK', 'OK', 'OK', 20]
Explanation: Reassigning A3 to the literal 20 drops its old dependencies, so the later A1 -> 99 does not affect A3.
Input: ([['set', 'A1', '=A1'], ['get', 'A1']],)
Expected Output: ['ERROR', 0]
Explanation: A direct self-reference is a cycle: the set is rejected and A1 keeps its prior value (never set -> 0).
Input: ([['set', 'A1', '5'], ['set', 'A2', '=A1'], ['set', 'A1', '=A2'], ['get', 'A1'], ['get', 'A2']],)
Expected Output: ['OK', 'OK', 'ERROR', 5, 5]
Explanation: A2 depends on A1; setting A1 = A2 would create an indirect cycle, so it is rejected and A1 keeps its literal 5 (A2 stays 5).
Input: ([['get', 'Z9']],)
Expected Output: [0]
Explanation: A never-set cell reads as 0.
Input: ([['set', 'A1', '-3'], ['set', 'A2', '=10+20+A1'], ['get', 'A2']],)
Expected Output: ['OK', 'OK', 27]
Explanation: Formulas mix integer literals (including negatives) with references: 10 + 20 + (-3) = 27.
Hints
- Store, per cell, both its raw contents (literal or formula string) and its cached integer value. get_cell just returns the cached value.
- Maintain two maps: deps[cell] = the cells this formula references, and dependents[cell] = the cells whose formulas reference this cell. The dependents map is what lets a set propagate to everything downstream.
- On set_cell, before mutating anything, check for a cycle: gather the transitive dependents of the target cell; if any new reference is in that set, the formula would depend on itself — reject and change nothing.
- After accepting a set, recompute the changed cell and all of its transitive dependents in topological order so each cell is recomputed only after the cells it reads. Remember to count repeated references multiple times and to clear stale dependency edges from the old formula first.