PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question tests a candidate's ability to design a dependency-tracking data structure with efficient cache invalidation, drawing on graph traversal and topological ordering concepts. It evaluates practical system design and algorithms skills, particularly managing directed dependency graphs, cycle detection, and O(1) read performance through proactive cache propagation.

  • hard
  • Harvey
  • Coding & Algorithms
  • Software Engineer

Design an In-Memory Spreadsheet Engine

Company: Harvey

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

# Design an In-Memory Spreadsheet Engine Build a minimal spreadsheet engine, similar to the calculation core of a tool like Excel. The engine stores values in named cells and supports formulas that reference other cells. Reads must return the up-to-date computed value, and reads must be fast (constant-time lookup of a cached value, not a re-computation on every read). ## Cell Addressing A cell is addressed by a string of the form `"<column><row>"`, where the column is one or more uppercase letters (`A`, `B`, ..., `Z`, `AA`, `AB`, ...) and the row is a positive integer (e.g. `"A1"`, `"B2"`, `"AA10"`). You only need to support cell addresses that are passed in via the API — you do not need to materialize an entire grid. ## API Implement a class/object with the following operations: - `set_cell(cell, value)` — Set the contents of `cell`. `value` is a string that is either: - a **literal integer**, e.g. `"10"` or `"-3"`, or - a **formula**, which begins with `=` and is a sum of one or more terms separated by `+`, where each term is either an integer literal or a cell reference. Examples: `"=A1+A2"`, `"=A1+5"`, `"=B2"`, `"=10+20+C3"`. - `get_cell(cell)` — Return the **integer value** of `cell` as computed from its current contents and the current contents of every cell it (transitively) references. If a cell has never been set, treat its value as `0`. `get_cell` must be **O(1)**: it returns a stored, already-computed value rather than re-evaluating the formula tree on every call. Setting a cell must update the cached value of that cell and of every cell whose value transitively depends on it, so that subsequent `get_cell` calls are correct and constant-time. ## Required Behaviors Your engine must handle all of the following: 1. **Literals and basic reads.** After `set_cell("A1", "10")`, `get_cell("A1")` returns `10`. 2. **Formulas with references.** After `set_cell("A1", "10")`, `set_cell("A2", "20")`, `set_cell("A3", "=A1+A2")`, `get_cell("A3")` returns `30`. 3. **Updates propagate.** Continuing the above, after `set_cell("A1", "5")`, `get_cell("A3")` returns `25`. Any cell that transitively depends on a changed cell must reflect the new value. 4. **Nested / chained references.** A formula may reference a cell that is itself a formula, to arbitrary depth. After `set_cell("A1", "1")`, `set_cell("A2", "=A1+1")`, `set_cell("A3", "=A2+1")`, `get_cell("A3")` returns `3`, and updating `A1` propagates through `A2` to `A3`. 5. **Repeated references in one formula.** A formula may reference the same cell more than once, e.g. `set_cell("A3", "=A1+A1")` must count `A1` twice. 6. **Reassigning / removing references.** Re-calling `set_cell` on a cell replaces its previous contents and its previous dependencies. After `set_cell("A3", "=A1+A2")` then `set_cell("A3", "20")`, `A3` no longer depends on `A1` or `A2`, and later changes to `A1`/`A2` must not affect `A3`. Conversely, removing a reference must drop the corresponding dependency edge so stale dependencies are not left behind. ## Constraints & Assumptions - All values are integers (literals and computed results). No floating point. - Formula operators: only `+` between terms. Whitespace handling around terms is up to you; the test cases use no spaces (e.g. `"=A1+A2"`). - A reference to a never-set cell evaluates to `0`. - Cell addresses, formulas, and the number of cells are bounded by available memory; aim for `get_cell` in O(1) and `set_cell` proportional to the number of cells that depend on the cell being changed. ## Cycle Handling The engine must detect dependency **cycles** and reject the offending `set_cell` rather than looping forever or corrupting state. A cycle is created when a `set_cell` would make a cell transitively depend on itself — directly (`set_cell("A1", "=A1")`) or indirectly (`set_cell("A1", "=A2")` while `A2` already, transitively, references `A1`). When a `set_cell` would introduce a cycle, it must: - be rejected (signal an error in an idiomatic way for your language — e.g. raise an exception or return a failure status), and - leave the engine in its **previous valid state**: the cell keeps its prior contents and value, and no dependency edges from the rejected formula are retained. `set_cell` calls that do **not** create a cycle must continue to succeed normally afterward.

Quick Answer: This question tests a candidate's ability to design a dependency-tracking data structure with efficient cache invalidation, drawing on graph traversal and topological ordering concepts. It evaluates practical system design and algorithms skills, particularly managing directed dependency graphs, cycle detection, and O(1) read performance through proactive cache propagation.

Build a minimal spreadsheet engine (the calculation core of a tool like Excel). The engine stores integer values in named cells and supports `+`-only sum formulas that reference other cells. Reads must return the up-to-date computed value in O(1) by returning a cached value, never re-evaluating the formula tree on each read. ## Cell Addressing A cell is addressed by a string `"<column><row>"` where the column is one or more uppercase letters (`A`..`Z`, `AA`, `AB`, ...) and the row is a positive integer (e.g. `"A1"`, `"B2"`, `"AA10"`). You only handle cells passed in via the API; you do not materialize a full grid. ## Operations (driver encoding) You implement `solution(operations)`. `operations` is a list of operations; each operation is a list: - `["set", cell, value]` — set the contents of `cell`. `value` is a string that is either a literal integer (e.g. `"10"`, `"-3"`) or a formula beginning with `=` that is a sum of one or more `+`-separated terms, where each term is an integer literal or a cell reference (e.g. `"=A1+A2"`, `"=A1+5"`, `"=10+20+C3"`). - `["get", cell]` — read the current integer value of `cell`. `solution` returns a list, one entry per operation, in order: - for a `set`, append `"OK"` if it succeeded, or `"ERROR"` if it was rejected because it would create a dependency cycle; - for a `get`, append the cell's cached integer value (a never-set cell is `0`). ## Required behaviors 1. Literals & basic reads: after `set A1 "10"`, `get A1` -> `10`. 2. Formulas with references: after `A1=10`, `A2=20`, `A3="=A1+A2"`, `get A3` -> `30`. 3. Updates propagate: then `A1=5` makes `get A3` -> `25`. Every transitively dependent cell reflects the change. 4. Nested/chained references to arbitrary depth, and updates propagate through the chain. 5. Repeated references in one formula count multiple times: `"=A1+A1"` counts `A1` twice. 6. Reassigning replaces contents AND dependencies: after `A3="=A1+A2"` then `A3="20"`, later changes to `A1`/`A2` must not affect `A3` (stale dependency edges are dropped). ## Cycle handling A `set` that would make a cell transitively depend on itself (directly `"=A1"` in A1, or indirectly) must be rejected: append `"ERROR"`, keep the cell's prior contents/value, and retain none of the rejected formula's dependency edges. Subsequent non-cyclic `set`s must still succeed. Aim for `get` in O(1) (cached value) and `set` proportional to the number of cells that transitively depend on the changed cell.

Constraints

  • All values are integers; no floating point.
  • Formula operators: only '+' between terms; each term is an integer literal or a cell reference.
  • A reference to a never-set cell evaluates to 0.
  • get must be O(1) (return a cached value), not a re-evaluation of the formula tree.
  • A set that would introduce a dependency cycle is rejected ('ERROR') and leaves the engine in its previous valid state.
  • Reassigning a cell replaces its contents and drops its previous dependency edges.

Examples

Input: ([['set', 'A1', '10'], ['get', 'A1']],)

Expected Output: ['OK', 10]

Explanation: Literal set then read returns the stored value 10.

Input: ([['set', 'A1', '10'], ['set', 'A2', '20'], ['set', 'A3', '=A1+A2'], ['get', 'A3']],)

Expected Output: ['OK', 'OK', 'OK', 30]

Explanation: A3 sums its two references 10 + 20 = 30.

Input: ([['set', 'A1', '10'], ['set', 'A2', '20'], ['set', 'A3', '=A1+A2'], ['set', 'A1', '5'], ['get', 'A3']],)

Expected Output: ['OK', 'OK', 'OK', 'OK', 25]

Explanation: Updating A1 to 5 propagates to A3, which recomputes to 5 + 20 = 25.

Input: ([['set', 'A1', '1'], ['set', 'A2', '=A1+1'], ['set', 'A3', '=A2+1'], ['get', 'A3'], ['set', 'A1', '10'], ['get', 'A3']],)

Expected Output: ['OK', 'OK', 'OK', 3, 'OK', 12]

Explanation: Chained references: A3 = A2 + 1 = (A1 + 1) + 1 = 3, then A1 -> 10 propagates through A2 to A3 = (10 + 1) + 1 = 12.

Input: ([['set', 'A1', '7'], ['set', 'A3', '=A1+A1'], ['get', 'A3']],)

Expected Output: ['OK', 'OK', 14]

Explanation: A repeated reference counts twice: 7 + 7 = 14.

Input: ([['set', 'A1', '10'], ['set', 'A2', '20'], ['set', 'A3', '=A1+A2'], ['set', 'A3', '20'], ['set', 'A1', '99'], ['get', 'A3']],)

Expected Output: ['OK', 'OK', 'OK', 'OK', 'OK', 20]

Explanation: Reassigning A3 to the literal 20 drops its old dependencies, so the later A1 -> 99 does not affect A3.

Input: ([['set', 'A1', '=A1'], ['get', 'A1']],)

Expected Output: ['ERROR', 0]

Explanation: A direct self-reference is a cycle: the set is rejected and A1 keeps its prior value (never set -> 0).

Input: ([['set', 'A1', '5'], ['set', 'A2', '=A1'], ['set', 'A1', '=A2'], ['get', 'A1'], ['get', 'A2']],)

Expected Output: ['OK', 'OK', 'ERROR', 5, 5]

Explanation: A2 depends on A1; setting A1 = A2 would create an indirect cycle, so it is rejected and A1 keeps its literal 5 (A2 stays 5).

Input: ([['get', 'Z9']],)

Expected Output: [0]

Explanation: A never-set cell reads as 0.

Input: ([['set', 'A1', '-3'], ['set', 'A2', '=10+20+A1'], ['get', 'A2']],)

Expected Output: ['OK', 'OK', 27]

Explanation: Formulas mix integer literals (including negatives) with references: 10 + 20 + (-3) = 27.

Hints

  1. Store, per cell, both its raw contents (literal or formula string) and its cached integer value. get_cell just returns the cached value.
  2. Maintain two maps: deps[cell] = the cells this formula references, and dependents[cell] = the cells whose formulas reference this cell. The dependents map is what lets a set propagate to everything downstream.
  3. On set_cell, before mutating anything, check for a cycle: gather the transitive dependents of the target cell; if any new reference is in that set, the formula would depend on itself — reject and change nothing.
  4. After accepting a set, recompute the changed cell and all of its transitive dependents in topological order so each cell is recomputed only after the cells it reads. Remember to count repeated references multiple times and to clear stale dependency edges from the old formula first.
Last updated: Jun 24, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • In-Memory File System: Create, Track Duplicates, and List - Harvey (medium)
  • Implement a Spreadsheet With Formulas - Harvey (medium)
  • Implement Spreadsheet Cell Updates - Harvey (hard)
  • Evaluate Symbol Expressions - Harvey (medium)
  • Implement a Cursor-Based Text Editor - Harvey (medium)