Implement a basic pivot table with totals
Company: Sigmacomputing
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Technical Screen
You are given a small in-memory table of data and are asked to implement a simplified **pivot table** operation and a printer for it.
Assume the input data is represented as a list of rows, where each row is a mapping from column name (string) to value. For example, in a typical language this could be:
- `List<Map<String, Object>> rows` in Java, or
- `vector<unordered_map<string, Value>> rows` in C++, or
- `List<Dict[str, Any]]` in Python.
Each row contains several **categorical** columns (strings/integers) and at least one **numeric** column (integer/float). You are not required to design the exact type signatures; focus on the core logic and data structures.
You need to implement the following functionality in three steps:
---
### 1. Basic 2D pivot table
Implement a function that takes:
- `rows`: the list of input rows,
- three column names: `c1`, `c2`, `c3`.
Exactly one of these three columns is numeric (the **value column**); the other two are categorical (the **row axis** and **column axis**). You may assume the caller tells you which is which, or you may infer it if that’s convenient; just make it clear in your API.
Your task:
1. Treat the two categorical columns as the X and Y axes of a 2D table (one axis for rows, one for columns).
2. For each combination `(row_key, col_key)` of those two categorical values, compute the **sum** of the numeric column over all input rows that match that pair.
3. Produce a tabular structure in memory (e.g., a nested map: `row_key -> col_key -> sum`).
4. Write a printing function that prints the pivot table in a simple text form, for example:
- First line: a header row with the column-axis keys (plus a label for the row header).
- Subsequent lines: each line represents one row-axis key followed by its cell values, with missing combinations shown as 0.
You may choose any reasonable text layout as long as it is consistent and human-readable.
**Example (conceptual)**
Suppose the raw data has columns `"city"`, `"product"`, and `"sales"`.
If we choose:
- row axis: `"city"`,
- column axis: `"product"`,
- value column: `"sales"`,
then the cell at (city = `"SF"`, product = `"Book"`) contains the sum of `sales` for all rows where `city == "SF"` and `product == "Book"`.
---
### 2. Add row and column totals
Extend your pivot table implementation so that the **printed** table also includes:
1. An extra **"Total" column** on the right, for each row-axis key, showing the sum across all columns for that row.
2. An extra **"Total" row** at the bottom, showing for each column-axis key the sum across all rows.
3. Optionally, the bottom-right cell (intersection of the Total row and Total column) should be the grand total (sum of all values in the table).
Update your printing function to show these totals clearly.
---
### 3. Generalize to multiple columns per axis
Finally, generalize the pivot table so that **each axis can be defined by multiple columns**, rather than exactly one.
Modify your function signature so that it takes:
- `row_key_columns`: a list of column names to be used (together) as the row axis key,
- `col_key_columns`: a list of column names to be used (together) as the column axis key,
- `value_column`: the single numeric column to aggregate.
Requirements:
1. The row key is now a tuple (or concatenation) of one or more columns listed in `row_key_columns`.
2. The column key is a tuple of one or more columns listed in `col_key_columns`.
3. For the interview scenario, you should at least support the case where **three categorical columns in total** are used across the two axes, e.g.:
- two columns on the row axis and one on the column axis, or
- one on the row axis and two on the column axis.
4. For this extended version, you **do not** need to print row/column totals anymore (just the pivoted sums themselves).
5. Update the printing function so that it can display multi-column keys in a reasonable way (for example, join components with a separator, or print hierarchical row headers).
You should:
- Choose appropriate in-memory data structures (e.g., hash maps / ordered maps / sets) to implement the grouping and aggregation.
- Ensure the implementation is correct and handles missing combinations (no rows for a particular `(row_key, col_key)` pair) by outputting 0 for that cell.
- Aim for reasonable time and space complexity for up to tens of thousands of input rows.
Do **not** worry about file I/O, UI, or formatting beyond a simple, text-based table printout.
Quick Answer: This question evaluates a candidate's ability to perform in-memory data aggregation, multi-dimensional grouping, handling missing combinations, and formatted text output for pivot tables including row/column totals.