Design compressed vector and compute dot product
Company: Google
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Onsite
## Problem
You are given very long integer vectors where **many adjacent elements repeat** (e.g., long runs of the same value). Storing the full array is memory-expensive.
### Task
1. **Design a compact data structure** to store such a vector more efficiently than a raw `int[]`/`vector<int>`.
2. Using your compact representation, implement a method to compute the **dot product** of two vectors `A` and `B` of equal length `n`:
\[
A \cdot B = \sum_{i=0}^{n-1} A[i] \times B[i]
\]
### Requirements / Constraints
- `n` can be very large (e.g., up to \(10^7\) or more), so avoid full decompression when possible.
- Values fit in 32-bit signed integers; the dot product may require a 64-bit accumulator.
- Your structure should support:
- Building from an input array
- Iterating through the logical sequence (implicitly)
- Computing dot product between two compressed vectors efficiently
### Clarifications to state during interview
- Assume “many repeats” primarily means **runs of identical adjacent values** are common.
- Vectors are dense (not necessarily sparse / not mostly zeros).
### Deliverables
- Describe the data structure and its space complexity.
- Describe the dot product algorithm using the compressed form and its time complexity.
- Provide key edge cases (e.g., negative values, different run boundaries, overflow concerns).
Quick Answer: This question evaluates understanding of data structure design, sequence compression concepts (such as run-length patterns), and numeric algorithm efficiency for computing operations like the dot product, within the Coding & Algorithms domain, and it probes both conceptual understanding and practical application for large-scale data and engineering trade-offs. It is commonly asked to assess the ability to represent long repeating runs compactly and to reason about space/time complexity, iteration semantics, and edge cases such as differing run boundaries and accumulation/overflow concerns without relying on full decompression.
Build run-length encodings of two equal-length vectors and compute their dot product without fully expanding the compressed form.
Constraints
- Vectors have equal length
Examples
Input: ([1, 1, 1, 2, 2, 3], [4, 4, 5, 6, 6, 7])
Expected Output: 58
Input: ([0, 0, 0], [5, 6, 7])
Expected Output: 0
Input: ([-1, -1, 2], [3, 4, 5])
Expected Output: 3
Hints
- Advance through runs from both vectors, consuming the shorter remaining run.