Implement sparse vector dot product and cosine similarity
Company: Apple
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates implementation of sparse data structures and efficient algorithms for high-dimensional vector operations, specifically the dot product and cosine similarity, testing knowledge of sparsity-aware computation and linear algebra concepts such as vector norms.
Part 1: Sparse Vector Dot Product
Constraints
- 0 <= dim1, dim2 <= 10^9
- 0 <= len(values1), len(values2) <= 200000
- All keys in values1 are integers in [0, dim1 - 1].
- All keys in values2 are integers in [0, dim2 - 1].
- Dictionary values are floats and represent non-zero vector entries.
- The solution must not iterate over all dim positions.
Examples
Input: (5, {0: 1.5, 3: 2.0}, 5, {1: 10.0, 3: 4.0})
Expected Output: 8.0
Explanation: Only index 3 overlaps: 2.0 * 4.0 = 8.0.
Input: (4, {0: 2.0, 1: 3.0}, 4, {2: 10.0})
Expected Output: 0.0
Explanation: The vectors have no overlapping non-zero indices.
Input: (10, {}, 10, {0: 5.0})
Expected Output: 0.0
Explanation: One vector has no non-zero entries, so the dot product is 0.
Input: (3, {0: 1.0}, 4, {0: 1.0})
Expected Output: None
Explanation: The dimensions differ, so the function returns None.
Input: (4, {0: -2.0, 2: 3.0}, 4, {0: 4.0, 2: -1.0, 3: 5.0})
Expected Output: -11.0
Explanation: The dot product is (-2.0 * 4.0) + (3.0 * -1.0) = -11.0.
Hints
- Only indices that appear in both sparse dictionaries can contribute to the dot product.
- Iterate over the dictionary with fewer non-zero entries and look up each index in the other dictionary.
Part 2: Sparse Vector Cosine Similarity
Constraints
- 0 <= dim1, dim2 <= 10^9
- 0 <= len(values1), len(values2) <= 200000
- All keys in values1 are integers in [0, dim1 - 1].
- All keys in values2 are integers in [0, dim2 - 1].
- Dictionary values are floats and represent non-zero vector entries.
- The solution must not iterate over all dim positions.
Examples
Input: (3, {0: 3.0, 1: 4.0}, 3, {1: 5.0})
Expected Output: 0.8
Explanation: Dot product is 20. Norms are 5 and 5, so cosine similarity is 20 / 25 = 0.8.
Input: (4, {0: 2.0}, 4, {3: 7.0})
Expected Output: 0.0
Explanation: The vectors are orthogonal because they have no overlapping non-zero indices.
Input: (5, {}, 5, {2: 7.0})
Expected Output: 0.0
Explanation: The first vector has zero norm, so the specified return value is 0.0.
Input: (2, {0: 1.0}, 3, {0: 1.0})
Expected Output: None
Explanation: The dimensions differ, so the function returns None.
Input: (2, {0: 1.0, 1: 2.0}, 2, {0: -2.0, 1: -4.0})
Expected Output: -1.0
Explanation: The second vector is a negative scalar multiple of the first, so the cosine similarity is -1.0.
Hints
- Compute the dot product using only overlapping non-zero indices.
- The squared norm of a sparse vector is the sum of value * value over only its stored entries.