Implement weighted random city and sparse dot product
Company: Meta
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
## Question 1: Weighted random city picker
You are given a mapping from **city → population** (all populations are positive integers). Implement a random generator that simulates picking **one random person uniformly** from the total population and returns the **city** that person lives in.
### Requirements
- Input: `Map<String, long> populationByCity`.
- Provide an API such as:
- `CityPicker(populationByCity)` (preprocess)
- `String pickCity()` (can be called many times)
- The probability of returning a city must be proportional to its population.
### Constraints (assume)
- Number of cities up to \(10^5\)
- Population values up to \(10^{12}\)
- `pickCity()` may be called many times (e.g., \(10^6\)+)
### Clarifications to handle
- How to handle very large totals: the worst-case total is \(10^5 \times 10^{12} = 10^{17}\), which still fits in a signed 64-bit integer, but all intermediate arithmetic must stay in 64-bit
- What to do if the map is empty (define behavior, e.g., throw an exception)
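A minimal Java sketch of one common approach, assuming the prefix-sum-plus-search strategy (the class shape and field names here are illustrative, not part of the prompt): build cumulative population offsets once in O(n), then each `pickCity()` draws a uniform "person index" in `[0, total)` and locates the owning city in O(log n).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;

// Sketch: map the half-open range [offset, offset + population) to each city,
// then binary-search (via TreeMap.floorEntry) for the range containing a
// uniformly drawn person index. Preprocess O(n); each pick O(log n).
class CityPicker {
    private final TreeMap<Long, String> cityByOffset = new TreeMap<>();
    private final long total;

    CityPicker(Map<String, Long> populationByCity) {
        if (populationByCity.isEmpty()) {
            throw new IllegalArgumentException("population map must be non-empty");
        }
        long running = 0;
        for (Map.Entry<String, Long> e : populationByCity.entrySet()) {
            cityByOffset.put(running, e.getKey()); // range start for this city
            running += e.getValue();
        }
        // Worst case 1e5 cities * 1e12 each = 1e17, safely below 2^63.
        total = running;
    }

    String pickCity() {
        long person = ThreadLocalRandom.current().nextLong(total); // uniform in [0, total)
        return cityByOffset.floorEntry(person).getValue();         // owning range's city
    }
}
```

An equivalent alternative stores a plain prefix-sum array and uses `Arrays.binarySearch`; the `TreeMap` version just keeps the offset-to-city mapping in one structure.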
---
## Question 2: Dot product of large sparse vectors (with follow-up)
Compute the dot product of two **high-dimensional sparse vectors**.
### Input format (typical)
Two sparse vectors `A` and `B`, each represented as a list of non-zero entries:
- `A = [(i1, v1), (i2, v2), ...]`
- `B = [(j1, w1), (j2, w2), ...]`
where indices are integers (e.g., in `[0, D)`) and values are numeric. You may assume indices within each vector are sorted ascending; state this assumption if you rely on it.
### Output
Return the dot product
\[
A \cdot B = \sum_k A[k] \times B[k],
\]
where only indices that are non-zero in **both** vectors contribute to the sum.
### Constraints (assume)
- Dimension \(D\) may be extremely large (e.g., \(10^9\))
- Non-zero counts are much smaller (e.g., \(|A|, |B| \le 10^5\))
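A short Java sketch of the standard two-pointer merge over the index-sorted entries (the `Entry` record and class name are illustrative assumptions): advance whichever pointer has the smaller index, and multiply only when indices match. This runs in O(|A| + |B|) time with O(1) extra space.

```java
import java.util.List;

// Sketch: two-pointer merge of two index-sorted sparse vectors.
class SparseDot {
    // Assumed (index, value) representation for a non-zero entry.
    record Entry(int index, double value) {}

    static double dot(List<Entry> a, List<Entry> b) {
        double result = 0.0;
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int ai = a.get(i).index(), bj = b.get(j).index();
            if (ai == bj) {
                result += a.get(i).value() * b.get(j).value();
                i++; j++;
            } else if (ai < bj) {
                i++; // index present only in A contributes zero
            } else {
                j++; // index present only in B contributes zero
            }
        }
        return result;
    }
}
```

If the two vectors are very different in length, binary-searching the longer list for each index of the shorter one, O(|A| log |B|), can be faster than the linear merge.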
### Follow-up: reuse previous result
Suppose you will compute dot products repeatedly and **only a small number of entries change between calls** (e.g., one vector receives point updates, or you repeatedly query dot products with similar vectors). Describe how you would reuse the previous result to avoid recomputing from scratch.
(Do not provide code; explain the approach and complexity tradeoffs.)
Quick Answer: This pair of problems tests randomized algorithms and numerical scalability (weighted sampling) alongside sparse linear algebra (dot products), covering data structures, algorithmic complexity, and large numeric ranges within the Coding & Algorithms domain for machine learning engineering; it mixes practical implementation detail with conceptual reasoning about numerical stability and performance trade-offs. Interviewers use these questions to gauge whether a candidate can design time- and memory-efficient solutions for high-dimensional or high-volume data, reason about overflow and sparsity, and reuse or incrementally update previous results to avoid redundant computation.