Sample a string by real-valued scores
Company: Pinterest
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates probabilistic sampling, numerical stability, and algorithmic implementation skills for weighted selection under floating-point edge cases, and is categorized in Coding & Algorithms for a Machine Learning Engineer role.
Constraints
- 0 <= n <= 2 * 10^5
- Expected time complexity is O(n) per sample
- Use a numerically stable softmax when no `+inf` is present and not all scores are `-inf`
- Scores in tests are given as numbers or the strings `'+inf'`, `'-inf'`, and `'nan'`
Examples
Input: (['low', 'mid', 'high'], [0.0, 1.0, 2.0], 0.2)
Expected Output: 'mid'
Explanation: The stable weights are proportional to `exp(-2)`, `exp(-1)`, and `exp(0)`, so the cumulative mass after `'low'` is about 0.090 and after `'mid'` is about 0.335. Since `r = 0.2`, the sample falls on `'mid'`.
Input: (['a', 'b', 'c', 'd'], [1.0, '+inf', -5.0, '+inf'], 0.75)
Expected Output: 'd'
Explanation: There are two `+inf` scores, at indices 1 and 3. The choice must be uniform among those two items. `floor(0.75 * 2) = 1`, so the second `+inf` item is chosen: `'d'`.
Input: (['x', 'y', 'z'], ['-inf', '-inf', '-inf'], 0.99)
Expected Output: 'z'
Explanation: All scores are `-inf`, so all weights are zero and the result must be uniform over all items. `floor(0.99 * 3) = 2`, so `'z'` is returned.
Input: (['u', 'v'], [-1000.0, -1001.0], 0.8)
Expected Output: 'v'
Explanation: A stable softmax uses shifted scores `[0, -1]`, giving weights proportional to `1` and `exp(-1)`. The first item has probability about 0.731, so `r = 0.8` lands on the second item `'v'`.
Input: (['a', 'b'], [0.0, 'nan'], 0.1)
Expected Output: None
Explanation: Any NaN score makes the input invalid under the problem statement, so the function returns `None`.
Input: ([], [], 0.3)
Expected Output: None
Explanation: There is no item to sample from, so the function returns `None`.
Hints
- Handle the special `+inf` and all-`-inf` cases before doing any exponentiation.
- For finite scores, subtract the maximum finite score before applying `exp`; this keeps the probabilities unchanged but avoids overflow/underflow.