How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a hard difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Pinterest.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at Pinterest during technical interviews.

Sample a string by real-valued scores | Pinterest Coding Question

Sample a string by real-valued scores

Company: Pinterest

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are given: - A list of strings `texts` (e.g., user comments), length `n`. - A list of floating-point scores `scores` of length `n`, where each score can be any real value, including `-∞` or `+∞`. Design an algorithm/function that **randomly returns one string** from `texts` such that the probability of returning `texts[i]` is proportional to the score distribution. Define the sampling rule explicitly as follows: - If all scores are finite, sample index `i` with probability \[ p_i = \frac{e^{scores[i]}}{\sum_{j=0}^{n-1} e^{scores[j]}}\quad\text{(softmax)}. \] - If one or more scores are `+∞`, return one of the `+∞` items uniformly at random. - If all scores are `-∞` (so all weights are zero), return any item uniformly at random. Requirements: - Time complexity: **O(n)** per sample. - Must be numerically stable for large/small scores. - Clearly state how you handle corner cases (empty input, NaN scores if they appear, etc.).

Quick Answer: This question evaluates probabilistic sampling, numerical stability, and algorithmic implementation skills for weighted selection under floating-point edge cases, and is categorized in Coding & Algorithms for a Machine Learning Engineer role.

Implement `solution(texts, scores, r)`. To make the sampling task deterministic and testable, do not call a random number generator. Instead, treat `r` as a uniform random draw in `[0, 1)` and return the string chosen by inverse-transform sampling. Each `scores[i]` is either a number or one of the strings `'+inf'`, `'-inf'`, or `'nan'`. Sampling rules: 1. If any score is NaN (or an invalid score token appears), return `None`. 2. If `texts` is empty, the lengths differ, or `r` is outside `[0, 1)`, return `None`. 3. If one or more scores are `+inf`, choose uniformly among only those items. 4. Else, if all scores are `-inf`, choose uniformly among all items. 5. Otherwise, sample index `i` with probability proportional to `exp(scores[i])`. Use a numerically stable method, so very large or very small finite scores do not overflow or underflow. In this case, `-inf` contributes probability 0. If the same text appears multiple times, each position is treated as a separate item.

Constraints

0 <= n <= 2 * 10^5
Expected time complexity is O(n) per sample
Use a numerically stable softmax when no `+inf` is present and not all scores are `-inf`
Scores in tests are given as numbers or the strings `'+inf'`, `'-inf'`, and `'nan'`

Examples

Input: (['low', 'mid', 'high'], [0.0, 1.0, 2.0], 0.2)

Expected Output: 'mid'

Explanation: The stable weights are proportional to `exp(-2)`, `exp(-1)`, and `exp(0)`, so the cumulative mass after `'low'` is about 0.090 and after `'mid'` is about 0.335. Since `r = 0.2`, the sample falls on `'mid'`.

Input: (['a', 'b', 'c', 'd'], [1.0, '+inf', -5.0, '+inf'], 0.75)

Expected Output: 'd'

Explanation: There are two `+inf` scores, at indices 1 and 3. The choice must be uniform among those two items. `floor(0.75 * 2) = 1`, so the second `+inf` item is chosen: `'d'`.

Input: (['x', 'y', 'z'], ['-inf', '-inf', '-inf'], 0.99)

Expected Output: 'z'

Explanation: All scores are `-inf`, so all weights are zero and the result must be uniform over all items. `floor(0.99 * 3) = 2`, so `'z'` is returned.

Input: (['u', 'v'], [-1000.0, -1001.0], 0.8)

Expected Output: 'v'

Explanation: A stable softmax uses shifted scores `[0, -1]`, giving weights proportional to `1` and `exp(-1)`. The first item has probability about 0.731, so `r = 0.8` lands on the second item `'v'`.

Input: (['a', 'b'], [0.0, 'nan'], 0.1)

Expected Output: None

Explanation: Any NaN score makes the input invalid under the problem statement, so the function returns `None`.

Input: ([], [], 0.3)

Expected Output: None

Explanation: There is no item to sample from, so the function returns `None`.

Hints

Handle the special `+inf` and all-`-inf` cases before doing any exponentiation.
For finite scores, subtract the maximum finite score before applying `exp`; this keeps the probabilities unchanged but avoids overflow/underflow.

Quick Overview

This question evaluates probabilistic sampling, numerical stability, and algorithmic implementation skills for weighted selection under floating-point edge cases, and is categorized in Coding & Algorithms for a Machine Learning Engineer role.