PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates probabilistic sampling, numerical stability, and algorithmic implementation skills for weighted selection under floating-point edge cases, and is categorized in Coding & Algorithms for a Machine Learning Engineer role.

  • hard
  • Pinterest
  • Coding & Algorithms
  • Machine Learning Engineer

Sample a string by real-valued scores

Company: Pinterest

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are given: - A list of strings `texts` (e.g., user comments), length `n`. - A list of floating-point scores `scores` of length `n`, where each score can be any real value, including `-∞` or `+∞`. Design an algorithm/function that **randomly returns one string** from `texts` such that the probability of returning `texts[i]` is proportional to the score distribution. Define the sampling rule explicitly as follows: - If all scores are finite, sample index `i` with probability \[ p_i = \frac{e^{scores[i]}}{\sum_{j=0}^{n-1} e^{scores[j]}}\quad\text{(softmax)}. \] - If one or more scores are `+∞`, return one of the `+∞` items uniformly at random. - If all scores are `-∞` (so all weights are zero), return any item uniformly at random. Requirements: - Time complexity: **O(n)** per sample. - Must be numerically stable for large/small scores. - Clearly state how you handle corner cases (empty input, NaN scores if they appear, etc.).

Quick Answer: This question evaluates probabilistic sampling, numerical stability, and algorithmic implementation skills for weighted selection under floating-point edge cases, and is categorized in Coding & Algorithms for a Machine Learning Engineer role.

Implement `solution(texts, scores, r)`. To make the sampling task deterministic and testable, do not call a random number generator. Instead, treat `r` as a uniform random draw in `[0, 1)` and return the string chosen by inverse-transform sampling. Each `scores[i]` is either a number or one of the strings `'+inf'`, `'-inf'`, or `'nan'`. Sampling rules: 1. If any score is NaN (or an invalid score token appears), return `None`. 2. If `texts` is empty, the lengths differ, or `r` is outside `[0, 1)`, return `None`. 3. If one or more scores are `+inf`, choose uniformly among only those items. 4. Else, if all scores are `-inf`, choose uniformly among all items. 5. Otherwise, sample index `i` with probability proportional to `exp(scores[i])`. Use a numerically stable method, so very large or very small finite scores do not overflow or underflow. In this case, `-inf` contributes probability 0. If the same text appears multiple times, each position is treated as a separate item.

Constraints

  • 0 <= n <= 2 * 10^5
  • Expected time complexity is O(n) per sample
  • Use a numerically stable softmax when no `+inf` is present and not all scores are `-inf`
  • Scores in tests are given as numbers or the strings `'+inf'`, `'-inf'`, and `'nan'`

Examples

Input: (['low', 'mid', 'high'], [0.0, 1.0, 2.0], 0.2)

Expected Output: 'mid'

Explanation: The stable weights are proportional to `exp(-2)`, `exp(-1)`, and `exp(0)`, so the cumulative mass after `'low'` is about 0.090 and after `'mid'` is about 0.335. Since `r = 0.2`, the sample falls on `'mid'`.

Input: (['a', 'b', 'c', 'd'], [1.0, '+inf', -5.0, '+inf'], 0.75)

Expected Output: 'd'

Explanation: There are two `+inf` scores, at indices 1 and 3. The choice must be uniform among those two items. `floor(0.75 * 2) = 1`, so the second `+inf` item is chosen: `'d'`.

Input: (['x', 'y', 'z'], ['-inf', '-inf', '-inf'], 0.99)

Expected Output: 'z'

Explanation: All scores are `-inf`, so all weights are zero and the result must be uniform over all items. `floor(0.99 * 3) = 2`, so `'z'` is returned.

Input: (['u', 'v'], [-1000.0, -1001.0], 0.8)

Expected Output: 'v'

Explanation: A stable softmax uses shifted scores `[0, -1]`, giving weights proportional to `1` and `exp(-1)`. The first item has probability about 0.731, so `r = 0.8` lands on the second item `'v'`.

Input: (['a', 'b'], [0.0, 'nan'], 0.1)

Expected Output: None

Explanation: Any NaN score makes the input invalid under the problem statement, so the function returns `None`.

Input: ([], [], 0.3)

Expected Output: None

Explanation: There is no item to sample from, so the function returns `None`.

Hints

  1. Handle the special `+inf` and all-`-inf` cases before doing any exponentiation.
  2. For finite scores, subtract the maximum finite score before applying `exp`; this keeps the probabilities unchanged but avoids overflow/underflow.
Last updated: May 19, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Assign Pins to Shortest Columns - Pinterest (medium)
  • Design Hierarchical Permission Checks - Pinterest (medium)
  • Implement weighted random choice - Pinterest (medium)
  • Solve five hard algorithm problems - Pinterest
  • Find First Prefix-Matching Word - Pinterest (medium)