Implement substring search and weighted sampling
Company: Google
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates algorithm design and analysis skills across string processing (efficient substring search) and randomized/data-structure techniques (weighted random sampling with update considerations), testing time/space complexity reasoning and probabilistic thinking for large-scale inputs.
Part 1: First Occurrence of a Pattern in Text
Constraints
- 0 <= len(text) <= 200000
- 0 <= len(pattern) <= 100000
- Strings may contain any characters
- Return 0 when pattern is empty
Examples
Input: ("abracadabra", "cada")
Expected Output: 4
Explanation: `cada` first appears in `abracadabra` starting at index 4.
Input: ("aaaaa", "bba")
Expected Output: -1
Explanation: The pattern does not occur in the text.
Input: ('', 'a')
Expected Output: -1
Explanation: A non-empty pattern cannot be found in an empty text.
Input: ("abc", "")
Expected Output: 0
Explanation: By definition, the empty pattern matches at index 0.
Input: ("mississippi", "issip")
Expected Output: 4
Explanation: `issip` starts at index 4.
Hints
- Think about how to reuse information from previous character matches instead of restarting the pattern from the beginning every time.
- Build an auxiliary array for the pattern that tells you the length of the longest proper prefix that is also a suffix up to each position.
Part 2: Deterministic Weighted Picker
Constraints
- 1 <= len(weights) <= 200000
- 1 <= weights[i] <= 1000000000
- 0 <= len(draws) <= 200000
- Each draw satisfies 1 <= draw <= sum(weights)
Examples
Input: ([1, 3], [1, 2, 4])
Expected Output: [0, 1, 1]
Explanation: Prefix sums are [1, 4]. Draw 1 maps to index 0, while draws 2 and 4 map to index 1.
Input: ([2, 5, 3], [1, 2, 3, 7, 8, 10])
Expected Output: [0, 0, 1, 1, 2, 2]
Explanation: Prefix sums are [2, 7, 10], so ranges 1-2, 3-7, and 8-10 map to indices 0, 1, and 2.
Input: ([10], [1, 5, 10])
Expected Output: [0, 0, 0]
Explanation: With only one weight, every draw selects index 0.
Input: ([3, 1, 2], [3, 4, 6])
Expected Output: [0, 1, 2]
Explanation: This checks exact prefix boundaries: prefix sums are [3, 4, 6].
Input: ([4, 1, 1], [])
Expected Output: []
Explanation: No draws means no picks to return.
Hints
- If you know the prefix sums of the weights, each draw becomes a search for the first prefix sum that is at least that draw value.
- For static weights, prefix sums plus binary search are enough. For frequent updates, think about Fenwick trees or segment trees.