How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Google.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at Google during technical interviews.

Implement substring search and weighted sampling

Quick Overview

This question evaluates algorithm design and analysis skills across string processing (efficient substring search) and randomized/data-structure techniques (weighted random sampling with update considerations), testing time/space complexity reasoning and probabilistic thinking for large-scale inputs.

Company: Google

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Two coding questions were asked in the onsite. 1. **Substring search**: Given two strings `text` and `pattern`, return the starting index of the first occurrence of `pattern` in `text`, or `-1` if `pattern` does not occur. Assume the input can be large enough that a purely naive `O(n * m)` scan may be too slow. 2. **Weighted random picker**: Design a class initialized with an array of positive integer weights. Implement a method `pickIndex()` that returns index `i` with probability `weights[i] / sum(weights)`. Follow-up: briefly discuss how the design would change if weights were updated frequently.

Quick Answer: This question evaluates algorithm design and analysis skills across string processing (efficient substring search) and randomized/data-structure techniques (weighted random sampling with update considerations), testing time/space complexity reasoning and probabilistic thinking for large-scale inputs.

Part 1: First Occurrence of a Pattern in Text

Given two strings `text` and `pattern`, return the starting index of the first occurrence of `pattern` in `text`. If `pattern` does not appear, return `-1`. Your solution should handle large inputs efficiently, so a purely naive `O(n * m)` scan may be too slow. If `pattern` is an empty string, return `0`.

Constraints

0 <= len(text) <= 200000
0 <= len(pattern) <= 100000
Strings may contain any characters
Return 0 when pattern is empty

Examples

Input: ("abracadabra", "cada")

Expected Output: 4

Explanation: `cada` first appears in `abracadabra` starting at index 4.

Input: ("aaaaa", "bba")

Expected Output: -1

Explanation: The pattern does not occur in the text.

Input: ('', 'a')

Expected Output: -1

Explanation: A non-empty pattern cannot be found in an empty text.

Input: ("abc", "")

Expected Output: 0

Explanation: By definition, the empty pattern matches at index 0.

Input: ("mississippi", "issip")

Expected Output: 4

Explanation: `issip` starts at index 4.

Hints

Think about how to reuse information from previous character matches instead of restarting the pattern from the beginning every time.
Build an auxiliary array for the pattern that tells you the length of the longest proper prefix that is also a suffix up to each position.

Part 2: Deterministic Weighted Picker

You are given an array of positive integer `weights`. Index `i` should be chosen with probability `weights[i] / sum(weights)`. To make the problem deterministic and testable, instead of generating random numbers inside the function, you are also given a list `draws`. Each draw is an integer in the range `[1, sum(weights)]`, representing the random value that would have been generated. For each draw, return the index that would be picked. Formally, build the weighted picker so that draw `d` selects the smallest index `i` such that the prefix sum up to `i` is at least `d`. Follow-up discussion: if weights changed frequently, a Fenwick tree or segment tree would support both updates and picks in `O(log n)`.

Constraints

1 <= len(weights) <= 200000
1 <= weights[i] <= 1000000000
0 <= len(draws) <= 200000
Each draw satisfies 1 <= draw <= sum(weights)

Examples

Input: ([1, 3], [1, 2, 4])

Expected Output: [0, 1, 1]

Explanation: Prefix sums are [1, 4]. Draw 1 maps to index 0, while draws 2 and 4 map to index 1.

Input: ([2, 5, 3], [1, 2, 3, 7, 8, 10])

Expected Output: [0, 0, 1, 1, 2, 2]

Explanation: Prefix sums are [2, 7, 10], so ranges 1-2, 3-7, and 8-10 map to indices 0, 1, and 2.

Input: ([10], [1, 5, 10])

Expected Output: [0, 0, 0]

Explanation: With only one weight, every draw selects index 0.

Input: ([3, 1, 2], [3, 4, 6])

Expected Output: [0, 1, 2]

Explanation: This checks exact prefix boundaries: prefix sums are [3, 4, 6].

Input: ([4, 1, 1], [])

Expected Output: []

Explanation: No draws means no picks to return.

Hints

If you know the prefix sums of the weights, each draw becomes a search for the first prefix sum that is at least that draw value.
For static weights, prefix sums plus binary search are enough. For frequent updates, think about Fenwick trees or segment trees.

Quick Overview