First Word Matching Each Prefix Query
Company: Pinterest
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
You are given a list of `words` (the dictionary) and a list of `queries`, where each query is a string **prefix**. For each query, return the index (0-based, into the `words` list) of the **first** word in `words` that has the query string as a prefix. If no word in `words` has the query as a prefix, return `-1` for that query.
A string `p` is a prefix of a word `w` if `w` starts with `p` (and `p` may equal `w`). The empty string is a prefix of every word.
### Constraints & Assumptions
- `1 <= len(words) <= 10^4`.
- `1 <= len(queries) <= 10^4`.
- Each word and each query consists of lowercase English letters only.
- The total number of characters across all words is at most $10^5$; the same bound holds for all queries combined.
- "First word" means the earliest position in the original `words` ordering (do not sort the words; the index returned must be into the input list as given).
- A word that exactly equals the query counts as containing the query as a prefix.
### Input / Output Format
- Input: `words: List[str]`, `queries: List[str]`.
- Output: `List[int]` of the same length as `queries`; the `i`-th element is the index of the first word having `queries[i]` as a prefix, or `-1`.
### Examples
```
words = ["a", "apple", "appz", "b"]
queries = ["ap"]
output = [1] # "apple" at index 1 is the first word starting with "ap"
```
```
words = ["a", "apple", "appz", "b"]
queries = ["app", "b", "c", "a"]
output = [1, 3, -1, 0]
# "app" -> "apple" (index 1); "b" -> "b" (index 3);
# "c" -> no word starts with "c" (-1); "a" -> "a" (index 0)
```
Quick Answer: This question evaluates a candidate's grasp of string data structures and efficient prefix-matching techniques, core competencies in coding and algorithms interviews. It tests practical application of trie-based or hash-map indexing approaches to handle large-scale batch lookup problems under tight complexity constraints.
You are given a list of `words` (the dictionary) and a list of `queries`, where each query is a string **prefix**. For each query, return the index (0-based, into the `words` list) of the **first** word in `words` that has the query string as a prefix. If no word in `words` has the query as a prefix, return `-1` for that query.
A string `p` is a prefix of a word `w` if `w` starts with `p` (and `p` may equal `w`). The empty string is a prefix of every word.
**Constraints**
- `1 <= len(words) <= 10^4`.
- `1 <= len(queries) <= 10^4`.
- Each word and each query consists of lowercase English letters only.
- The total number of characters across all words is at most 10^5; the same bound holds for all queries combined.
- "First word" means the earliest position in the original `words` ordering (do not sort the words; the index returned must be into the input list as given).
- A word that exactly equals the query counts as containing the query as a prefix.
**Input / Output**
- Input: `words: List[str]`, `queries: List[str]`.
- Output: `List[int]` of the same length as `queries`; the `i`-th element is the index of the first word having `queries[i]` as a prefix, or `-1`.
**Example 1**
```
words = ["a", "apple", "appz", "b"]
queries = ["ap"]
output = [1] # "apple" at index 1 is the first word starting with "ap"
```
**Example 2**
```
words = ["a", "apple", "appz", "b"]
queries = ["app", "b", "c", "a"]
output = [1, 3, -1, 0]
# "app" -> "apple" (index 1); "b" -> "b" (index 3);
# "c" -> no word starts with "c" (-1); "a" -> "a" (index 0)
```
Constraints
- 1 <= len(words) <= 10^4
- 1 <= len(queries) <= 10^4
- Each word and each query consists of lowercase English letters only
- Total characters across all words <= 10^5; same bound for all queries combined
- Return the index into the original (unsorted) words list
- A word equal to the query counts as a prefix match; the empty query matches the first word (index 0)
Examples
Input: (["a", "apple", "appz", "b"], ["ap"])
Expected Output: [1]
Explanation: "apple" (index 1) is the first word starting with "ap".
Input: (["a", "apple", "appz", "b"], ["app", "b", "c", "a"])
Expected Output: [1, 3, -1, 0]
Explanation: "app"->"apple"(1); "b"->"b"(3); "c"-> no word starts with c (-1); "a"->"a"(0).
Input: (["apple"], ["apple", "appl", "applex", ""])
Expected Output: [0, 0, -1, 0]
Explanation: An exact match counts (apple at 0); a strict prefix matches; a query longer than the word (applex) cannot match; the empty query matches index 0.
Input: (["dog", "deer", "deal"], ["de", "d", "dea", "deal", "x"])
Expected Output: [1, 0, 2, 2, -1]
Explanation: "de"->"deer"(1); "d"->"dog"(0); "dea"-> "deer" does NOT start with dea so the first match is "deal"(2); "deal"->(2); "x"->none(-1).
Input: (["abc"], ["z"])
Expected Output: [-1]
Explanation: No word starts with "z".
Input: (["banana", "band", "bandana", "can"], ["ban", "band", "bandan", "c", "ca", "cat"])
Expected Output: [0, 1, 2, 3, 3, -1]
Explanation: "ban"->"banana"(0); "band"->"band"(1); "bandan"->"bandana"(2); "c" and "ca"->"can"(3); "cat"->none(-1).
Hints
- Iterate words in their original order and return the index of the FIRST word whose `startswith(query)` is true — break as soon as you find it so you keep the earliest index.
- Python's `str.startswith`, Java's `String.startsWith`, C++ `rfind(q, 0) == 0` (or compare(0, q.size(), q)), and JS `String.startsWith` all test the prefix condition directly.
- Edge cases: an empty query is a prefix of every word, so it returns index 0; a query longer than a word can never be a prefix of it.
- For better asymptotics, build a trie from the words and, at each node, store the minimum word index that passes through it; a query then walks the trie in O(len(query)) and reads off the stored index (or -1 if the path is missing).