Implement string matching with follow-up
Company: DoorDash
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: A DoorDash software-engineer coding screen on string matching: implement firstIndex(T, P) in O(|T| + |P|) time and O(|P|) space using KMP, then handle follow-ups on naive vs KMP vs Rabin–Karp, Unicode and very long inputs, edge cases, wildcard ('?'/'*') matching, and simultaneous multi-pattern search via Aho–Corasick. Includes corrected, complete solutions and complexity trade-offs.
First Occurrence of a Pattern (KMP)
Constraints
- 0 <= |t|, |p|
- Characters are compared by code point (decode to code points for Unicode-correct indexing).
- An empty pattern returns 0.
- If |p| > |t|, return -1.
Examples
Input: ("abxabcabcaby", "abcaby")
Expected Output: 6
Explanation: The pattern 'abcaby' first appears starting at index 6.
Input: ("hello world", "world")
Expected Output: 6
Explanation: 'world' begins at index 6.
Input: ("aaaaaa", "aaaa")
Expected Output: 0
Explanation: Highly repetitive case; KMP stays linear and reports the earliest match at index 0.
Input: ("abcabc", "abcd")
Expected Output: -1
Explanation: No occurrence of 'abcd'.
Input: ("", "a")
Expected Output: -1
Explanation: Empty text cannot contain a non-empty pattern.
Input: ("abc", "")
Expected Output: 0
Explanation: Empty pattern matches at index 0 by convention.
Input: ("a", "abc")
Expected Output: -1
Explanation: Pattern longer than text -> -1.
Input: ("mississippi", "issip")
Expected Output: 4
Explanation: 'issip' starts at index 4.
Input: ("aaab", "aaab")
Expected Output: 0
Explanation: Pattern equals the whole text; match at index 0.
Hints
- A naive scan is O(|t|·|p|) worst case and blows up on repetitive inputs like 'aaaa' in 'aaaaaa'. You need the text pointer to never move backward.
- Precompute the LPS (longest proper prefix that is also a suffix) array of the pattern in O(|p|). On a mismatch at pattern position k, fall back to lps[k-1] instead of restarting.
- Handle the conventions first: empty pattern -> 0, and |p| > |t| -> -1, before building the table.
Wildcard Pattern Matching (? and *)
Constraints
- 0 <= |t|, |p|
- p may contain '?' and '*' wildcards; all other characters are literal.
- Matching is anchored: the whole pattern must match the whole text.
- A run of leading '*' can match the empty prefix.
Examples
Input: ("abcde", "a?c*")
Expected Output: True
Explanation: 'a' literal, '?'=b, 'c' literal, '*'=de.
Input: ("abcde", "a*e")
Expected Output: True
Explanation: '*' absorbs 'bcd' between the anchors 'a' and 'e'.
Input: ("abcde", "a*f")
Expected Output: False
Explanation: Text does not end with 'f'.
Input: ("", "*")
Expected Output: True
Explanation: '*' matches the empty string.
Input: ("", "")
Expected Output: True
Explanation: Empty matches empty.
Input: ("a", "")
Expected Output: False
Explanation: Empty pattern cannot cover a non-empty text under anchored matching.
Input: ("abc", "???")
Expected Output: True
Explanation: Three '?' match three characters exactly.
Input: ("abc", "??")
Expected Output: False
Explanation: Two '?' cannot cover three characters.
Input: ("aaaa", "*a*")
Expected Output: True
Explanation: Stars on both sides plus a required 'a' in between.
Input: ("xyz", "*")
Expected Output: True
Explanation: A single '*' matches everything.
Input: ("mississippi", "m*ss*p?")
Expected Output: True
Explanation: 'm', '*'->'i', 'ss', '*'->'issi', 'p', '?'->'i'... the stars absorb the spans so the whole string matches.
Hints
- '*' makes a single linear scan insufficient — model it with DP: dp[i][j] = does p[0..j) match t[0..i).
- Base row: dp[0][j] is True only while every pattern char so far is '*' (each '*' can match the empty string).
- For '*': dp[i][j] = dp[i-1][j] (star consumes t[i-1]) OR dp[i][j-1] (star matches empty). For '?' or a literal match: dp[i][j] = dp[i-1][j-1].
Multi-Pattern Search (Aho–Corasick)
Constraints
- 0 <= number of patterns k
- Patterns may repeat; each repeated pattern gets its own result entry.
- Overlapping occurrences are all reported.
- An empty pattern contributes an empty list.
Examples
Input: ("ahishers", ["he", "she", "his", "hers"])
Expected Output: [[4], [3], [1], [4]]
Explanation: Classic Aho–Corasick example: 'he'@4, 'she'@3, 'his'@1, 'hers'@4 (overlapping 'he'/'hers' share the same scan).
Input: ("aaaa", ["a", "aa"])
Expected Output: [[0, 1, 2, 3], [0, 1, 2]]
Explanation: Overlapping matches: 'a' at every index, 'aa' at 0,1,2.
Input: ("abcabc", ["abc", "bc", "d"])
Expected Output: [[0, 3], [1, 4], []]
Explanation: 'abc'@0,3; 'bc'@1,4; 'd' never occurs.
Input: ("xyz", ["a", "b"])
Expected Output: [[], []]
Explanation: No pattern occurs.
Input: ("", ["a"])
Expected Output: [[]]
Explanation: Empty text yields no matches.
Input: ("aaa", ["a", "a"])
Expected Output: [[0, 1, 2], [0, 1, 2]]
Explanation: Duplicate patterns each get their own identical result list.
Input: ("mississippi", ["issi", "ssip", "ppi"])
Expected Output: [[1, 4], [5], [8]]
Explanation: 'issi'@1,4 (overlapping); 'ssip'@5; 'ppi'@8.
Hints
- Running KMP k separate times is O(k·|text| + Σ|Pi|). To scan the text once, generalize KMP's failure function to a trie over all patterns.
- Build a trie of all patterns, then add failure links via BFS (the longest proper suffix of the current node that is also a prefix in the trie), plus output links so one node can report multiple patterns ending there.
- Merge each node's output list with its failure node's output list during the BFS so a single position emits every pattern that ends there, including shorter suffixes.