Solve two string DP/hash problems
Company: Amazon
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Technical Screen
Quick Answer: This question set evaluates string-processing and algorithmic problem-solving skills, including encoding and transformation for unique representations and dynamic programming, memoization, and backtracking for enumerating all valid segmentations.
Part 1: Unique Morse Code Transformations
Constraints
- `0 <= len(words) <= 1000`
- `0 <= len(words[i]) <= 20`
- Each character in every word is a lowercase English letter from `a` to `z`.
Examples
Input: (['gin', 'zen', 'gig', 'msg'],)
Expected Output: 2
Explanation: `'gin'` and `'zen'` map to the same Morse string, and `'gig'` and `'msg'` map to another one.
Input: (['a'],)
Expected Output: 1
Explanation: A single word always contributes exactly one translation.
Input: ([],)
Expected Output: 0
Explanation: No words means there are no translations.
Input: (['cab', 'abc', 'bca', 'cab'],)
Expected Output: 3
Explanation: There are three distinct translations; the repeated word `cab` does not add a new one.
Solution
def solution(words):
morse = ['.-', '-...', '-.-.', '-..', '.', '..-.', '--.', '....', '..', '.---', '-.-', '.-..', '--', '-.', '---', '.--.', '--.-', '.-.', '...', '-', '..-', '...-', '.--', '-..-', '-.--', '--..']
seen = set()
for word in words:
translated = []
for ch in word:
translated.append(morse[ord(ch) - ord('a')])
seen.add(''.join(translated))
return len(seen)
Time complexity: O(T), where T is the total number of characters across all words.. Space complexity: O(U), where U is the total size of the distinct translated strings stored in the set..
Hints
- Store the 26 Morse encodings in an array so letter `c` can be found by index `ord('c') - ord('a')`.
- A hash set is enough to track unique translated strings.
Part 2: Word Break II (All Segmentations)
Constraints
- `0 <= len(s) <= 20`
- `0 <= len(wordDict) <= 1000`
- Each word in `wordDict` has length at least 1.
- All strings contain only lowercase English letters.
- The same dictionary word may be reused multiple times.
Examples
Input: ('catsanddog', ['cat', 'cats', 'and', 'sand', 'dog'])
Expected Output: ['cat sand dog', 'cats and dog']
Explanation: There are two valid ways to split the string.
Input: ('pineapplepenapple', ['apple', 'pen', 'applepen', 'pine', 'pineapple'])
Expected Output: ['pine apple pen apple', 'pine applepen apple', 'pineapple pen apple']
Explanation: All three segmentations are valid and reuse of dictionary words is allowed.
Input: ('catsandog', ['cats', 'dog', 'sand', 'and', 'cat'])
Expected Output: []
Explanation: No complete segmentation exists.
Input: ('', ['a', 'b'])
Expected Output: []
Explanation: This problem defines the empty input string to return no sentences.
Input: ('aaaa', ['a', 'aa'])
Expected Output: ['a a a a', 'a a aa', 'a aa a', 'aa a a', 'aa aa']
Explanation: Multiple valid segmentations exist; all must be returned.
Solution
def solution(s, wordDict):
if not s:
return []
word_set = set(wordDict)
if not word_set:
return []
n = len(s)
max_len = max((len(word) for word in word_set), default=0)
good = [False] * (n + 1)
good[n] = True
for i in range(n - 1, -1, -1):
upper = min(n, i + max_len)
for end in range(i + 1, upper + 1):
if s[i:end] in word_set and good[end]:
good[i] = True
break
if not good[0]:
return []
memo = {}
def dfs(i):
if i == n:
return ['']
if i in memo:
return memo[i]
sentences = []
upper = min(n, i + max_len)
for end in range(i + 1, upper + 1):
word = s[i:end]
if word in word_set and good[end]:
for tail in dfs(end):
if tail:
sentences.append(word + ' ' + tail)
else:
sentences.append(word)
memo[i] = sentences
return sentences
return dfs(0)
Time complexity: O(n * L + S), where `n` is the length of `s`, `L` is the maximum dictionary word length, and `S` is the total size of all returned sentences.. Space complexity: O(n + S) for the memoization, reachability table, recursion stack, and returned sentences..
Hints
- Think in terms of suffixes: what sentences can be formed starting at index `i`?
- Use memoization so each starting index is solved once instead of recomputing the same suffix repeatedly.