Solve Data Structure Challenges with Python Algorithms
Company: Yahoo
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency in fundamental data structure manipulation and string-processing algorithms, specifically preserving insertion order during list deduplication and performing iterative overlap-based word merging.
Constraints
- 0 <= n <= 200000 where n is the number of words
- 1 <= len(words[i]) and words[i] consists only of lowercase letters 'a'..'z'
- Sum of lengths of all words <= 200000
- Only a 1-character overlap (last of current with first of next) is used
- Tie-breaker: always pick the earliest valid unused word by original index
Examples
Input:
Expected Output: abctrgn
Input:
Expected Output: abcdef
Input:
Expected Output: axaxb
Input:
Expected Output: a
Input:
Expected Output: hello
Solution
from collections import deque
def merge_words(words):
n = len(words)
if n == 0:
return ""
# Map from starting character to queue of indices in original order
char_to_indices = {chr(c): deque() for c in range(ord('a'), ord('z') + 1)}
for i, w in enumerate(words):
# words[i] are non-empty by constraints
char_to_indices[w[0]].append(i)
used = [False] * n
res = words[0]
used[0] = True
used_count = 1
# Pointer to earliest unused index for fallback
p = 0
while p < n and used[p]:
p += 1
last_char = res[-1]
while used_count < n:
dq = char_to_indices.get(last_char)
chosen = None
if dq is not None:
# Discard used indices lazily
while dq and used[dq[0]]:
dq.popleft()
if dq:
chosen = dq.popleft()
used[chosen] = True
used_count += 1
# Overlap by 1 character
res += words[chosen][1:]
last_char = words[chosen][-1]
continue
# No match: take earliest unused by index
while p < n and used[p]:
p += 1
if p >= n:
break
chosen = p
used[chosen] = True
used_count += 1
res += words[chosen]
last_char = words[chosen][-1]
return res