Generate Bigrams Using Python List Comprehension and Zip
Company: PayPal
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency in Python programming, particularly list comprehension, string tokenization, and sequence manipulation for generating consecutive word bigrams.
Constraints
- The input is a single string that may contain leading, trailing, or repeated whitespace.
- Words are maximal runs of non-whitespace characters; split() with no arguments treats any run of whitespace as a single separator.
- A sentence with zero or one word returns an empty list.
- Each bigram is the two adjacent words joined by exactly one space.
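The whitespace constraints above rely on how Python's str.split() behaves when called with no arguments. A minimal sketch of that behavior, using a hypothetical sample string that is not one of the test cases:

```python
# Hypothetical sample with leading, trailing, and repeated whitespace.
sentence = "  spaced   out\twords  "

# With no separator, split() treats any run of whitespace (spaces, tabs,
# newlines) as one delimiter and never returns empty strings.
words = sentence.split()

# With an explicit separator, each single space is its own delimiter,
# so repeated spaces leave empty strings in the result.
naive = "a  b".split(" ")

print(words)  # ['spaced', 'out', 'words']
print(naive)  # ['a', '', 'b']
```

Calling split(" ") here would let empty strings leak into the word list and produce malformed bigrams, so the no-argument form is the safer choice under these constraints.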
Examples
Input: ("the quick brown fox",)
Expected Output: ["the quick", "quick brown", "brown fox"]
Explanation: Four words produce three consecutive bigrams.
Input: ("hello world",)
Expected Output: ["hello world"]
Explanation: Two words produce exactly one bigram.
Input: ("single",)
Expected Output: []
Explanation: A single word has no adjacent pair, so the result is empty.
Input: ("",)
Expected Output: []
Explanation: An empty string splits into zero words, yielding an empty list.
Input: (" spaced out words ",)
Expected Output: ["spaced out", "out words"]
Explanation: split discards leading and trailing whitespace and collapses any repeated whitespace, leaving three words and two bigrams.
Input: ("a b c d e",)
Expected Output: ["a b", "b c", "c d", "d e"]
Explanation: Five words produce four consecutive bigrams.
Hints
- Split the sentence into a list of words first using whitespace splitting (which collapses repeated spaces).
- Pair each word with the one that follows it: index i with index i+1, for i from 0 to len(words)-2.
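- The one-line idiom is [a + ' ' + b for a, b in zip(words[:-1], words[1:])]. Each slice holds len(words)-1 items (or none when there are fewer than two words), so zip yields one pair per adjacent word pair. zip(words, words[1:]) works equally well, because zip stops at the shorter sequence.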
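The hints can be combined into a short function. The sketch below is one possible solution, not an official one; the names bigrams and bigrams_by_index are illustrative, since the question does not specify a function name.

```python
from typing import List


def bigrams(sentence: str) -> List[str]:
    """Return consecutive word bigrams, each joined by a single space."""
    words = sentence.split()  # collapses any run of whitespace
    # zip stops at the shorter input, so words[1:] bounds the pairing
    # and zero or one word yields an empty list.
    return [a + " " + b for a, b in zip(words, words[1:])]


def bigrams_by_index(sentence: str) -> List[str]:
    """Index-based equivalent: pair index i with index i + 1."""
    words = sentence.split()
    # range(len(words) - 1) is empty when there are fewer than two words.
    return [words[i] + " " + words[i + 1] for i in range(len(words) - 1)]


print(bigrams("the quick brown fox"))  # ['the quick', 'quick brown', 'brown fox']
print(bigrams_by_index("single"))      # []
```

Both versions run in time linear in the number of words. The zip form avoids explicit index arithmetic, which is why it is usually preferred in an interview answer.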