Implement lossless dictionary encoding and decoding
Company: Amplitude
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of lossless dictionary-based compression, string tokenization, and mapping data structures for encoding and decoding tokens, within the Coding & Algorithms category.
Constraints
- Words are separated by commas; words themselves do not contain commas or a ':' character.
- An empty input string encodes to "" and decodes back to "".
- The dictionary lists unique words in order of first appearance, indexed from 0.
- The decoder may use only the compressed string — no side-channel data.
- Encoding must be lossless: decode(encode(x)) == x for every valid input.
Examples
Input: ("USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico",)
Expected Output: ['USA,Mexico,Canada:0,0,0,0,1,2,1,1,1', 'USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico']
Explanation: Three unique words (USA, Mexico, Canada) get indices 0, 1, 2. The index list mirrors the original order, and decoding rebuilds the exact original string.
Input: ("",)
Expected Output: ['', '']
Explanation: Empty input encodes to an empty compressed string and decodes back to empty.
Input: ("hello",)
Expected Output: ['hello:0', 'hello']
Explanation: A single word becomes dictionary 'hello' with index list '0'.
Input: ("a,b,a,b,a",)
Expected Output: ['a,b:0,1,0,1,0', 'a,b,a,b,a']
Explanation: Two alternating words; the dictionary is 'a,b' and the index list captures the alternation, decoding losslessly.
Input: ("dog,cat,bird,fish",)
Expected Output: ['dog,cat,bird,fish:0,1,2,3', 'dog,cat,bird,fish']
Explanation: All words unique, so the dictionary equals the input and indices count up 0..3.
Input: ("x,x,x",)
Expected Output: ['x:0,0,0', 'x,x,x']
Explanation: A single repeated word collapses the dictionary to 'x' while the index list keeps three zeros to preserve the original length.
Hints
- For encode, keep a hash map from word -> assigned index and a parallel list (the dictionary) in first-appearance order. Append str(index) for every word.
- Split the compressed string on the FIRST ':' only, so the dictionary part and the index part stay separate even though both contain commas.
- In decode, convert each index token to an int and look it up in the dictionary list, then join the looked-up words with commas.
- Guard the empty-string case explicitly in both directions before splitting.