Implement lossless dictionary encoding and decoding
Company: Amplitude
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Dictionary encoding is a lossless compression technique for a list of tokens.
You are given an input string consisting of words separated by commas, e.g.
- Input: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"`
Implement two functions:
1. `encode(input_string) -> compressed_string`
- Build a dictionary of unique words in the order of first appearance.
- Replace each word occurrence with its dictionary index (0-based).
- Output a single string in the format:
- `dictionary_part:index_part`
- `dictionary_part` is the unique words joined by commas.
- `index_part` is the indices (as integers) joined by commas.
- Important: The decoder will receive **only** the `compressed_string`; you may not pass any additional data.
2. `decode(compressed_string) -> original_string`
- Reconstruct the exact original comma-separated string.
Example:
- Input: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"`
- Encoded: `"USA,Mexico,Canada:0,0,0,0,1,2,1,1,1"`
- Decoded: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"`
Handle empty input strings appropriately.
Quick Answer: This question evaluates understanding of lossless dictionary-based compression, string tokenization, and mapping data structures for encoding and decoding tokens, within the Coding & Algorithms category.