How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Amplitude.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Amplitude during technical interviews.

Implement lossless dictionary encoding and decoding

Company: Amplitude

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Dictionary encoding is a lossless compression technique for a list of tokens. You are given an input string consisting of words separated by commas, e.g. - Input: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"` Implement two functions: 1. `encode(input_string) -> compressed_string` - Build a dictionary of unique words in the order of first appearance. - Replace each word occurrence with its dictionary index (0-based). - Output a single string in the format: - `dictionary_part:index_part` - `dictionary_part` is the unique words joined by commas. - `index_part` is the indices (as integers) joined by commas. - Important: The decoder will receive **only** the `compressed_string`; you may not pass any additional data. 2. `decode(compressed_string) -> original_string` - Reconstruct the exact original comma-separated string. Example: - Input: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"` - Encoded: `"USA,Mexico,Canada:0,0,0,0,1,2,1,1,1"` - Decoded: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"` Handle empty input strings appropriately.

Quick Answer: This question evaluates understanding of lossless dictionary-based compression, string tokenization, and mapping data structures for encoding and decoding tokens, within the Coding & Algorithms category.

Dictionary encoding is a lossless compression technique for a list of tokens. You are given an input string of words separated by commas, e.g. `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"`. Implement two operations: 1. **encode(input_string) -> compressed_string** - Build a dictionary of unique words in order of first appearance. - Replace each word occurrence with its 0-based dictionary index. - Output a single string `dictionary_part:index_part`, where `dictionary_part` is the unique words joined by commas and `index_part` is the indices joined by commas. - The decoder receives **only** the compressed string — no extra data may be passed. 2. **decode(compressed_string) -> original_string** - Reconstruct the exact original comma-separated string. Example: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"` encodes to `"USA,Mexico,Canada:0,0,0,0,1,2,1,1,1"` and decodes back to the original. **Harness contract:** Implement a single function `solution(input_string)` that performs the round-trip — it must encode the input, then decode the encoded result, and return the two-element list `[encoded_string, decoded_string]`. A correct solution always yields `decoded_string == input_string` (losslessness). Handle the empty input string by returning `["", ""]`.

Constraints

Words are separated by commas; words themselves do not contain commas or a ':' character.
An empty input string encodes to "" and decodes back to "".
The dictionary lists unique words in order of first appearance, indexed from 0.
The decoder may use only the compressed string — no side-channel data.
Encoding must be lossless: decode(encode(x)) == x for every valid input.

Examples

Input: ("USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico",)

Expected Output: ['USA,Mexico,Canada:0,0,0,0,1,2,1,1,1', 'USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico']

Explanation: Three unique words (USA, Mexico, Canada) get indices 0, 1, 2. The index list mirrors the original order, and decoding rebuilds the exact original string.

Input: ("",)

Expected Output: ['', '']

Explanation: Empty input encodes to an empty compressed string and decodes back to empty.

Input: ("hello",)

Expected Output: ['hello:0', 'hello']

Explanation: A single word becomes dictionary 'hello' with index list '0'.

Input: ("a,b,a,b,a",)

Expected Output: ['a,b:0,1,0,1,0', 'a,b,a,b,a']

Explanation: Two alternating words; the dictionary is 'a,b' and the index list captures the alternation, decoding losslessly.

Input: ("dog,cat,bird,fish",)

Expected Output: ['dog,cat,bird,fish:0,1,2,3', 'dog,cat,bird,fish']

Explanation: All words unique, so the dictionary equals the input and indices count up 0..3.

Input: ("x,x,x",)

Expected Output: ['x:0,0,0', 'x,x,x']

Explanation: A single repeated word collapses the dictionary to 'x' while the index list keeps three zeros to preserve the original length.

Hints

For encode, keep a hash map from word -> assigned index and a parallel list (the dictionary) in first-appearance order. Append str(index) for every word.
Split the compressed string on the FIRST ':' only, so the dictionary part and the index part stay separate even though both contain commas.
In decode, convert each index token to an int and look it up in the dictionary list, then join the looked-up words with commas.
Guard the empty-string case explicitly in both directions before splitting.

Quick Overview

This question evaluates understanding of lossless dictionary-based compression, string tokenization, and mapping data structures for encoding and decoding tokens, within the Coding & Algorithms category.