PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of lossless dictionary-based compression, string tokenization, and mapping data structures for encoding and decoding tokens, within the Coding & Algorithms category.

  • medium
  • Amplitude
  • Coding & Algorithms
  • Software Engineer

Implement lossless dictionary encoding and decoding

Company: Amplitude

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Dictionary encoding is a lossless compression technique for a list of tokens. You are given an input string consisting of words separated by commas, e.g. - Input: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"` Implement two functions: 1. `encode(input_string) -> compressed_string` - Build a dictionary of unique words in the order of first appearance. - Replace each word occurrence with its dictionary index (0-based). - Output a single string in the format: - `dictionary_part:index_part` - `dictionary_part` is the unique words joined by commas. - `index_part` is the indices (as integers) joined by commas. - Important: The decoder will receive **only** the `compressed_string`; you may not pass any additional data. 2. `decode(compressed_string) -> original_string` - Reconstruct the exact original comma-separated string. Example: - Input: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"` - Encoded: `"USA,Mexico,Canada:0,0,0,0,1,2,1,1,1"` - Decoded: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"` Handle empty input strings appropriately.

Quick Answer: This question evaluates understanding of lossless dictionary-based compression, string tokenization, and mapping data structures for encoding and decoding tokens, within the Coding & Algorithms category.

Dictionary encoding is a lossless compression technique for a list of tokens. You are given an input string of words separated by commas, e.g. `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"`. Implement two operations: 1. **encode(input_string) -> compressed_string** - Build a dictionary of unique words in order of first appearance. - Replace each word occurrence with its 0-based dictionary index. - Output a single string `dictionary_part:index_part`, where `dictionary_part` is the unique words joined by commas and `index_part` is the indices joined by commas. - The decoder receives **only** the compressed string — no extra data may be passed. 2. **decode(compressed_string) -> original_string** - Reconstruct the exact original comma-separated string. Example: `"USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico"` encodes to `"USA,Mexico,Canada:0,0,0,0,1,2,1,1,1"` and decodes back to the original. **Harness contract:** Implement a single function `solution(input_string)` that performs the round-trip — it must encode the input, then decode the encoded result, and return the two-element list `[encoded_string, decoded_string]`. A correct solution always yields `decoded_string == input_string` (losslessness). Handle the empty input string by returning `["", ""]`.

Constraints

  • Words are separated by commas; words themselves do not contain commas or a ':' character.
  • An empty input string encodes to "" and decodes back to "".
  • The dictionary lists unique words in order of first appearance, indexed from 0.
  • The decoder may use only the compressed string — no side-channel data.
  • Encoding must be lossless: decode(encode(x)) == x for every valid input.

Examples

Input: ("USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico",)

Expected Output: ['USA,Mexico,Canada:0,0,0,0,1,2,1,1,1', 'USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico']

Explanation: Three unique words (USA, Mexico, Canada) get indices 0, 1, 2. The index list mirrors the original order, and decoding rebuilds the exact original string.

Input: ("",)

Expected Output: ['', '']

Explanation: Empty input encodes to an empty compressed string and decodes back to empty.

Input: ("hello",)

Expected Output: ['hello:0', 'hello']

Explanation: A single word becomes dictionary 'hello' with index list '0'.

Input: ("a,b,a,b,a",)

Expected Output: ['a,b:0,1,0,1,0', 'a,b,a,b,a']

Explanation: Two alternating words; the dictionary is 'a,b' and the index list captures the alternation, decoding losslessly.

Input: ("dog,cat,bird,fish",)

Expected Output: ['dog,cat,bird,fish:0,1,2,3', 'dog,cat,bird,fish']

Explanation: All words unique, so the dictionary equals the input and indices count up 0..3.

Input: ("x,x,x",)

Expected Output: ['x:0,0,0', 'x,x,x']

Explanation: A single repeated word collapses the dictionary to 'x' while the index list keeps three zeros to preserve the original length.

Hints

  1. For encode, keep a hash map from word -> assigned index and a parallel list (the dictionary) in first-appearance order. Append str(index) for every word.
  2. Split the compressed string on the FIRST ':' only, so the dictionary part and the index part stay separate even though both contain commas.
  3. In decode, convert each index token to an int and look it up in the dictionary list, then join the looked-up words with commas.
  4. Guard the empty-string case explicitly in both directions before splitting.
Last updated: Jun 26, 2026

Related Coding Questions

  • Implement Employee Validation and Snake - Amplitude (medium)
  • Swap kth-from-end node with head - Amplitude (medium)

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.