Implement binary encode/decode with frequency tree
Company: Two Sigma
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
## Problem
Implement a **binary encoding** and **decoding** scheme for strings based on character frequencies (Huffman-like coding).
You must implement two functions:
1. `encode(s)` → returns `(bitstring, tree)`
2. `decode(bitstring, tree)` → returns the original string
### Encoding rules
Given an input string `s`:
1. Count the frequency of each character.
2. Create a leaf node for each distinct character with weight = its frequency.
3. Build a binary tree by repeatedly combining the **two nodes with the smallest weights** into a new parent node:
- Parent weight = sum of the two child weights.
- The two chosen nodes become the parent’s left and right children.
4. Assign edge labels:
- left edge = `0`
- right edge = `1`
5. The **binary representation** (code) of a character is the sequence of edge labels along the path from the root to that character’s leaf.
6. The encoded output `bitstring` is the concatenation of the codes for each character in `s`.
### Determinism / tie-breaking
To make the encoding deterministic when multiple nodes have equal weight, use this tie-break rule:
- When choosing the two minimum-weight nodes, sort candidates by `(weight, key)`.
- For a **leaf**, `key = character` (lexicographic order).
- For an **internal node**, `key = the smallest character contained in its subtree`.
- When attaching the two chosen nodes as children, the smaller `(weight, key)` becomes the **left** child (gets edge `0`).
### Decoding rules
Given `(bitstring, tree)`, decode by traversing the tree from the root:
- Read bits left-to-right.
- Bit `0` means go to the left child; bit `1` means go to the right child.
- When you reach a leaf, output its character and reset to the root.
### Inputs / outputs
- Input string `s` consists of arbitrary ASCII characters.
- Return the encoded bitstring as a string of `'0'` and `'1'`.
### Edge cases
- If `s` is empty, encoding returns `("", null)` and decoding returns `""`.
- If `s` contains only one distinct character (e.g., `"aaaa"`), assign that character the code `"0"` (so encoding is `"0000"`).
### Constraints
- `0 ≤ |s| ≤ 2 * 10^5`
- Number of distinct characters `σ ≤ 256`
- Aim for `O(|s| + σ log σ)` time and `O(σ)` extra space (excluding output).
Quick Answer: This question evaluates understanding of frequency-based binary encoding and decoding, including deterministic tree construction, tie-breaking rules, and traversal, and tests knowledge of trees, priority queues, greedy algorithms, and bit-level string manipulation in the Coding & Algorithms domain.