PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of frequency-based binary encoding and decoding, including deterministic tree construction, tie-breaking rules, and traversal, and tests knowledge of trees, priority queues, greedy algorithms, and bit-level string manipulation in the Coding & Algorithms domain.

  • hard
  • Two Sigma
  • Coding & Algorithms
  • Software Engineer

Implement binary encode/decode with frequency tree

Company: Two Sigma

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

## Problem Implement a **binary encoding** and **decoding** scheme for strings based on character frequencies (Huffman-like coding). You must implement two functions: 1. `encode(s)` → returns `(bitstring, tree)` 2. `decode(bitstring, tree)` → returns the original string ### Encoding rules Given an input string `s`: 1. Count the frequency of each character. 2. Create a leaf node for each distinct character with weight = its frequency. 3. Build a binary tree by repeatedly combining the **two nodes with the smallest weights** into a new parent node: - Parent weight = sum of the two child weights. - The two chosen nodes become the parent’s left and right children. 4. Assign edge labels: - left edge = `0` - right edge = `1` 5. The **binary representation** (code) of a character is the sequence of edge labels along the path from the root to that character’s leaf. 6. The encoded output `bitstring` is the concatenation of the codes for each character in `s`. ### Determinism / tie-breaking To make the encoding deterministic when multiple nodes have equal weight, use this tie-break rule: - When choosing the two minimum-weight nodes, sort candidates by `(weight, key)`. - For a **leaf**, `key = character` (lexicographic order). - For an **internal node**, `key = the smallest character contained in its subtree`. - When attaching the two chosen nodes as children, the smaller `(weight, key)` becomes the **left** child (gets edge `0`). ### Decoding rules Given `(bitstring, tree)`, decode by traversing the tree from the root: - Read bits left-to-right. - Bit `0` means go to the left child; bit `1` means go to the right child. - When you reach a leaf, output its character and reset to the root. ### Inputs / outputs - Input string `s` consists of arbitrary ASCII characters. - Return the encoded bitstring as a string of `'0'` and `'1'`. ### Edge cases - If `s` is empty, encoding returns `("", null)` and decoding returns `""`. - If `s` contains only one distinct character (e.g., `"aaaa"`), assign that character the code `"0"` (so encoding is `"0000"`). ### Constraints - `0 ≤ |s| ≤ 2 * 10^5` - Number of distinct characters `σ ≤ 256` - Aim for `O(|s| + σ log σ)` time and `O(σ)` extra space (excluding output).

Quick Answer: This question evaluates understanding of frequency-based binary encoding and decoding, including deterministic tree construction, tie-breaking rules, and traversal, and tests knowledge of trees, priority queues, greedy algorithms, and bit-level string manipulation in the Coding & Algorithms domain.

Implement a deterministic Huffman-like binary encoding and decoding scheme for strings. For this platform, implement solution(s), which should build the frequency tree, encode s into a bitstring, decode that bitstring using the same tree, and return [bitstring, decoded_string]. To build the tree, count character frequencies, create one leaf per distinct character, then repeatedly combine the two nodes with the smallest (weight, key), where a leaf key is its character and an internal-node key is the smallest character in its subtree. The smaller chosen node becomes the left child and receives edge 0; the other becomes the right child and receives edge 1. A character's code is the path from root to its leaf. If s is empty, the encoded bitstring is empty and the decoded string is empty. If s has only one distinct character, that character's code is 0.

Constraints

  • 0 <= len(s) <= 2 * 10^5
  • Number of distinct characters sigma <= 256
  • Characters are arbitrary ASCII characters
  • Tie-breaking must use (weight, key), where key is the smallest character in the node's subtree
  • Aim for O(|s| + sigma log sigma) time excluding output size

Examples

Input: ("",)

Expected Output: ["", ""]

Explanation: The empty string produces an empty encoded bitstring and decodes back to the empty string.

Input: ("aaaa",)

Expected Output: ["0000", "aaaa"]

Explanation: There is only one distinct character, so 'a' is assigned code '0'. Four occurrences encode to '0000'.

Input: ("ab",)

Expected Output: ["01", "ab"]

Explanation: 'a' and 'b' have equal frequency. Lexicographic tie-breaking puts 'a' on the left with code 0 and 'b' on the right with code 1.

Input: ("abbccc",)

Expected Output: ["000101111", "abbccc"]

Explanation: Frequencies are a:1, b:2, c:3. Combine a and b first; then combine that internal node with c. Codes are a=00, b=01, c=1.

Input: ("abcabc",)

Expected Output: ["1011010110", "abcabc"]

Explanation: All characters have frequency 2. Combine a and b first into an internal node of weight 4; c has weight 2 and becomes the left child of the root. Codes are c=0, a=10, b=11.

Input: ("A!A!",)

Expected Output: ["1010", "A!A!"]

Explanation: '!' and 'A' both have frequency 2. ASCII lexicographic order puts '!' before 'A', so !=0 and A=1.

Input: ("aaabbc",)

Expected Output: ["000111110", "aaabbc"]

Explanation: Frequencies are a:3, b:2, c:1. First combine c and b into an internal node with key b, then tie-break against a by key. Codes are a=0, c=10, b=11.

Hints

  1. Use a min-heap ordered by (weight, key) to repeatedly choose the two nodes to combine.
  2. After the tree is built, run a DFS to create a character-to-code map. Remember to special-case the empty string and the single-distinct-character tree.
Last updated: Jun 17, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement Price-Time Order Matching - Two Sigma (medium)
  • Compute Piecewise Linear Interpolation - Two Sigma (medium)
  • Implement an In-Memory Database - Two Sigma (hard)
  • Merge two sorted linked lists - Two Sigma (hard)
  • Merge Two Sorted Lists - Two Sigma (hard)