PracHub
QuestionsCoachesLearningGuidesInterview Prep
|Home/Product / Decision Making/Goldman Sachs

Word Compression Algorithm

Last updated: Mar 29, 2026

Quick Overview

Practice designing run-length encoding compression and decompression with correct edge-case handling. The solution covers simple lowercase RLE, encode and decode algorithms, malformed input validation, complexity, very long runs, when compression expands output, and why arbitrary digit-containing input needs delimiters or escaping.

  • medium
  • Goldman Sachs
  • Product / Decision Making
  • Product Manager

Word Compression Algorithm

Company: Goldman Sachs

Role: Product Manager

Category: Product / Decision Making

Difficulty: medium

Interview Round: Take-home Project

##### Question Design an algorithm that compresses a string by replacing consecutive repeating characters with the character followed by the repeat count (e.g., "aaabbc" → "a3b2c1"). Describe the compression and decompression procedures. Specify how you would handle edge cases such as single-character runs, very long inputs, or cases where compression makes the string longer. Provide Big-O time and space complexity analysis.

Quick Answer: Practice designing run-length encoding compression and decompression with correct edge-case handling. The solution covers simple lowercase RLE, encode and decode algorithms, malformed input validation, complexity, very long runs, when compression expands output, and why arbitrary digit-containing input needs delimiters or escaping.

Related Interview Questions

  • Lottery Coupon Winners Calculation - Goldman Sachs (medium)
|Home/Product / Decision Making/Goldman Sachs

Word Compression Algorithm

Goldman Sachs logo
Goldman Sachs
Jul 4, 2025, 8:28 PM
mediumProduct ManagerTake-home ProjectProduct / Decision Making
8
0

Algorithm Prompt: String Compression and Decompression with Run-Length Encoding

Design algorithms to compress and decompress a string using run-length encoding (RLE). For example, if the input is "aaabbc", a simple encoding can produce "a3b2c1".

Constraints & Assumptions

  • For the simple interview version, assume input characters are lowercase English letters, so character + decimal count is unambiguous.
  • Always include the count, even when it is 1.
  • Counts are positive base-10 integers.
  • If the input may contain arbitrary characters including digits, the simple character + count format is ambiguous; propose a delimiter, escaping, or length-prefixed format.
  • Very long inputs and runs should be handled without integer overflow in languages with fixed-size integers.

Clarifying Questions to Ask

  • What character set can the input contain?
  • Is the output format fixed as character + count , or can I choose a safer format?
  • Should compression return the encoded string even if it is longer than the original?
  • Do we need streaming encode/decode for very large inputs?
  • How should invalid compressed input be handled?

Part 1 - Compression

Define the compression algorithm.

What This Part Should Cover

  • Scan the input once while tracking the current character and run length.
  • Flush a run when the character changes.
  • Handle empty input and single-character runs.
  • Use an output buffer rather than repeated string concatenation.
  • Discuss whether to return the original string if compression does not help.

Part 2 - Decompression

Define the decompression algorithm.

What This Part Should Cover

  • Parse one character and then one or more digits as the count for the simple lowercase-letter format.
  • Validate malformed inputs, missing counts, and zero or negative counts.
  • Expand the character count times.
  • Mention how decoding changes if arbitrary characters require delimiters or escaping.

Part 3 - Complexity and Edge Cases

Discuss edge cases and Big-O complexity.

What This Part Should Cover

  • Empty string, single-character strings, all unique characters, all same characters, and very long runs.
  • Cases where encoded output is longer.
  • Time complexity for encode and decode.
  • Space complexity in terms of output size.
  • Ambiguity of digit-containing input under the naive format.

What a Strong Answer Covers

A strong answer gives correct encode/decode procedures, names the ambiguity in naive RLE when input can contain digits, chooses or clarifies a safe format, and analyzes complexity in terms of input and output sizes.

Follow-up Questions

  • How would you support streaming compression?
  • How would you design an unambiguous format for arbitrary Unicode input?
  • What if the count is extremely large during decoding?
  • When should a compressor return the original string?
  • How would you validate compressed input safely?
Loading comments...

Browse More Questions

More Product / Decision Making•More Goldman Sachs•More Product Manager•Goldman Sachs Product Manager•Goldman Sachs Product / Decision Making•Product Manager Product / Decision Making

Write your answer

Your first approved answer each day earns 20 XP.

Sign in to write your answer.
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.