Implement run-length encoding and decoding
Company: Databricks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
You are given a string consisting of lowercase English letters. You need to implement run-length encoding (RLE) and its corresponding decoding.
1. **Encoding**
- Input: a string `s`, e.g., `"aaabccccd"`.
- Output: an encoded string where each maximal group of consecutive identical characters is replaced by the character followed by the count of repetitions, e.g., `"a3b1c4d1"`.
2. **Decoding**
- Input: an RLE-encoded string as described above, which is always valid (i.e., alternates between characters and positive integer counts), e.g., `"a3b1c4d1"`.
- Output: the original string, e.g., `"aaabccccd"`.
Constraints:
- `0 <= len(s) <= 10^5` for both encoding and decoding inputs.
- Counts in the encoded string fit in a 32-bit signed integer.
Tasks:
- Describe how to implement `encode(String s)` and `decode(String encoded)`.
- Explain the time and space complexity of each function.
- Consider edge cases such as empty strings and runs of length 1.
Quick Answer: This question evaluates proficiency in string manipulation and encoding algorithms, specifically run-length encoding and decoding, along with attention to edge cases such as empty inputs and single-character runs and correct handling of repetition counts.
Part 1: Implement Run-Length Encoding
Implement **run-length encoding (RLE)** for a string of lowercase English letters.
Given a string `s`, scan it left to right and replace each **maximal group of consecutive identical characters** (a "run") with that character followed immediately by the **number of times it repeats** in that run. Concatenate the encoded runs, in order, to form the result.
## What to implement
```
solution(s) -> str
```
- **Input:** a string `s` containing only lowercase English letters (`a`–`z`).
- **Output:** the run-length-encoded string.
## Rules
- Process runs from left to right. A run ends as soon as a different character appears (or the string ends).
- Each run is encoded as `<character><count>`, where `<count>` is the run length written in decimal.
- **The count is always written, even when it is `1`.** A single, isolated character becomes `<char>1`.
## Examples
- `"aaabccccd"` → `"a3b1c4d1"` (three `a`, one `b`, four `c`, one `d`)
- `"a"` → `"a1"`
- `"abc"` → `"a1b1c1"` (every character is its own run of length 1)
- `"zzzzzzzzzz"` → `"z10"`
## Edge case
- If `s` is empty, return an empty string `""`.
## Constraints
- `0 <= len(s) <= 10^5`
- `s` contains only lowercase English letters.
- Every run count fits in a 32-bit signed integer.
Constraints
- 0 <= len(s) <= 10^5
- `s` contains only lowercase English letters
- Run counts fit in a 32-bit signed integer
Examples
Input: 'aaabccccd'
Expected Output: 'a3b1c4d1'
Explanation: The runs are `aaa`, `b`, `cccc`, and `d`.
Input: ''
Expected Output: ''
Explanation: An empty string encodes to an empty string.
Input: 'a'
Expected Output: 'a1'
Explanation: A single character forms one run of length 1.
Input: 'abc'
Expected Output: 'a1b1c1'
Explanation: Each character is different, so each run has length 1.
Input: 'zzzzzzzzzz'
Expected Output: 'z10'
Explanation: A run can have a multi-digit count.
Hints
- Scan the string from left to right while tracking the current character and how many times it has repeated.
- When the character changes, append the previous character and its count to a result list, then start a new run.
Part 2: Implement Run-Length Decoding
Decode a run-length encoded string back into the original text.
You are given a valid run-length encoded string `encoded`. **Reconstruct and return the original (decoded) string.**
## Encoding format
The encoded string is a sequence of **runs**, where each run is a lowercase English letter immediately followed by a **positive integer count**. The letter is the character to repeat, and the count is how many times it appears in the original string.
- The string strictly **alternates** between a single lowercase letter and its count.
- A count may span **multiple digits** (e.g. `z10` means the letter `z` repeated 10 times — not `z` followed by `1` then `0`).
## What to implement
Implement `solution(encoded)`, which takes the encoded string and returns the decoded string.
## Example
```
Input: "a3b1c4d1"
Output: "aaabccccd"
```
This decodes as `a`×3, `b`×1, `c`×4, `d`×1, giving `aaab` + `cccc` + `d` = `aaabccccd`.
## Edge case
- If `encoded` is an **empty string**, return an empty string `""`.
## Constraints
- `0 <= len(encoded) <= 10^5`
- `encoded` is always a valid run-length encoding (every letter is followed by a positive integer count).
- Each count fits in a 32-bit signed integer.
Constraints
- 0 <= len(encoded) <= 10^5
- `encoded` is always valid
- Counts fit in a 32-bit signed integer
Examples
Input: 'a3b1c4d1'
Expected Output: 'aaabccccd'
Explanation: Expand each letter by its count: `a` 3 times, `b` 1 time, `c` 4 times, `d` 1 time.
Input: ''
Expected Output: ''
Explanation: An empty encoded string decodes to an empty string.
Input: 'a1'
Expected Output: 'a'
Explanation: A count of 1 means the character appears once.
Input: 'z10'
Expected Output: 'zzzzzzzzzz'
Explanation: Counts can have multiple digits.
Input: 'a2b3c1'
Expected Output: 'aabbbc'
Explanation: Expand `a` twice, `b` three times, and `c` once.
Hints
- Read one character at a time, then read all consecutive digits after it to form the full count.
- Appending pieces to a list and joining at the end is more efficient than repeated string concatenation.