Implement run-length encoding and decoding
Company: Databricks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates proficiency in string manipulation and encoding algorithms, specifically run-length encoding and decoding, along with attention to edge cases such as empty inputs and single-character runs and correct handling of repetition counts.
Part 1: Implement Run-Length Encoding
Constraints
- 0 <= len(s) <= 10^5
- `s` contains only lowercase English letters
- Run counts fit in a 32-bit signed integer
Examples
Input: 'aaabccccd'
Expected Output: 'a3b1c4d1'
Explanation: The runs are `aaa`, `b`, `cccc`, and `d`.
Input: ''
Expected Output: ''
Explanation: An empty string encodes to an empty string.
Input: 'a'
Expected Output: 'a1'
Explanation: A single character forms one run of length 1.
Input: 'abc'
Expected Output: 'a1b1c1'
Explanation: Each character is different, so each run has length 1.
Input: 'zzzzzzzzzz'
Expected Output: 'z10'
Explanation: A run can have a multi-digit count.
Solution
def solution(s):
if not s:
return ''
parts = []
current = s[0]
count = 1
for ch in s[1:]:
if ch == current:
count += 1
else:
parts.append(current)
parts.append(str(count))
current = ch
count = 1
parts.append(current)
parts.append(str(count))
return ''.join(parts)Time complexity: O(n), where n is the length of `s`.. Space complexity: O(n) for the output being built..
Hints
- Scan the string from left to right while tracking the current character and how many times it has repeated.
- When the character changes, append the previous character and its count to a result list, then start a new run.
Part 2: Implement Run-Length Decoding
Constraints
- 0 <= len(encoded) <= 10^5
- `encoded` is always valid
- Counts fit in a 32-bit signed integer
Examples
Input: 'a3b1c4d1'
Expected Output: 'aaabccccd'
Explanation: Expand each letter by its count: `a` 3 times, `b` 1 time, `c` 4 times, `d` 1 time.
Input: ''
Expected Output: ''
Explanation: An empty encoded string decodes to an empty string.
Input: 'a1'
Expected Output: 'a'
Explanation: A count of 1 means the character appears once.
Input: 'z10'
Expected Output: 'zzzzzzzzzz'
Explanation: Counts can have multiple digits.
Input: 'a2b3c1'
Expected Output: 'aabbbc'
Explanation: Expand `a` twice, `b` three times, and `c` once.
Solution
def solution(encoded):
if not encoded:
return ''
parts = []
i = 0
n = len(encoded)
while i < n:
ch = encoded[i]
i += 1
j = i
while j < n and encoded[j].isdigit():
j += 1
count = int(encoded[i:j])
parts.append(ch * count)
i = j
return ''.join(parts)Time complexity: O(n + L), where n is the length of `encoded` and L is the length of the decoded output.. Space complexity: O(L) for the decoded output..
Hints
- Read one character at a time, then read all consecutive digits after it to form the full count.
- Appending pieces to a list and joining at the end is more efficient than repeated string concatenation.