Implement RLE and bit-packing compression
Company: Databricks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates understanding of simple lossless compression techniques and related competencies such as string processing and parsing for run-length encoding, bit-level operations and packing for fixed-width integer compression, and algorithmic complexity analysis.
Part 1: Run-Length Encoding (RLE)
Constraints
- mode is either 'compress' or 'decompress'.
- For compression, 0 <= len(text) <= 100000.
- For compression, text contains printable ASCII characters excluding digits '0'..'9'.
- For decompression, text is well-formed as zero or more groups of <positive integer><single non-digit character>.
- Counts fit in a standard integer type, and the fully decoded output length is at most 1000000.
Examples
Input: ('compress', 'AAABCCDDDD')
Expected Output: '3A1B2C4D'
Explanation: Runs are AAA, B, CC, and DDDD, so the encoded form is 3A1B2C4D.
Input: ('decompress', '3A1B2C4D')
Expected Output: 'AAABCCDDDD'
Explanation: Expand each count-character pair back into repeated characters.
Input: ('compress', '')
Expected Output: ''
Explanation: An empty string compresses to an empty string.
Input: ('compress', 'Z')
Expected Output: '1Z'
Explanation: A single character still includes its count.
Input: ('decompress', '12A1B2C')
Expected Output: 'AAAAAAAAAAAABCC'
Explanation: The count 12 must be parsed as a multi-digit number.
Solution
def solution(mode, text):
if mode not in ('compress', 'decompress'):
raise ValueError("mode must be 'compress' or 'decompress'")
if mode == 'compress':
if text == '':
return ''
result = []
count = 1
for i in range(1, len(text) + 1):
if i < len(text) and text[i] == text[i - 1]:
count += 1
else:
result.append(str(count))
result.append(text[i - 1])
count = 1
return ''.join(result)
# mode == 'decompress'
if text == '':
return ''
result = []
digits = []
for ch in text:
if ch.isdigit():
digits.append(ch)
else:
if not digits:
raise ValueError('Invalid encoded string')
count = int(''.join(digits))
if count <= 0:
raise ValueError('Invalid encoded string')
result.append(ch * count)
digits = []
if digits:
raise ValueError('Invalid encoded string')
return ''.join(result)
Time complexity: Compression: O(n), where n is the length of the input string. Decompression: O(m + k), where m is the encoded length and k is the decoded output length.. Space complexity: O(n) for compression output, and O(k) for decompression output..
Hints
- For compression, scan left to right while tracking the current character and its run length.
- For decompression, counts may have multiple digits, so collect consecutive digits before reading the following character.
Part 2: Bit-Packing Compression for Small Integers
Constraints
- mode is either 'pack' or 'unpack'.
- For packing, 0 <= len(data) <= 100000 and every value is in [0, 15].
- For unpacking, n is a non-negative integer and len(data) == ceil(n / 2).
- For unpacking, every packed byte is in [0, 255].
- Empty input is allowed.
Examples
Input: ('pack', [3, 15, 0, 7], None)
Expected Output: [63, 7]
Explanation: [3, 15] becomes 0x3F = 63 and [0, 7] becomes 0x07 = 7.
Input: ('unpack', [63, 7], 4)
Expected Output: [3, 15, 0, 7]
Explanation: Split each byte into its high and low 4-bit parts.
Input: ('pack', [8, 1, 5], None)
Expected Output: [129, 80]
Explanation: 8 and 1 pack to 0x81 = 129. The final 5 is stored as 0x50 = 80 because the low nibble is padded with 0.
Input: ('unpack', [129, 80], 3)
Expected Output: [8, 1, 5]
Explanation: Only the first 3 unpacked values are returned; the padded low nibble of the last byte is ignored.
Input: ('pack', [], None)
Expected Output: []
Explanation: An empty array packs to an empty byte array.
Solution
def solution(mode, data, n=None):
if mode == 'pack':
packed = []
i = 0
while i < len(data):
first = data[i]
if not 0 <= first <= 15:
raise ValueError('Values to pack must be in [0, 15]')
second = 0
if i + 1 < len(data):
second = data[i + 1]
if not 0 <= second <= 15:
raise ValueError('Values to pack must be in [0, 15]')
packed.append((first << 4) | second)
i += 2
return packed
if mode == 'unpack':
if n is None or n < 0:
raise ValueError('n must be a non-negative integer')
if len(data) != (n + 1) // 2:
raise ValueError('Packed data length does not match n')
result = []
for byte in data:
if not 0 <= byte <= 255:
raise ValueError('Packed bytes must be in [0, 255]')
if len(result) < n:
result.append((byte >> 4) & 0xF)
if len(result) < n:
result.append(byte & 0xF)
return result
raise ValueError("mode must be 'pack' or 'unpack'")
Time complexity: Packing: O(n). Unpacking: O(n), where n is the number of original integers.. Space complexity: Packing: O(ceil(n / 2)). Unpacking: O(n)..
Hints
- To pack two 4-bit values a and b into one byte, put a in the high nibble using a << 4, then combine with b using bitwise OR.
- To unpack a byte x, recover the first value with (x >> 4) & 15 and the second with x & 15. Stop once you have produced n values.