How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Databricks.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Databricks during technical interviews.

Implement RLE and bit-packing compression | Databricks Coding Question

Quick Overview

This question evaluates understanding of simple lossless compression techniques and related competencies such as string processing and parsing for run-length encoding, bit-level operations and packing for fixed-width integer compression, and algorithmic complexity analysis.

Implement RLE and bit-packing compression

Company: Databricks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

You are asked to implement two related compression/decompression schemes: **Run-Length Encoding (RLE)** and **bit-packing**. --- ## Part 1 — Run-Length Encoding (RLE) Implement RLE compression and decompression for a string consisting of characters in the ASCII range. ### RLE compression - **Input**: a string, for example: `"AAABCCDDDD"`. - **Output**: an encoded string that replaces consecutive runs of the same character with a count followed by the character. - For example, the above input could be encoded as: `"3A1B2C4D"`. Requirements: - Implement a function/method that takes the original string and returns the compressed string. - If a character appears only once, its count should still be included (e.g., `"1B"`). - Think about edge cases, such as: - Empty string. - String with one character. - Very long runs (counts larger than 9, larger than 255, etc.). - You may assume the counts fit in a standard integer type. ### RLE decompression - Implement a function/method that takes a valid RLE-encoded string (e.g., `"3A1B2C4D"`) and reconstructs the original string (e.g., `"AAABCCDDDD"`). - You may assume the input is well-formed (digits representing a positive integer count immediately followed by a single character). Describe the time and space complexity of your compression and decompression algorithms. --- ## Part 2 — Bit-Packing Compression for Small Integers You are given an array of non-negative integers, each guaranteed to be in the range `[0, 15]` (i.e., each value fits in 4 bits). ### Bit-packing compression - **Input**: an array of integers, for example: `[3, 15, 0, 7]`. - **Output**: a byte array that packs two 4-bit values into each byte: - The first value in the pair occupies the **high 4 bits** of the byte. - The second value occupies the **low 4 bits** of the byte. - For example, for `[3, 15]`: - `3` in binary (4 bits) is `0011`. - `15` in binary (4 bits) is `1111`. - The packed byte is `00111111` (0x3F). - If the input array has an **odd** number of elements, the last byte should have the last value in the high 4 bits and the low 4 bits set to zero. Implement a function/method that: - Takes the array of integers `[0..15]` as input. - Returns a byte array representing the packed data. ### Bit-packing decompression - Implement the inverse function that takes a packed byte array and the **original number of integers** `n`, and reconstructs the original array of integers. - You may assume `n` is provided and matches the original length. Again, describe the time and space complexity of your compression and decompression algorithms. You do **not** need to provide code in a specific language for this prompt, but your explanation should be detailed enough that code could be written directly from it.

Quick Answer: This question evaluates understanding of simple lossless compression techniques and related competencies such as string processing and parsing for run-length encoding, bit-level operations and packing for fixed-width integer compression, and algorithmic complexity analysis.

Part 1: Run-Length Encoding (RLE)

Implement a function that performs either RLE compression or RLE decompression. Use the function signature solution(mode, text): - If mode == 'compress', text is a raw string and you must replace each maximal run of identical consecutive characters with '<count><character>'. For example, 'AAABCCDDDD' becomes '3A1B2C4D'. Counts of 1 must still be written. - If mode == 'decompress', text is a valid encoded string in that same format, and you must reconstruct the original string. To keep decoding unambiguous, the raw input for compression in this problem will not contain digit characters '0' through '9'. Empty strings are allowed.

Constraints

mode is either 'compress' or 'decompress'.
For compression, 0 <= len(text) <= 100000.
For compression, text contains printable ASCII characters excluding digits '0'..'9'.
For decompression, text is well-formed as zero or more groups of <positive integer><single non-digit character>.
Counts fit in a standard integer type, and the fully decoded output length is at most 1000000.

Examples

Input: ('compress', 'AAABCCDDDD')

Expected Output: '3A1B2C4D'

Explanation: Runs are AAA, B, CC, and DDDD, so the encoded form is 3A1B2C4D.

Input: ('decompress', '3A1B2C4D')

Expected Output: 'AAABCCDDDD'

Explanation: Expand each count-character pair back into repeated characters.

Input: ('compress', '')

Expected Output: ''

Explanation: An empty string compresses to an empty string.

Input: ('compress', 'Z')

Expected Output: '1Z'

Explanation: A single character still includes its count.

Input: ('decompress', '12A1B2C')

Expected Output: 'AAAAAAAAAAAABCC'

Explanation: The count 12 must be parsed as a multi-digit number.

Solution

def solution(mode, text):
    if mode not in ('compress', 'decompress'):
        raise ValueError("mode must be 'compress' or 'decompress'")

    if mode == 'compress':
        if text == '':
            return ''

        result = []
        count = 1

        for i in range(1, len(text) + 1):
            if i < len(text) and text[i] == text[i - 1]:
                count += 1
            else:
                result.append(str(count))
                result.append(text[i - 1])
                count = 1

        return ''.join(result)

    # mode == 'decompress'
    if text == '':
        return ''

    result = []
    digits = []

    for ch in text:
        if ch.isdigit():
            digits.append(ch)
        else:
            if not digits:
                raise ValueError('Invalid encoded string')
            count = int(''.join(digits))
            if count <= 0:
                raise ValueError('Invalid encoded string')
            result.append(ch * count)
            digits = []

    if digits:
        raise ValueError('Invalid encoded string')

    return ''.join(result)

Time complexity: Compression: O(n), where n is the length of the input string. Decompression: O(m + k), where m is the encoded length and k is the decoded output length.. Space complexity: O(n) for compression output, and O(k) for decompression output..

Hints

For compression, scan left to right while tracking the current character and its run length.
For decompression, counts may have multiple digits, so collect consecutive digits before reading the following character.

Part 2: Bit-Packing Compression for Small Integers

Implement a function that either packs or unpacks 4-bit integers. Use the function signature solution(mode, data, n=None): - If mode == 'pack', data is a list of non-negative integers in the range [0, 15]. Pack two values into each byte: the first value goes in the high 4 bits and the second value goes in the low 4 bits. - If mode == 'unpack', data is a packed byte array represented as a list of integers in [0, 255], and n is the original number of 4-bit integers. Reconstruct and return the original list. If there is an odd number of values during packing, store the final value in the high 4 bits of the last byte and set the low 4 bits to 0. Because this platform uses Python literals, represent the byte array as a list of integers rather than a bytes object.

Constraints

mode is either 'pack' or 'unpack'.
For packing, 0 <= len(data) <= 100000 and every value is in [0, 15].
For unpacking, n is a non-negative integer and len(data) == ceil(n / 2).
For unpacking, every packed byte is in [0, 255].
Empty input is allowed.

Examples

Input: ('pack', [3, 15, 0, 7], None)

Expected Output: [63, 7]

Explanation: [3, 15] becomes 0x3F = 63 and [0, 7] becomes 0x07 = 7.

Input: ('unpack', [63, 7], 4)

Expected Output: [3, 15, 0, 7]

Explanation: Split each byte into its high and low 4-bit parts.

Input: ('pack', [8, 1, 5], None)

Expected Output: [129, 80]

Explanation: 8 and 1 pack to 0x81 = 129. The final 5 is stored as 0x50 = 80 because the low nibble is padded with 0.

Input: ('unpack', [129, 80], 3)

Expected Output: [8, 1, 5]

Explanation: Only the first 3 unpacked values are returned; the padded low nibble of the last byte is ignored.

Input: ('pack', [], None)

Expected Output: []

Explanation: An empty array packs to an empty byte array.

Solution

def solution(mode, data, n=None):
    if mode == 'pack':
        packed = []
        i = 0
        while i < len(data):
            first = data[i]
            if not 0 <= first <= 15:
                raise ValueError('Values to pack must be in [0, 15]')

            second = 0
            if i + 1 < len(data):
                second = data[i + 1]
                if not 0 <= second <= 15:
                    raise ValueError('Values to pack must be in [0, 15]')

            packed.append((first << 4) | second)
            i += 2

        return packed

    if mode == 'unpack':
        if n is None or n < 0:
            raise ValueError('n must be a non-negative integer')
        if len(data) != (n + 1) // 2:
            raise ValueError('Packed data length does not match n')

        result = []
        for byte in data:
            if not 0 <= byte <= 255:
                raise ValueError('Packed bytes must be in [0, 255]')

            if len(result) < n:
                result.append((byte >> 4) & 0xF)
            if len(result) < n:
                result.append(byte & 0xF)

        return result

    raise ValueError("mode must be 'pack' or 'unpack'")

Time complexity: Packing: O(n). Unpacking: O(n), where n is the number of original integers.. Space complexity: Packing: O(ceil(n / 2)). Unpacking: O(n)..

Hints

To pack two 4-bit values a and b into one byte, put a in the high nibble using a << 4, then combine with b using bitwise OR.
To unpack a byte x, recover the first value with (x >> 4) & 15 and the second with x & 15. Stop once you have produced n values.

Quick Overview

Implement RLE and bit-packing compression

Company: Databricks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

Part 1: Run-Length Encoding (RLE)

Constraints

mode is either 'compress' or 'decompress'.
For compression, 0 <= len(text) <= 100000.
For compression, text contains printable ASCII characters excluding digits '0'..'9'.
For decompression, text is well-formed as zero or more groups of <positive integer><single non-digit character>.
Counts fit in a standard integer type, and the fully decoded output length is at most 1000000.

Examples

Input: ('compress', 'AAABCCDDDD')

Expected Output: '3A1B2C4D'

Explanation: Runs are AAA, B, CC, and DDDD, so the encoded form is 3A1B2C4D.

Input: ('decompress', '3A1B2C4D')

Expected Output: 'AAABCCDDDD'

Explanation: Expand each count-character pair back into repeated characters.

Input: ('compress', '')

Expected Output: ''

Explanation: An empty string compresses to an empty string.

Input: ('compress', 'Z')

Expected Output: '1Z'

Explanation: A single character still includes its count.

Input: ('decompress', '12A1B2C')

Expected Output: 'AAAAAAAAAAAABCC'

Explanation: The count 12 must be parsed as a multi-digit number.

Solution

def solution(mode, text):
    if mode not in ('compress', 'decompress'):
        raise ValueError("mode must be 'compress' or 'decompress'")

    if mode == 'compress':
        if text == '':
            return ''

        result = []
        count = 1

        for i in range(1, len(text) + 1):
            if i < len(text) and text[i] == text[i - 1]:
                count += 1
            else:
                result.append(str(count))
                result.append(text[i - 1])
                count = 1

        return ''.join(result)

    # mode == 'decompress'
    if text == '':
        return ''

    result = []
    digits = []

    for ch in text:
        if ch.isdigit():
            digits.append(ch)
        else:
            if not digits:
                raise ValueError('Invalid encoded string')
            count = int(''.join(digits))
            if count <= 0:
                raise ValueError('Invalid encoded string')
            result.append(ch * count)
            digits = []

    if digits:
        raise ValueError('Invalid encoded string')

    return ''.join(result)

Hints

For compression, scan left to right while tracking the current character and its run length.
For decompression, counts may have multiple digits, so collect consecutive digits before reading the following character.

Part 2: Bit-Packing Compression for Small Integers

Constraints

mode is either 'pack' or 'unpack'.
For packing, 0 <= len(data) <= 100000 and every value is in [0, 15].
For unpacking, n is a non-negative integer and len(data) == ceil(n / 2).
For unpacking, every packed byte is in [0, 255].
Empty input is allowed.

Examples

Input: ('pack', [3, 15, 0, 7], None)

Expected Output: [63, 7]

Explanation: [3, 15] becomes 0x3F = 63 and [0, 7] becomes 0x07 = 7.

Input: ('unpack', [63, 7], 4)

Expected Output: [3, 15, 0, 7]

Explanation: Split each byte into its high and low 4-bit parts.

Input: ('pack', [8, 1, 5], None)

Expected Output: [129, 80]

Explanation: 8 and 1 pack to 0x81 = 129. The final 5 is stored as 0x50 = 80 because the low nibble is padded with 0.

Input: ('unpack', [129, 80], 3)

Expected Output: [8, 1, 5]

Explanation: Only the first 3 unpacked values are returned; the padded low nibble of the last byte is ignored.

Input: ('pack', [], None)

Expected Output: []

Explanation: An empty array packs to an empty byte array.

Solution

def solution(mode, data, n=None):
    if mode == 'pack':
        packed = []
        i = 0
        while i < len(data):
            first = data[i]
            if not 0 <= first <= 15:
                raise ValueError('Values to pack must be in [0, 15]')

            second = 0
            if i + 1 < len(data):
                second = data[i + 1]
                if not 0 <= second <= 15:
                    raise ValueError('Values to pack must be in [0, 15]')

            packed.append((first << 4) | second)
            i += 2

        return packed

    if mode == 'unpack':
        if n is None or n < 0:
            raise ValueError('n must be a non-negative integer')
        if len(data) != (n + 1) // 2:
            raise ValueError('Packed data length does not match n')

        result = []
        for byte in data:
            if not 0 <= byte <= 255:
                raise ValueError('Packed bytes must be in [0, 255]')

            if len(result) < n:
                result.append((byte >> 4) & 0xF)
            if len(result) < n:
                result.append(byte & 0xF)

        return result

    raise ValueError("mode must be 'pack' or 'unpack'")

Time complexity: Packing: O(n). Unpacking: O(n), where n is the number of original integers.. Space complexity: Packing: O(ceil(n / 2)). Unpacking: O(n)..

Hints

To pack two 4-bit values a and b into one byte, put a in the high nibble using a << 4, then combine with b using bitwise OR.
To unpack a byte x, recover the first value with (x >> 4) & 15 and the second with x & 15. Stop once you have produced n values.