How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a hard difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Databricks.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Databricks during technical interviews.

Implement streaming RLE and bit-packed codec

Company: Databricks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Onsite

You are implementing a simple compression scheme for sequences of 32‑bit signed integers. The codec should support **two encoding strategies**: 1. **Run‑Length Encoding (RLE)** for long runs of equal values. 2. **Bit‑Packed Encoding (BP)** for blocks of values that can be represented with a small, uniform bit‑width. The codec is **streaming**: values arrive one by one, and the encoder should buffer them into blocks and choose an encoding strategy per block. ### Compressed representation Design a compressed representation made of **blocks**. For concreteness, define: - **RLE block** - Represents a run of a single repeated integer value. - Fields: - `type = 'R'` - `value` (int32) - `count` (int, number of repetitions) - Only use RLE blocks when `count >= RLE_MIN_RUN` (e.g., `RLE_MIN_RUN = 3`). Shorter runs should not be encoded as RLE. - **Bit‑packed block** - Represents a sequence of `count` possibly different integers, all of which fit into `bitWidth` bits in two's‑complement representation. - Fields: - `type = 'B'` - `bitWidth` (1–32) - `count` (number of values) - A packed payload containing exactly `count` values, each stored using `bitWidth` bits. - You may choose a **maximum block size** `MAX_BP_BLOCK` (e.g., 128 values) for simplicity. You can decide the in‑memory representation of a bit‑packed payload (e.g., array of 32‑bit integers where bits are tightly packed), as long as the decoder can reconstruct the original sequence exactly. ### Encoder Implement an `Encoder` class with the following behavior: - It receives input values **one at a time** via a method like: ```text void add(int value) ``` - Internally, the encoder may maintain a buffer of recent values to decide whether to form an RLE block or a bit‑packed block. - At appropriate times (e.g., when a block is full, when the encoding strategy should change, or when `flush()` is called), it should **emit blocks**. - Provide a method: ```text List<Block> flush() ``` that finalizes the stream, closes any open block, and returns the list of compressed blocks. **Encoding strategy constraints:** - For a maximal run of the same value with length `L`: - If `L >= RLE_MIN_RUN`, you should encode it as a single RLE block. - Otherwise, those values should be part of a bit‑packed block. - For bit‑packed blocks: - You may group consecutive non‑RLE values into blocks up to size `MAX_BP_BLOCK`. - Choose `bitWidth` for a block as the minimum number of bits needed to represent **all** its values. - You do **not** need to prove global optimality of the compression, but your encoder must consistently follow the above rules. The encoder must correctly handle: - Negative values and the full range of 32‑bit signed integers (`Integer.MIN_VALUE` to `Integer.MAX_VALUE`). - Transitions between RLE and bit‑packed segments in the stream. ### Decoder Implement a corresponding `Decoder` that reconstructs the original integer sequence from a list of blocks. - Its constructor receives the list of blocks produced by the encoder: ```text class Decoder implements Iterator<Integer> { Decoder(List<Block> blocks) { ... } boolean hasNext(); int next(); } ``` - `hasNext()` / `next()` should expose the **original sequence of integers** in order, exactly as they were passed to `Encoder.add(...)`. - The decoder must correctly iterate across both RLE and bit‑packed blocks. ### Tasks 1. Specify the exact in‑memory structure you will use for `Block` and the bit‑packed payload. 2. Implement `Encoder.add(value)` and `Encoder.flush()` to produce a valid block sequence. 3. Implement `Decoder` as an iterator over the decompressed values. 4. Write unit tests for cases including: - Simple sequences without repeats (forces bit‑packing). - Long runs of a single value (forces RLE). - Alternating patterns (switching between RLE and BP). - Values near `Integer.MAX_VALUE`, `Integer.MIN_VALUE`, and negative numbers. - Empty input sequence. Assume you are working in an object‑oriented language (e.g., Java, C++, or similar) and focus on clean class design and correctness of the codec pair.

Quick Answer: This question evaluates understanding of data compression algorithms, bit-level manipulation, two's‑complement integer representation, streaming algorithm design, and encoder/decoder state management for 32-bit signed integers.

Part 1: Build a canonical block representation

Build a single **compression block** in a canonical Python in-memory format. Implement: ```python def solution(block_type, data): ``` `block_type` is either `'R'` (run-length encoded) or `'B'` (bit-packed). The shape of `data` and the dictionary you return depend on `block_type`. ## RLE block (`block_type == 'R'`) `data` is a `(value, count)` tuple. Return the value and count wrapped in the canonical RLE shape — no computation is needed: ```python {'type': 'R', 'value': value, 'count': count} ``` - `count >= 1`. ## Bit-packed block (`block_type == 'B'`) `data` is a list of signed 32-bit integers. Return: ```python {'type': 'B', 'bitWidth': w, 'count': n, 'words': words} ``` where `n` is the number of input values and `words` is a list of **unsigned 32-bit** integers. **Empty input.** If `data` is empty, return exactly: ```python {'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []} ``` **Otherwise**, build the block in two steps: 1. **Choose the bit width `w`.** Compute, for each value, the minimum number of bits needed to represent it in **signed two's-complement** form (the encoding must include a sign bit, so non-negative values keep a leading `0`). The block's `bitWidth` is the **maximum** such width over all values, so every value fits in `w` bits. Notes: - The value `0` needs **1** bit. - A list containing both `-2147483648` and `2147483647` needs `32` bits. 2. **Pack the values tightly into 32-bit words.** Encode each value as its low `w` bits in two's-complement (for a negative value `v`, this is `v & ((1 << w) - 1)`), then lay these `w`-bit fields end to end with no gaps, in **input order**: - The first value occupies the **lowest** `w` bits of `words[0]`, the second value starts immediately after it (at bit offset `w`), and so on. Within a word, earlier values occupy lower bits (**little-endian** bit cursor). - When a value's field crosses a 32-bit boundary, its low bits finish the current word and its remaining high bits continue at the bottom of the next word. - Each completed 32-bit word is emitted as an unsigned value in `[0, 2^32 − 1]`. After all values are packed, flush any partially filled final word as well. **Example.** `('B', [0, 1, -1])` → `bitWidth` 2, encodings `00`, `01`, `11`, packed little-endian as `11 01 00` (binary `110100`) → `{'type': 'B', 'bitWidth': 2, 'count': 3, 'words': [52]}`. ## Constraints - `block_type` is either `'R'` or `'B'`. - All integers are in the signed 32-bit range `[-2147483648, 2147483647]`. - For an RLE block, `count >= 1`. - For a bit-packed block, `0 <= len(data) <= 10^4`.

Constraints

block_type is either 'R' or 'B'.
All integers are in the signed 32-bit range [-2147483648, 2147483647].
For an RLE block, count >= 1.
For a bit-packed block, 0 <= len(data) <= 10^4.

Examples

Input: ('R', (5, 4))

Expected Output: {'type': 'R', 'value': 5, 'count': 4}

Explanation: A run-length block is stored directly.

Input: ('B', [0, 1, -1])

Expected Output: {'type': 'B', 'bitWidth': 2, 'count': 3, 'words': [52]}

Explanation: The minimum common width is 2 bits: 0 -> 00, 1 -> 01, -1 -> 11. Packed word is 0b110100 = 52.

Input: ('B', [-2147483648, 2147483647])

Expected Output: {'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483648, 2147483647]}

Explanation: The full 32-bit signed range requires width 32, so each value occupies one full word.

Input: ('B', [])

Expected Output: {'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []}

Explanation: Edge case: empty bit-packed input.

Input: ('B', [7, 7, 7, 7, 7, 7, 7, 7, 7])

Expected Output: {'type': 'B', 'bitWidth': 4, 'count': 9, 'words': [2004318071, 7]}

Explanation: Nine 4-bit values need 36 bits, so the payload spills into a second word.

Solution

def solution(block_type, data):
    def min_bits_signed(x):
        if x >= 0:
            return 1 if x == 0 else x.bit_length() + 1
        return (~x).bit_length() + 1

    def pack(values, w):
        words = []
        current = 0
        used = 0
        mask = (1 << w) - 1
        for v in values:
            u = v & mask
            remaining = w
            while remaining > 0:
                space = 32 - used
                take = remaining if remaining < space else space
                part = u & ((1 << take) - 1)
                current |= part << used
                used += take
                u >>= take
                remaining -= take
                if used == 32:
                    words.append(current & 0xFFFFFFFF)
                    current = 0
                    used = 0
        if used > 0:
            words.append(current & 0xFFFFFFFF)
        return words

    if block_type == 'R':
        value, count = data
        return {'type': 'R', 'value': value, 'count': count}

    values = list(data)
    if not values:
        return {'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []}

    bit_width = max(min_bits_signed(v) for v in values)
    return {'type': 'B', 'bitWidth': bit_width, 'count': len(values), 'words': pack(values, bit_width)}

Explanation

The function builds one canonical compression block. **RLE path (`block_type == 'R'`).** `data` is a `(value, count)` tuple; the code just wraps it into `{'type': 'R', 'value': value, 'count': count}`. Nothing to compute. **Bit-packed path (`block_type == 'B'`).** Empty input short-circuits to the prescribed `bitWidth: 0` block. Otherwise it works in two phases: 1. **Choose the bit width.** `min_bits_signed(x)` returns the minimum *signed* two's-complement width: - `x == 0` → `1` - `x > 0` → `x.bit_length() + 1` (data bits plus a sign bit so the top bit stays `0`) - `x < 0` → `(~x).bit_length() + 1` (since `~x == -x-1`, this is the width needed for the magnitude plus the sign bit) `bit_width = max(min_bits_signed(v) for v in values)` makes one width fit every value, so e.g. `[-2147483648, 2147483647]` needs `32`. 2. **Pack tightly into 32-bit words.** `pack` keeps an accumulator `current` and a counter `used` (bits filled in the current word). Each value is first reduced to its low `w` bits via `u = v & mask` — this is exactly the two's-complement encoding for negatives. It then streams those `w` bits out: `take = min(remaining, 32 - used)` bits are OR'd in at offset `used` (`current |= part << used`), and when `used` hits 32 the word is flushed (`& 0xFFFFFFFF`) and the accumulator resets. A value that crosses a word boundary simply continues in the next word, matching the canonical "first value in the lowest bits of `words[0]`" rule. Any partial final word is flushed at the end. This is correct because masking gives canonical signed encodings, and the little-endian bit cursor preserves order and boundary-crossing exactly as specified.

Time complexity: O(n). Space complexity: O(n).

Hints

A signed w-bit two's-complement integer must lie in the range [-2^(w-1), 2^(w-1)-1].
While packing, track the current 32-bit word and the bit offset already used inside it.

Part 2: Encode an integer stream into RLE and bit-packed blocks

Implement the **encoder** for a simple integer codec that compresses a sequence of signed integers into a list of *blocks*. Implement the function: ```python def solution(values, rle_min_run=3, max_bp_block=128): ``` - `values`: a list of signed 32-bit integers (may be empty). - `rle_min_run`: minimum run length that qualifies for run-length encoding. - `max_bp_block`: maximum number of values per bit-packed block. Return the full list of encoded blocks (in order). For an empty input, return an empty list. ## Block formats There are two kinds of blocks, each a dictionary: - **RLE block** — `{'type': 'R', 'value': value, 'count': count}` - **Bit-packed block** — `{'type': 'B', 'bitWidth': w, 'count': n, 'words': words}` ## Encoding rules 1. **Scan in order.** Process `values` left to right in a single pass. 2. **Detect maximal runs.** A *run* is a maximal stretch of consecutive **equal** values. For a run of length `L`: - If `L >= rle_min_run`, emit exactly **one** RLE block `{'type': 'R', 'value': <the repeated value>, 'count': L}` for that run. - Otherwise (`L < rle_min_run`), the values of that run are **non-RLE** and must be carried into bit-packed blocks. 3. **Order is preserved.** Non-RLE values are emitted in their original order. Whenever an RLE block is produced, all pending non-RLE values accumulated so far must be flushed (as bit-packed blocks) **before** that RLE block is appended. 4. **Group non-RLE values into bit-packed blocks.** Treat the running stream of non-RLE values as one continuous sequence (it may span across multiple short runs and across different values). Cut it into chunks of **at most `max_bp_block`** values each, in order; each chunk becomes one bit-packed block. A trailing chunk smaller than `max_bp_block` is allowed at a flush boundary (i.e. before an RLE block or at the end of input). 5. **Choose the bit width.** Each bit-packed block uses the **minimum signed two's-complement bit width** needed to represent **every** value in that block. Concretely, the block's `bitWidth` `w` is the maximum over its values of the minimum signed width of each value, where: - `0` needs **1** bit, - a positive value `x` needs `x.bit_length() + 1` bits (an extra bit for the sign), - a negative value `x` needs `(~x).bit_length() + 1` bits. (So `-1` → 1, `7` → 4, `2147483647` → 32, and `-2147483648` → 32.) 6. **Pack values tightly.** Within a bit-packed block, each value is reduced to its low `w` bits via its two's-complement representation (`value & ((1 << w) - 1)`) and packed **tightly, LSB-first**, into a list of **unsigned 32-bit** words (`words`): - The first value occupies the lowest `w` bits of `words[0]`. - The next value follows **immediately** in the next `w` bits, and so on. - When a value straddles a 32-bit word boundary, its low bits go into the current word and the remaining high bits continue at the start of the next word. - After all values are written, a final partial word (if any) is appended. Every entry in `words` is an unsigned 32-bit integer (`0 .. 2^32 - 1`). `count` is the number of values packed into the block (`n`). Return the complete ordered list of `R` and `B` blocks. ## Examples - `solution([], 3, 128)` → `[]` - `solution([1, 2, 3], 3, 128)` → `[{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]` - `solution([7, 7, 7, 7], 3, 128)` → `[{'type': 'R', 'value': 7, 'count': 4}]` - `solution([5, 5, 5, 1, 2, 2, 3, 3, 3, 4], 3, 128)` → `[{'type': 'R', 'value': 5, 'count': 3}, {'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [145]}, {'type': 'R', 'value': 3, 'count': 3}, {'type': 'B', 'bitWidth': 4, 'count': 1, 'words': [4]}]` - `solution([0, 0, 1, 1, 2, 2], 3, 4)` → `[{'type': 'B', 'bitWidth': 2, 'count': 4, 'words': [80]}, {'type': 'B', 'bitWidth': 3, 'count': 2, 'words': [18]}]` (No run reaches length 3, so all six values are non-RLE. They form one continuous stream that is cut into chunks of at most `max_bp_block = 4`: the first 4 values `[0, 0, 1, 1]` and then `[2, 2]`.) ## Constraints - `0 <= len(values) <= 10^5` - `1 <= rle_min_run <= 10` - `1 <= max_bp_block <= 128` - All input values are in the signed 32-bit range `[-2147483648, 2147483647]`.

Constraints

0 <= len(values) <= 10^5
1 <= rle_min_run <= 10
1 <= max_bp_block <= 128
All input values are in the signed 32-bit range [-2147483648, 2147483647].

Examples

Input: ([], 3, 128)

Expected Output: []

Explanation: Edge case: empty input produces no blocks.

Input: ([1, 2, 3], 3, 128)

Expected Output: [{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]

Explanation: No run is long enough for RLE, so the whole sequence becomes one BP block.

Input: ([7, 7, 7, 7], 3, 128)

Expected Output: [{'type': 'R', 'value': 7, 'count': 4}]

Explanation: A maximal run of length 4 qualifies for RLE.

Input: ([5, 5, 5, 1, 2, 2, 3, 3, 3, 4], 3, 128)

Expected Output: [{'type': 'R', 'value': 5, 'count': 3}, {'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [145]}, {'type': 'R', 'value': 3, 'count': 3}, {'type': 'B', 'bitWidth': 4, 'count': 1, 'words': [4]}]

Explanation: The 5s and 3s become RLE blocks; the short middle segment and final 4 become BP blocks.

Input: ([0, 0, 1, 1, 2, 2], 3, 4)

Expected Output: [{'type': 'B', 'bitWidth': 2, 'count': 4, 'words': [80]}, {'type': 'B', 'bitWidth': 3, 'count': 2, 'words': [18]}]

Explanation: No run reaches length 3, so everything is BP. max_bp_block = 4 forces the data to split into two BP blocks.

Solution

def solution(values, rle_min_run=3, max_bp_block=128):
    from collections import deque

    def min_bits_signed(x):
        if x >= 0:
            return 1 if x == 0 else x.bit_length() + 1
        return (~x).bit_length() + 1

    def pack(vals, w):
        words = []
        current = 0
        used = 0
        mask = (1 << w) - 1
        for v in vals:
            u = v & mask
            remaining = w
            while remaining > 0:
                space = 32 - used
                take = remaining if remaining < space else space
                part = u & ((1 << take) - 1)
                current |= part << used
                used += take
                u >>= take
                remaining -= take
                if used == 32:
                    words.append(current & 0xFFFFFFFF)
                    current = 0
                    used = 0
        if used > 0:
            words.append(current & 0xFFFFFFFF)
        return words

    def make_bp(chunk):
        bit_width = max(min_bits_signed(v) for v in chunk)
        return {'type': 'B', 'bitWidth': bit_width, 'count': len(chunk), 'words': pack(chunk, bit_width)}

    blocks = []
    buffer = deque()

    def flush_full_chunks():
        while len(buffer) >= max_bp_block:
            chunk = [buffer.popleft() for _ in range(max_bp_block)]
            blocks.append(make_bp(chunk))

    def flush_all():
        while buffer:
            size = min(len(buffer), max_bp_block)
            chunk = [buffer.popleft() for _ in range(size)]
            blocks.append(make_bp(chunk))

    i = 0
    n = len(values)
    while i < n:
        j = i + 1
        while j < n and values[j] == values[i]:
            j += 1
        run_len = j - i

        if run_len >= rle_min_run:
            flush_all()
            blocks.append({'type': 'R', 'value': values[i], 'count': run_len})
        else:
            for k in range(i, j):
                buffer.append(values[k])
            flush_full_chunks()

        i = j

    flush_all()
    return blocks

Explanation

The encoder makes a **single left-to-right pass**, greedily separating long equal-value runs (RLE) from everything else (bit-packed), while preserving original order. **Run detection.** The outer `while` finds each maximal run: from index `i`, it advances `j` while `values[j] == values[i]`, giving `run_len = j - i`. Each element is visited once because `i` jumps straight to `j`. **Dispatch per run.** - If `run_len >= rle_min_run`, the run is RLE-worthy. We first `flush_all()` the pending bit-pack `buffer` (order must be preserved), then append `{'type':'R', 'value', 'count':run_len}`. - Otherwise the (short) run's values are pushed into a `deque` `buffer`, and `flush_full_chunks()` emits any complete `max_bp_block`-sized chunk as a 'B' block. **Building a 'B' block** (`make_bp`): `bitWidth` is the **max** of `min_bits_signed(v)` over the chunk — the minimum signed two's-complement width. `min_bits_signed` returns 1 for `0`, `bit_length()+1` for positives (room for the sign bit), and `(~x).bit_length()+1` for negatives — so `-1`→1, `7`→4, `±2^31`→32. **Tight packing** (`pack`): each value's low `w` bits (`v & mask`, two's-complement) are written LSB-first into a rolling 32-bit `current`. When a value straddles a word boundary, the inner loop splits it: it writes `take` bits, flushes `current` to `words` at 32 bits, and continues with the remainder. A final partial word is flushed at the end. After the loop, a closing `flush_all()` drains the remaining buffer. The greedy choice is correct because RLE eligibility depends only on a run's own length, and non-RLE values are emitted in order in fixed-size chunks.

Time complexity: O(n). Space complexity: O(max_bp_block + output_size).

Hints

Think in terms of maximal runs first. A long enough run becomes one R block; a short run is buffered into BP data.
Flush the buffered non-RLE values before emitting an R block, and also whenever the BP buffer reaches max_bp_block.

Part 3: Decode RLE and bit-packed blocks

Decompress a sequence of encoded blocks back into the original list of integers. Implement: ```python def solution(blocks): ... ``` ## Input `blocks` is a list of block objects, processed **in order**. There are two block types, each given as a dictionary. **RLE (run-length) block** — `type == 'R'`: | Key | Meaning | |-----|---------| | `'type'` | the string `'R'` | | `'value'` | the integer that repeats | | `'count'` | how many times `value` repeats (`count >= 0`) | It decodes to `count` copies of `value`. **Bit-packed block** — `type == 'B'`: | Key | Meaning | |-----|---------| | `'type'` | the string `'B'` | | `'bitWidth'` | `w`, the number of bits per value | | `'count'` | `n`, how many values this block encodes (`n >= 0`) | | `'words'` | a list of **unsigned 32-bit** integers holding the packed bits | ## Output Return a single flat list of integers: the decoded values of every block, **concatenated in the order the blocks appear**. An empty `blocks` list returns an empty list. ## Decoding a bit-packed block The `n` values are packed **tightly** (no padding) into the `words` stream, **least-significant bit first**: - The **first** value occupies the lowest `w` bits of `words[0]`. - Each subsequent value starts in the bit position **immediately after** the previous one. - Within a word, lower bit positions hold earlier values. When a value would extend past bit 31 of the current word, it **continues into the next word**: the low bits come from the top of the current word and the remaining high bits come from the bottom of the next word. Each extracted `w`-bit pattern is a **signed two's-complement** integer and must be **sign-extended** to an ordinary Python `int`: - If the pattern's top bit (bit `w - 1`) is set, the value is negative — subtract `2^w` from the raw pattern. - Otherwise the raw pattern is the value as-is. A bit-packed block with `count == 0` contributes no values (the `'words'` list is present per the format above but no bits are read from it). ## Examples - `[]` → `[]` - `[{'type': 'R', 'value': 5, 'count': 4}]` → `[5, 5, 5, 5]` - `[{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]` → `[1, 2, 3]` (`209` is `0b11010001`; reading 3-bit fields from the lowest bits gives `001`, `010`, `011`.) - `[{'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483647, 2147483648]}, {'type': 'R', 'value': -1, 'count': 2}]` → `[2147483647, -2147483648, -1, -1]` (the unsigned word `2147483648` sign-extends to `-2147483648`.) - `[{'type': 'B', 'bitWidth': 5, 'count': 7, 'words': [4294967295, 7]}]` → `[-1, -1, -1, -1, -1, -1, -1]` (each 5-bit field is all ones, which sign-extends to `-1`; the last field straddles the word boundary.) ## Constraints - `0 <= len(blocks) <= 10^5` - Each block is valid and follows the canonical format described above. - For bit-packed blocks, `1 <= bitWidth <= 32` when `count > 0`. - All decoded values fit in the signed 32-bit range.

Constraints

0 <= len(blocks) <= 10^5
Each block is valid and follows the canonical format.
For bit-packed blocks, 1 <= bitWidth <= 32 when count > 0.
All decoded values fit in the signed 32-bit range.

Examples

Input: []

Expected Output: []

Explanation: Edge case: no blocks means no output values.

Input: [{'type': 'R', 'value': 5, 'count': 4}]

Expected Output: [5, 5, 5, 5]

Explanation: An RLE block expands into repeated values.

Input: [{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]

Expected Output: [1, 2, 3]

Explanation: The 3-bit values are packed as 001, 010, 011.

Input: [{'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483647, 2147483648]}, {'type': 'R', 'value': -1, 'count': 2}]

Expected Output: [2147483647, -2147483648, -1, -1]

Explanation: This checks both 32-bit signed decoding and iteration across different block types.

Input: [{'type': 'B', 'bitWidth': 5, 'count': 7, 'words': [4294967295, 7]}]

Expected Output: [-1, -1, -1, -1, -1, -1, -1]

Explanation: Seven 5-bit values cross the 32-bit boundary; every 5-bit pattern is 11111, which is -1.

Solution

def solution(blocks):
    result = []

    for block in blocks:
        if block['type'] == 'R':
            result.extend([block['value']] * block['count'])
            continue

        w = block['bitWidth']
        count = block['count']
        words = block['words']
        if count == 0:
            continue

        mask = (1 << w) - 1
        bit_pos = 0
        for _ in range(count):
            word_index = bit_pos // 32
            offset = bit_pos % 32

            if offset + w <= 32:
                u = (words[word_index] >> offset) & mask
            else:
                low_bits = 32 - offset
                high_bits = w - low_bits
                low = (words[word_index] >> offset) & ((1 << low_bits) - 1)
                high = words[word_index + 1] & ((1 << high_bits) - 1)
                u = low | (high << low_bits)

            sign_bit = 1 << (w - 1)
            if u & sign_bit:
                value = u - (1 << w)
            else:
                value = u
            result.append(value)
            bit_pos += w

    return result

Explanation

The solution walks the blocks in order and appends decoded values to a single `result` list. **RLE blocks** (`type == 'R'`) are trivial: append `count` copies of `value` with `result.extend([value] * count)`. **Bit-packed blocks** (`type == 'B'`) are the interesting case. Values are stored back-to-back as `w`-bit fields inside a stream of unsigned 32-bit `words`, with no padding. A running cursor `bit_pos` tracks how many bits have been consumed; for each of the `count` values we compute: - `word_index = bit_pos // 32` — which word the field starts in, - `offset = bit_pos % 32` — the bit offset inside that word. To extract the raw `w`-bit pattern `u`: - If the field stays inside one word (`offset + w <= 32`), shift right by `offset` and mask with `(1 << w) - 1`. - If it straddles a word boundary, take the top `32 - offset` bits as the **low** part, take the remaining `w - low_bits` bits from the *start* of the next word as the **high** part, and stitch them as `low | (high << low_bits)`. This works because the stream is little-endian within each word (first value uses the lowest bits). **Sign extension:** each `u` is a two's-complement value. If its top bit (`1 << (w-1)`) is set, the true value is `u - (1 << w)`; otherwise it's `u` unchanged. Correctness hinges on advancing `bit_pos += w` after every value, so word boundaries are handled uniformly. Note when `w == 32`, `bit_pos` is always a multiple of 32, so `offset == 0` and the single-word path with a full 32-bit mask is taken — matching the test where `2147483648` decodes to `-2147483648`.

Time complexity: O(N). Space complexity: O(N).

Hints

The i-th packed value starts at bit position i * bitWidth inside that block's bitstream.
After extracting the unsigned w-bit pattern, check its highest bit to decide whether it represents a negative number.

Part 4: Validate codec unit-test coverage

Audit a proposed unit-test suite for an integer codec and report which coverage scenarios the suite already exercises. You are validating the test coverage of an integer codec. Each test case is a **sequence** of signed 32-bit integers, and the whole suite is a list of such sequences. ## What to implement Implement: ``` solution(test_sequences, rle_min_run=3) ``` - `test_sequences` is a list of sequences; each sequence is a list of signed 32-bit integers (a sequence may be empty). - `rle_min_run` is the run-length threshold used by the codec (defaults to `3`). Return a dictionary with the following **boolean** keys. ## Encoding model (run analysis) The codec encodes a sequence by splitting it into its **maximal equal-value runs** — maximal stretches of consecutive elements that are all equal. - A run whose length is **`>= rle_min_run`** is encoded as an **RLE block**. - A run whose length is **`< rle_min_run`** is encoded as a **bit-packed block**. A sequence "uses only bit-packed blocks" when it is **non-empty** and **none** of its maximal equal-value runs reaches `rle_min_run`. ## Output keys Each key is `true` if **at least one** sequence in the suite satisfies the condition (an OR across the whole suite): - **`simple_bp`** — there is a **non-empty** sequence whose encoding uses **only bit-packed blocks** (no run reaches `rle_min_run`). Empty sequences never count. - **`long_run_rle`** — there is a sequence containing a maximal run of length **`>= rle_min_run`** (i.e. its encoding has at least one RLE block). - **`alternating_switch`** — there is a **single** sequence whose encoding contains **both** an RLE block **and** a bit-packed block. - **`has_int_min`** — the value `-2147483648` appears anywhere in the suite. - **`has_int_max`** — the value `2147483647` appears anywhere in the suite. - **`has_negative`** — any negative value appears anywhere in the suite. - **`empty_input`** — at least one sequence is empty. - **`complete`** — `true` only if **all seven** of the keys above are `true`. Each flag is independent and is set as soon as one qualifying sequence is found. ## Example For `test_sequences = [[], [1, 2, 3], [7, 7, 7, 7], [5, 5, 5, 1, 2, 2, 3, 3, 3], [2147483647, -2147483648, -1]]` with `rle_min_run = 3`: - `[1, 2, 3]` → three runs of length 1 each (all bit-packed) → satisfies **`simple_bp`**. - `[7, 7, 7, 7]` → one run of length 4 (`>= 3`, an RLE block) → satisfies **`long_run_rle`**. - `[5, 5, 5, 1, 2, 2, 3, 3, 3]` → a length-3 run (RLE) plus shorter runs (bit-packed) → satisfies **`alternating_switch`**. - The last sequence supplies `has_int_max`, `has_int_min`, and `has_negative`; the empty sequence supplies `empty_input`. All seven conditions hold, so the result is: ``` {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True, 'has_int_min': True, 'has_int_max': True, 'has_negative': True, 'empty_input': True, 'complete': True} ``` ## Constraints - `0 <= number of sequences <= 10^4` - `0 <= total number of integers across all sequences <= 2 * 10^5` - `1 <= rle_min_run <= 10` - All integers are in the signed 32-bit range `[-2147483648, 2147483647]`.

Constraints

0 <= number of sequences <= 10^4
0 <= total number of integers across all sequences <= 2 * 10^5
1 <= rle_min_run <= 10
All integers are in the signed 32-bit range [-2147483648, 2147483647].

Examples

Input: ([[], [1, 2, 3], [7, 7, 7, 7], [5, 5, 5, 1, 2, 2, 3, 3, 3], [2147483647, -2147483648, -1]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True, 'has_int_min': True, 'has_int_max': True, 'has_negative': True, 'empty_input': True, 'complete': True}

Explanation: This suite covers every required category.

Input: ([[1, 2], [3, 3, 3]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': False, 'has_int_min': False, 'has_int_max': False, 'has_negative': False, 'empty_input': False, 'complete': False}

Explanation: The suite has BP-only and RLE-only cases, but no empty case, no boundary values, and no single sequence that switches strategies.

Input: ([[]], 3)

Expected Output: {'simple_bp': False, 'long_run_rle': False, 'alternating_switch': False, 'has_int_min': False, 'has_int_max': False, 'has_negative': False, 'empty_input': True, 'complete': False}

Explanation: Edge case: only the empty-input requirement is satisfied.

Input: ([[-1, -2], [4, 4, 4, 5, 6, 6, 6], [2147483647]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True, 'has_int_min': False, 'has_int_max': True, 'has_negative': True, 'empty_input': False, 'complete': False}

Explanation: This suite is close, but it still misses INT_MIN and an empty test.

Input: ([[], [1, 2], [3, 3, 3], [2147483647, -2147483648, -1]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': False, 'has_int_min': True, 'has_int_max': True, 'has_negative': True, 'empty_input': True, 'complete': False}

Explanation: Even with all other categories present, alternating_switch is still missing because no single sequence uses both block types.

Solution

def solution(test_sequences, rle_min_run=3):
    INT_MIN = -2147483648
    INT_MAX = 2147483647

    def analyze_sequence(seq):
        has_r = False
        has_b = False
        i = 0
        n = len(seq)
        while i < n:
            j = i + 1
            while j < n and seq[j] == seq[i]:
                j += 1
            if j - i >= rle_min_run:
                has_r = True
            else:
                has_b = True
            i = j
        return has_r, has_b

    summary = {
        'simple_bp': False,
        'long_run_rle': False,
        'alternating_switch': False,
        'has_int_min': False,
        'has_int_max': False,
        'has_negative': False,
        'empty_input': False,
        'complete': False,
    }

    for seq in test_sequences:
        if not seq:
            summary['empty_input'] = True
        if any(v == INT_MIN for v in seq):
            summary['has_int_min'] = True
        if any(v == INT_MAX for v in seq):
            summary['has_int_max'] = True
        if any(v < 0 for v in seq):
            summary['has_negative'] = True

        has_r, has_b = analyze_sequence(seq)
        if seq and not has_r:
            summary['simple_bp'] = True
        if has_r:
            summary['long_run_rle'] = True
        if has_r and has_b:
            summary['alternating_switch'] = True

    summary['complete'] = all([
        summary['simple_bp'],
        summary['long_run_rle'],
        summary['alternating_switch'],
        summary['has_int_min'],
        summary['has_int_max'],
        summary['has_negative'],
        summary['empty_input'],
    ])
    return summary

Explanation

The function audits a test suite (a list of integer sequences) and returns seven boolean coverage flags plus a `complete` flag that is true only when all seven hold. **Core helper — `analyze_sequence`.** It walks each sequence with a two-pointer scan over *maximal equal-value runs*: pointer `i` marks a run start, and inner pointer `j` advances while `seq[j] == seq[i]`. The run length is `j - i`. If that length reaches `rle_min_run`, the run would be encoded as an RLE block, so `has_r = True`; otherwise it's a bit-packed block, so `has_b = True`. It returns `(has_r, has_b)` for the whole sequence. **Per-sequence flags.** For every sequence it independently sets: - `empty_input` if the sequence is empty; - `has_int_min` / `has_int_max` / `has_negative` via three `any(...)` scans against `-2147483648`, `2147483647`, and `< 0`. Then it consults the run analysis: - `simple_bp` — sequence is non-empty **and** has no RLE run (`seq and not has_r`), i.e. encodes with bit-packed blocks only; - `long_run_rle` — at least one RLE run exists (`has_r`); - `alternating_switch` — the sequence contains **both** an RLE and a bit-packed block (`has_r and has_b`), matching the "encoding has both block types" requirement. **Correctness.** Each flag is a logical OR accumulated across all sequences, so it becomes true as soon as one sequence satisfies it — exactly the "at least one" semantics the spec asks for. The `simple_bp` guard explicitly excludes empty sequences (which have neither block type). Finally `complete = all([...])` ANDs the seven flags. This directly mirrors the canonical-encoding rules without actually building encodings — it only needs run lengths, which fully determine block types.

Time complexity: O(N), where N is the total number of integers across all sequences. Each integer is touched a constant number of times: once in the run-scan and once per `any(...)` pass (which short-circuit, so at most three additional linear passes per sequence).. Space complexity: O(1) extra space (excluding the fixed-size 8-key result dictionary). The run scan uses only index pointers and the `any(...)` generators allocate nothing beyond constant state..

Hints

A sequence has an RLE block iff at least one maximal run is long enough.
alternating_switch is stronger than having some BP-only tests and some RLE-only tests; one single sequence must produce both block types.

Implement streaming RLE and bit-packed codec

Company: Databricks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Onsite

Part 1: Build a canonical block representation

Constraints

block_type is either 'R' or 'B'.
All integers are in the signed 32-bit range [-2147483648, 2147483647].
For an RLE block, count >= 1.
For a bit-packed block, 0 <= len(data) <= 10^4.

Examples

Input: ('R', (5, 4))

Expected Output: {'type': 'R', 'value': 5, 'count': 4}

Explanation: A run-length block is stored directly.

Input: ('B', [0, 1, -1])

Expected Output: {'type': 'B', 'bitWidth': 2, 'count': 3, 'words': [52]}

Explanation: The minimum common width is 2 bits: 0 -> 00, 1 -> 01, -1 -> 11. Packed word is 0b110100 = 52.

Input: ('B', [-2147483648, 2147483647])

Expected Output: {'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483648, 2147483647]}

Explanation: The full 32-bit signed range requires width 32, so each value occupies one full word.

Input: ('B', [])

Expected Output: {'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []}

Explanation: Edge case: empty bit-packed input.

Input: ('B', [7, 7, 7, 7, 7, 7, 7, 7, 7])

Expected Output: {'type': 'B', 'bitWidth': 4, 'count': 9, 'words': [2004318071, 7]}

Explanation: Nine 4-bit values need 36 bits, so the payload spills into a second word.

Solution

def solution(block_type, data):
    def min_bits_signed(x):
        if x >= 0:
            return 1 if x == 0 else x.bit_length() + 1
        return (~x).bit_length() + 1

    def pack(values, w):
        words = []
        current = 0
        used = 0
        mask = (1 << w) - 1
        for v in values:
            u = v & mask
            remaining = w
            while remaining > 0:
                space = 32 - used
                take = remaining if remaining < space else space
                part = u & ((1 << take) - 1)
                current |= part << used
                used += take
                u >>= take
                remaining -= take
                if used == 32:
                    words.append(current & 0xFFFFFFFF)
                    current = 0
                    used = 0
        if used > 0:
            words.append(current & 0xFFFFFFFF)
        return words

    if block_type == 'R':
        value, count = data
        return {'type': 'R', 'value': value, 'count': count}

    values = list(data)
    if not values:
        return {'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []}

    bit_width = max(min_bits_signed(v) for v in values)
    return {'type': 'B', 'bitWidth': bit_width, 'count': len(values), 'words': pack(values, bit_width)}

Explanation

Time complexity: O(n). Space complexity: O(n).

Hints

A signed w-bit two's-complement integer must lie in the range [-2^(w-1), 2^(w-1)-1].
While packing, track the current 32-bit word and the bit offset already used inside it.

Part 2: Encode an integer stream into RLE and bit-packed blocks

Constraints

0 <= len(values) <= 10^5
1 <= rle_min_run <= 10
1 <= max_bp_block <= 128
All input values are in the signed 32-bit range [-2147483648, 2147483647].

Examples

Input: ([], 3, 128)

Expected Output: []

Explanation: Edge case: empty input produces no blocks.

Input: ([1, 2, 3], 3, 128)

Expected Output: [{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]

Explanation: No run is long enough for RLE, so the whole sequence becomes one BP block.

Input: ([7, 7, 7, 7], 3, 128)

Expected Output: [{'type': 'R', 'value': 7, 'count': 4}]

Explanation: A maximal run of length 4 qualifies for RLE.

Input: ([5, 5, 5, 1, 2, 2, 3, 3, 3, 4], 3, 128)

Explanation: The 5s and 3s become RLE blocks; the short middle segment and final 4 become BP blocks.

Input: ([0, 0, 1, 1, 2, 2], 3, 4)

Expected Output: [{'type': 'B', 'bitWidth': 2, 'count': 4, 'words': [80]}, {'type': 'B', 'bitWidth': 3, 'count': 2, 'words': [18]}]

Explanation: No run reaches length 3, so everything is BP. max_bp_block = 4 forces the data to split into two BP blocks.

Solution

def solution(values, rle_min_run=3, max_bp_block=128):
    from collections import deque

    def min_bits_signed(x):
        if x >= 0:
            return 1 if x == 0 else x.bit_length() + 1
        return (~x).bit_length() + 1

    def pack(vals, w):
        words = []
        current = 0
        used = 0
        mask = (1 << w) - 1
        for v in vals:
            u = v & mask
            remaining = w
            while remaining > 0:
                space = 32 - used
                take = remaining if remaining < space else space
                part = u & ((1 << take) - 1)
                current |= part << used
                used += take
                u >>= take
                remaining -= take
                if used == 32:
                    words.append(current & 0xFFFFFFFF)
                    current = 0
                    used = 0
        if used > 0:
            words.append(current & 0xFFFFFFFF)
        return words

    def make_bp(chunk):
        bit_width = max(min_bits_signed(v) for v in chunk)
        return {'type': 'B', 'bitWidth': bit_width, 'count': len(chunk), 'words': pack(chunk, bit_width)}

    blocks = []
    buffer = deque()

    def flush_full_chunks():
        while len(buffer) >= max_bp_block:
            chunk = [buffer.popleft() for _ in range(max_bp_block)]
            blocks.append(make_bp(chunk))

    def flush_all():
        while buffer:
            size = min(len(buffer), max_bp_block)
            chunk = [buffer.popleft() for _ in range(size)]
            blocks.append(make_bp(chunk))

    i = 0
    n = len(values)
    while i < n:
        j = i + 1
        while j < n and values[j] == values[i]:
            j += 1
        run_len = j - i

        if run_len >= rle_min_run:
            flush_all()
            blocks.append({'type': 'R', 'value': values[i], 'count': run_len})
        else:
            for k in range(i, j):
                buffer.append(values[k])
            flush_full_chunks()

        i = j

    flush_all()
    return blocks

Explanation

Time complexity: O(n). Space complexity: O(max_bp_block + output_size).

Hints

Think in terms of maximal runs first. A long enough run becomes one R block; a short run is buffered into BP data.
Flush the buffered non-RLE values before emitting an R block, and also whenever the BP buffer reaches max_bp_block.

Part 3: Decode RLE and bit-packed blocks

Constraints

0 <= len(blocks) <= 10^5
Each block is valid and follows the canonical format.
For bit-packed blocks, 1 <= bitWidth <= 32 when count > 0.
All decoded values fit in the signed 32-bit range.

Examples

Input: []

Expected Output: []

Explanation: Edge case: no blocks means no output values.

Input: [{'type': 'R', 'value': 5, 'count': 4}]

Expected Output: [5, 5, 5, 5]

Explanation: An RLE block expands into repeated values.

Input: [{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]

Expected Output: [1, 2, 3]

Explanation: The 3-bit values are packed as 001, 010, 011.

Input: [{'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483647, 2147483648]}, {'type': 'R', 'value': -1, 'count': 2}]

Expected Output: [2147483647, -2147483648, -1, -1]

Explanation: This checks both 32-bit signed decoding and iteration across different block types.

Input: [{'type': 'B', 'bitWidth': 5, 'count': 7, 'words': [4294967295, 7]}]

Expected Output: [-1, -1, -1, -1, -1, -1, -1]

Explanation: Seven 5-bit values cross the 32-bit boundary; every 5-bit pattern is 11111, which is -1.

Solution

def solution(blocks):
    result = []

    for block in blocks:
        if block['type'] == 'R':
            result.extend([block['value']] * block['count'])
            continue

        w = block['bitWidth']
        count = block['count']
        words = block['words']
        if count == 0:
            continue

        mask = (1 << w) - 1
        bit_pos = 0
        for _ in range(count):
            word_index = bit_pos // 32
            offset = bit_pos % 32

            if offset + w <= 32:
                u = (words[word_index] >> offset) & mask
            else:
                low_bits = 32 - offset
                high_bits = w - low_bits
                low = (words[word_index] >> offset) & ((1 << low_bits) - 1)
                high = words[word_index + 1] & ((1 << high_bits) - 1)
                u = low | (high << low_bits)

            sign_bit = 1 << (w - 1)
            if u & sign_bit:
                value = u - (1 << w)
            else:
                value = u
            result.append(value)
            bit_pos += w

    return result

Explanation

Time complexity: O(N). Space complexity: O(N).

Hints

The i-th packed value starts at bit position i * bitWidth inside that block's bitstream.
After extracting the unsigned w-bit pattern, check its highest bit to decide whether it represents a negative number.

Part 4: Validate codec unit-test coverage

Constraints

0 <= number of sequences <= 10^4
0 <= total number of integers across all sequences <= 2 * 10^5
1 <= rle_min_run <= 10
All integers are in the signed 32-bit range [-2147483648, 2147483647].

Examples

Input: ([[], [1, 2, 3], [7, 7, 7, 7], [5, 5, 5, 1, 2, 2, 3, 3, 3], [2147483647, -2147483648, -1]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True, 'has_int_min': True, 'has_int_max': True, 'has_negative': True, 'empty_input': True, 'complete': True}

Explanation: This suite covers every required category.

Input: ([[1, 2], [3, 3, 3]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': False, 'has_int_min': False, 'has_int_max': False, 'has_negative': False, 'empty_input': False, 'complete': False}

Explanation: The suite has BP-only and RLE-only cases, but no empty case, no boundary values, and no single sequence that switches strategies.

Input: ([[]], 3)

Expected Output: {'simple_bp': False, 'long_run_rle': False, 'alternating_switch': False, 'has_int_min': False, 'has_int_max': False, 'has_negative': False, 'empty_input': True, 'complete': False}

Explanation: Edge case: only the empty-input requirement is satisfied.

Input: ([[-1, -2], [4, 4, 4, 5, 6, 6, 6], [2147483647]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True, 'has_int_min': False, 'has_int_max': True, 'has_negative': True, 'empty_input': False, 'complete': False}

Explanation: This suite is close, but it still misses INT_MIN and an empty test.

Input: ([[], [1, 2], [3, 3, 3], [2147483647, -2147483648, -1]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': False, 'has_int_min': True, 'has_int_max': True, 'has_negative': True, 'empty_input': True, 'complete': False}

Explanation: Even with all other categories present, alternating_switch is still missing because no single sequence uses both block types.

Solution

def solution(test_sequences, rle_min_run=3):
    INT_MIN = -2147483648
    INT_MAX = 2147483647

    def analyze_sequence(seq):
        has_r = False
        has_b = False
        i = 0
        n = len(seq)
        while i < n:
            j = i + 1
            while j < n and seq[j] == seq[i]:
                j += 1
            if j - i >= rle_min_run:
                has_r = True
            else:
                has_b = True
            i = j
        return has_r, has_b

    summary = {
        'simple_bp': False,
        'long_run_rle': False,
        'alternating_switch': False,
        'has_int_min': False,
        'has_int_max': False,
        'has_negative': False,
        'empty_input': False,
        'complete': False,
    }

    for seq in test_sequences:
        if not seq:
            summary['empty_input'] = True
        if any(v == INT_MIN for v in seq):
            summary['has_int_min'] = True
        if any(v == INT_MAX for v in seq):
            summary['has_int_max'] = True
        if any(v < 0 for v in seq):
            summary['has_negative'] = True

        has_r, has_b = analyze_sequence(seq)
        if seq and not has_r:
            summary['simple_bp'] = True
        if has_r:
            summary['long_run_rle'] = True
        if has_r and has_b:
            summary['alternating_switch'] = True

    summary['complete'] = all([
        summary['simple_bp'],
        summary['long_run_rle'],
        summary['alternating_switch'],
        summary['has_int_min'],
        summary['has_int_max'],
        summary['has_negative'],
        summary['empty_input'],
    ])
    return summary

Explanation

Hints

A sequence has an RLE block iff at least one maximal run is long enough.
alternating_switch is stronger than having some BP-only tests and some RLE-only tests; one single sequence must produce both block types.

Quick Overview

Quick Overview