PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of data compression algorithms, bit-level manipulation, two's‑complement integer representation, streaming algorithm design, and encoder/decoder state management for 32-bit signed integers.

  • hard
  • Databricks
  • Coding & Algorithms
  • Software Engineer

Implement streaming RLE and bit-packed codec

Company: Databricks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Onsite

You are implementing a simple compression scheme for sequences of 32‑bit signed integers. The codec should support **two encoding strategies**: 1. **Run‑Length Encoding (RLE)** for long runs of equal values. 2. **Bit‑Packed Encoding (BP)** for blocks of values that can be represented with a small, uniform bit‑width. The codec is **streaming**: values arrive one by one, and the encoder should buffer them into blocks and choose an encoding strategy per block. ### Compressed representation Design a compressed representation made of **blocks**. For concreteness, define: - **RLE block** - Represents a run of a single repeated integer value. - Fields: - `type = 'R'` - `value` (int32) - `count` (int, number of repetitions) - Only use RLE blocks when `count >= RLE_MIN_RUN` (e.g., `RLE_MIN_RUN = 3`). Shorter runs should not be encoded as RLE. - **Bit‑packed block** - Represents a sequence of `count` possibly different integers, all of which fit into `bitWidth` bits in two's‑complement representation. - Fields: - `type = 'B'` - `bitWidth` (1–32) - `count` (number of values) - A packed payload containing exactly `count` values, each stored using `bitWidth` bits. - You may choose a **maximum block size** `MAX_BP_BLOCK` (e.g., 128 values) for simplicity. You can decide the in‑memory representation of a bit‑packed payload (e.g., array of 32‑bit integers where bits are tightly packed), as long as the decoder can reconstruct the original sequence exactly. ### Encoder Implement an `Encoder` class with the following behavior: - It receives input values **one at a time** via a method like: ```text void add(int value) ``` - Internally, the encoder may maintain a buffer of recent values to decide whether to form an RLE block or a bit‑packed block. - At appropriate times (e.g., when a block is full, when the encoding strategy should change, or when `flush()` is called), it should **emit blocks**. - Provide a method: ```text List<Block> flush() ``` that finalizes the stream, closes any open block, and returns the list of compressed blocks. **Encoding strategy constraints:** - For a maximal run of the same value with length `L`: - If `L >= RLE_MIN_RUN`, you should encode it as a single RLE block. - Otherwise, those values should be part of a bit‑packed block. - For bit‑packed blocks: - You may group consecutive non‑RLE values into blocks up to size `MAX_BP_BLOCK`. - Choose `bitWidth` for a block as the minimum number of bits needed to represent **all** its values. - You do **not** need to prove global optimality of the compression, but your encoder must consistently follow the above rules. The encoder must correctly handle: - Negative values and the full range of 32‑bit signed integers (`Integer.MIN_VALUE` to `Integer.MAX_VALUE`). - Transitions between RLE and bit‑packed segments in the stream. ### Decoder Implement a corresponding `Decoder` that reconstructs the original integer sequence from a list of blocks. - Its constructor receives the list of blocks produced by the encoder: ```text class Decoder implements Iterator<Integer> { Decoder(List<Block> blocks) { ... } boolean hasNext(); int next(); } ``` - `hasNext()` / `next()` should expose the **original sequence of integers** in order, exactly as they were passed to `Encoder.add(...)`. - The decoder must correctly iterate across both RLE and bit‑packed blocks. ### Tasks 1. Specify the exact in‑memory structure you will use for `Block` and the bit‑packed payload. 2. Implement `Encoder.add(value)` and `Encoder.flush()` to produce a valid block sequence. 3. Implement `Decoder` as an iterator over the decompressed values. 4. Write unit tests for cases including: - Simple sequences without repeats (forces bit‑packing). - Long runs of a single value (forces RLE). - Alternating patterns (switching between RLE and BP). - Values near `Integer.MAX_VALUE`, `Integer.MIN_VALUE`, and negative numbers. - Empty input sequence. Assume you are working in an object‑oriented language (e.g., Java, C++, or similar) and focus on clean class design and correctness of the codec pair.

Quick Answer: This question evaluates understanding of data compression algorithms, bit-level manipulation, two's‑complement integer representation, streaming algorithm design, and encoder/decoder state management for 32-bit signed integers.

Part 1: Build a canonical block representation

Build a single **compression block** in a canonical Python in-memory format. Implement: ```python def solution(block_type, data): ``` `block_type` is either `'R'` (run-length encoded) or `'B'` (bit-packed). The shape of `data` and the dictionary you return depend on `block_type`. ## RLE block (`block_type == 'R'`) `data` is a `(value, count)` tuple. Return the value and count wrapped in the canonical RLE shape — no computation is needed: ```python {'type': 'R', 'value': value, 'count': count} ``` - `count >= 1`. ## Bit-packed block (`block_type == 'B'`) `data` is a list of signed 32-bit integers. Return: ```python {'type': 'B', 'bitWidth': w, 'count': n, 'words': words} ``` where `n` is the number of input values and `words` is a list of **unsigned 32-bit** integers. **Empty input.** If `data` is empty, return exactly: ```python {'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []} ``` **Otherwise**, build the block in two steps: 1. **Choose the bit width `w`.** Compute, for each value, the minimum number of bits needed to represent it in **signed two's-complement** form (the encoding must include a sign bit, so non-negative values keep a leading `0`). The block's `bitWidth` is the **maximum** such width over all values, so every value fits in `w` bits. Notes: - The value `0` needs **1** bit. - A list containing both `-2147483648` and `2147483647` needs `32` bits. 2. **Pack the values tightly into 32-bit words.** Encode each value as its low `w` bits in two's-complement (for a negative value `v`, this is `v & ((1 << w) - 1)`), then lay these `w`-bit fields end to end with no gaps, in **input order**: - The first value occupies the **lowest** `w` bits of `words[0]`, the second value starts immediately after it (at bit offset `w`), and so on. Within a word, earlier values occupy lower bits (**little-endian** bit cursor). - When a value's field crosses a 32-bit boundary, its low bits finish the current word and its remaining high bits continue at the bottom of the next word. - Each completed 32-bit word is emitted as an unsigned value in `[0, 2^32 − 1]`. After all values are packed, flush any partially filled final word as well. **Example.** `('B', [0, 1, -1])` → `bitWidth` 2, encodings `00`, `01`, `11`, packed little-endian as `11 01 00` (binary `110100`) → `{'type': 'B', 'bitWidth': 2, 'count': 3, 'words': [52]}`. ## Constraints - `block_type` is either `'R'` or `'B'`. - All integers are in the signed 32-bit range `[-2147483648, 2147483647]`. - For an RLE block, `count >= 1`. - For a bit-packed block, `0 <= len(data) <= 10^4`.

Constraints

  • block_type is either 'R' or 'B'.
  • All integers are in the signed 32-bit range [-2147483648, 2147483647].
  • For an RLE block, count >= 1.
  • For a bit-packed block, 0 <= len(data) <= 10^4.

Examples

Input: ('R', (5, 4))

Expected Output: {'type': 'R', 'value': 5, 'count': 4}

Explanation: A run-length block is stored directly.

Input: ('B', [0, 1, -1])

Expected Output: {'type': 'B', 'bitWidth': 2, 'count': 3, 'words': [52]}

Explanation: The minimum common width is 2 bits: 0 -> 00, 1 -> 01, -1 -> 11. Packed word is 0b110100 = 52.

Input: ('B', [-2147483648, 2147483647])

Expected Output: {'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483648, 2147483647]}

Explanation: The full 32-bit signed range requires width 32, so each value occupies one full word.

Input: ('B', [])

Expected Output: {'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []}

Explanation: Edge case: empty bit-packed input.

Input: ('B', [7, 7, 7, 7, 7, 7, 7, 7, 7])

Expected Output: {'type': 'B', 'bitWidth': 4, 'count': 9, 'words': [2004318071, 7]}

Explanation: Nine 4-bit values need 36 bits, so the payload spills into a second word.

Solution

def solution(block_type, data):
    def min_bits_signed(x):
        if x >= 0:
            return 1 if x == 0 else x.bit_length() + 1
        return (~x).bit_length() + 1

    def pack(values, w):
        words = []
        current = 0
        used = 0
        mask = (1 << w) - 1
        for v in values:
            u = v & mask
            remaining = w
            while remaining > 0:
                space = 32 - used
                take = remaining if remaining < space else space
                part = u & ((1 << take) - 1)
                current |= part << used
                used += take
                u >>= take
                remaining -= take
                if used == 32:
                    words.append(current & 0xFFFFFFFF)
                    current = 0
                    used = 0
        if used > 0:
            words.append(current & 0xFFFFFFFF)
        return words

    if block_type == 'R':
        value, count = data
        return {'type': 'R', 'value': value, 'count': count}

    values = list(data)
    if not values:
        return {'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []}

    bit_width = max(min_bits_signed(v) for v in values)
    return {'type': 'B', 'bitWidth': bit_width, 'count': len(values), 'words': pack(values, bit_width)}
Explanation
The function builds one canonical compression block. **RLE path (`block_type == 'R'`).** `data` is a `(value, count)` tuple; the code just wraps it into `{'type': 'R', 'value': value, 'count': count}`. Nothing to compute. **Bit-packed path (`block_type == 'B'`).** Empty input short-circuits to the prescribed `bitWidth: 0` block. Otherwise it works in two phases: 1. **Choose the bit width.** `min_bits_signed(x)` returns the minimum *signed* two's-complement width: - `x == 0` → `1` - `x > 0` → `x.bit_length() + 1` (data bits plus a sign bit so the top bit stays `0`) - `x < 0` → `(~x).bit_length() + 1` (since `~x == -x-1`, this is the width needed for the magnitude plus the sign bit) `bit_width = max(min_bits_signed(v) for v in values)` makes one width fit every value, so e.g. `[-2147483648, 2147483647]` needs `32`. 2. **Pack tightly into 32-bit words.** `pack` keeps an accumulator `current` and a counter `used` (bits filled in the current word). Each value is first reduced to its low `w` bits via `u = v & mask` — this is exactly the two's-complement encoding for negatives. It then streams those `w` bits out: `take = min(remaining, 32 - used)` bits are OR'd in at offset `used` (`current |= part << used`), and when `used` hits 32 the word is flushed (`& 0xFFFFFFFF`) and the accumulator resets. A value that crosses a word boundary simply continues in the next word, matching the canonical "first value in the lowest bits of `words[0]`" rule. Any partial final word is flushed at the end. This is correct because masking gives canonical signed encodings, and the little-endian bit cursor preserves order and boundary-crossing exactly as specified.

Time complexity: O(n). Space complexity: O(n).

Hints

  1. A signed w-bit two's-complement integer must lie in the range [-2^(w-1), 2^(w-1)-1].
  2. While packing, track the current 32-bit word and the bit offset already used inside it.

Part 2: Encode an integer stream into RLE and bit-packed blocks

Implement the **encoder** for a simple integer codec that compresses a sequence of signed integers into a list of *blocks*. Implement the function: ```python def solution(values, rle_min_run=3, max_bp_block=128): ``` - `values`: a list of signed 32-bit integers (may be empty). - `rle_min_run`: minimum run length that qualifies for run-length encoding. - `max_bp_block`: maximum number of values per bit-packed block. Return the full list of encoded blocks (in order). For an empty input, return an empty list. ## Block formats There are two kinds of blocks, each a dictionary: - **RLE block** — `{'type': 'R', 'value': value, 'count': count}` - **Bit-packed block** — `{'type': 'B', 'bitWidth': w, 'count': n, 'words': words}` ## Encoding rules 1. **Scan in order.** Process `values` left to right in a single pass. 2. **Detect maximal runs.** A *run* is a maximal stretch of consecutive **equal** values. For a run of length `L`: - If `L >= rle_min_run`, emit exactly **one** RLE block `{'type': 'R', 'value': <the repeated value>, 'count': L}` for that run. - Otherwise (`L < rle_min_run`), the values of that run are **non-RLE** and must be carried into bit-packed blocks. 3. **Order is preserved.** Non-RLE values are emitted in their original order. Whenever an RLE block is produced, all pending non-RLE values accumulated so far must be flushed (as bit-packed blocks) **before** that RLE block is appended. 4. **Group non-RLE values into bit-packed blocks.** Treat the running stream of non-RLE values as one continuous sequence (it may span across multiple short runs and across different values). Cut it into chunks of **at most `max_bp_block`** values each, in order; each chunk becomes one bit-packed block. A trailing chunk smaller than `max_bp_block` is allowed at a flush boundary (i.e. before an RLE block or at the end of input). 5. **Choose the bit width.** Each bit-packed block uses the **minimum signed two's-complement bit width** needed to represent **every** value in that block. Concretely, the block's `bitWidth` `w` is the maximum over its values of the minimum signed width of each value, where: - `0` needs **1** bit, - a positive value `x` needs `x.bit_length() + 1` bits (an extra bit for the sign), - a negative value `x` needs `(~x).bit_length() + 1` bits. (So `-1` → 1, `7` → 4, `2147483647` → 32, and `-2147483648` → 32.) 6. **Pack values tightly.** Within a bit-packed block, each value is reduced to its low `w` bits via its two's-complement representation (`value & ((1 << w) - 1)`) and packed **tightly, LSB-first**, into a list of **unsigned 32-bit** words (`words`): - The first value occupies the lowest `w` bits of `words[0]`. - The next value follows **immediately** in the next `w` bits, and so on. - When a value straddles a 32-bit word boundary, its low bits go into the current word and the remaining high bits continue at the start of the next word. - After all values are written, a final partial word (if any) is appended. Every entry in `words` is an unsigned 32-bit integer (`0 .. 2^32 - 1`). `count` is the number of values packed into the block (`n`). Return the complete ordered list of `R` and `B` blocks. ## Examples - `solution([], 3, 128)` → `[]` - `solution([1, 2, 3], 3, 128)` → `[{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]` - `solution([7, 7, 7, 7], 3, 128)` → `[{'type': 'R', 'value': 7, 'count': 4}]` - `solution([5, 5, 5, 1, 2, 2, 3, 3, 3, 4], 3, 128)` → `[{'type': 'R', 'value': 5, 'count': 3}, {'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [145]}, {'type': 'R', 'value': 3, 'count': 3}, {'type': 'B', 'bitWidth': 4, 'count': 1, 'words': [4]}]` - `solution([0, 0, 1, 1, 2, 2], 3, 4)` → `[{'type': 'B', 'bitWidth': 2, 'count': 4, 'words': [80]}, {'type': 'B', 'bitWidth': 3, 'count': 2, 'words': [18]}]` (No run reaches length 3, so all six values are non-RLE. They form one continuous stream that is cut into chunks of at most `max_bp_block = 4`: the first 4 values `[0, 0, 1, 1]` and then `[2, 2]`.) ## Constraints - `0 <= len(values) <= 10^5` - `1 <= rle_min_run <= 10` - `1 <= max_bp_block <= 128` - All input values are in the signed 32-bit range `[-2147483648, 2147483647]`.

Constraints

  • 0 <= len(values) <= 10^5
  • 1 <= rle_min_run <= 10
  • 1 <= max_bp_block <= 128
  • All input values are in the signed 32-bit range [-2147483648, 2147483647].

Examples

Input: ([], 3, 128)

Expected Output: []

Explanation: Edge case: empty input produces no blocks.

Input: ([1, 2, 3], 3, 128)

Expected Output: [{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]

Explanation: No run is long enough for RLE, so the whole sequence becomes one BP block.

Input: ([7, 7, 7, 7], 3, 128)

Expected Output: [{'type': 'R', 'value': 7, 'count': 4}]

Explanation: A maximal run of length 4 qualifies for RLE.

Input: ([5, 5, 5, 1, 2, 2, 3, 3, 3, 4], 3, 128)

Expected Output: [{'type': 'R', 'value': 5, 'count': 3}, {'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [145]}, {'type': 'R', 'value': 3, 'count': 3}, {'type': 'B', 'bitWidth': 4, 'count': 1, 'words': [4]}]

Explanation: The 5s and 3s become RLE blocks; the short middle segment and final 4 become BP blocks.

Input: ([0, 0, 1, 1, 2, 2], 3, 4)

Expected Output: [{'type': 'B', 'bitWidth': 2, 'count': 4, 'words': [80]}, {'type': 'B', 'bitWidth': 3, 'count': 2, 'words': [18]}]

Explanation: No run reaches length 3, so everything is BP. max_bp_block = 4 forces the data to split into two BP blocks.

Solution

def solution(values, rle_min_run=3, max_bp_block=128):
    from collections import deque

    def min_bits_signed(x):
        if x >= 0:
            return 1 if x == 0 else x.bit_length() + 1
        return (~x).bit_length() + 1

    def pack(vals, w):
        words = []
        current = 0
        used = 0
        mask = (1 << w) - 1
        for v in vals:
            u = v & mask
            remaining = w
            while remaining > 0:
                space = 32 - used
                take = remaining if remaining < space else space
                part = u & ((1 << take) - 1)
                current |= part << used
                used += take
                u >>= take
                remaining -= take
                if used == 32:
                    words.append(current & 0xFFFFFFFF)
                    current = 0
                    used = 0
        if used > 0:
            words.append(current & 0xFFFFFFFF)
        return words

    def make_bp(chunk):
        bit_width = max(min_bits_signed(v) for v in chunk)
        return {'type': 'B', 'bitWidth': bit_width, 'count': len(chunk), 'words': pack(chunk, bit_width)}

    blocks = []
    buffer = deque()

    def flush_full_chunks():
        while len(buffer) >= max_bp_block:
            chunk = [buffer.popleft() for _ in range(max_bp_block)]
            blocks.append(make_bp(chunk))

    def flush_all():
        while buffer:
            size = min(len(buffer), max_bp_block)
            chunk = [buffer.popleft() for _ in range(size)]
            blocks.append(make_bp(chunk))

    i = 0
    n = len(values)
    while i < n:
        j = i + 1
        while j < n and values[j] == values[i]:
            j += 1
        run_len = j - i

        if run_len >= rle_min_run:
            flush_all()
            blocks.append({'type': 'R', 'value': values[i], 'count': run_len})
        else:
            for k in range(i, j):
                buffer.append(values[k])
            flush_full_chunks()

        i = j

    flush_all()
    return blocks
Explanation
The encoder makes a **single left-to-right pass**, greedily separating long equal-value runs (RLE) from everything else (bit-packed), while preserving original order. **Run detection.** The outer `while` finds each maximal run: from index `i`, it advances `j` while `values[j] == values[i]`, giving `run_len = j - i`. Each element is visited once because `i` jumps straight to `j`. **Dispatch per run.** - If `run_len >= rle_min_run`, the run is RLE-worthy. We first `flush_all()` the pending bit-pack `buffer` (order must be preserved), then append `{'type':'R', 'value', 'count':run_len}`. - Otherwise the (short) run's values are pushed into a `deque` `buffer`, and `flush_full_chunks()` emits any complete `max_bp_block`-sized chunk as a 'B' block. **Building a 'B' block** (`make_bp`): `bitWidth` is the **max** of `min_bits_signed(v)` over the chunk — the minimum signed two's-complement width. `min_bits_signed` returns 1 for `0`, `bit_length()+1` for positives (room for the sign bit), and `(~x).bit_length()+1` for negatives — so `-1`→1, `7`→4, `±2^31`→32. **Tight packing** (`pack`): each value's low `w` bits (`v & mask`, two's-complement) are written LSB-first into a rolling 32-bit `current`. When a value straddles a word boundary, the inner loop splits it: it writes `take` bits, flushes `current` to `words` at 32 bits, and continues with the remainder. A final partial word is flushed at the end. After the loop, a closing `flush_all()` drains the remaining buffer. The greedy choice is correct because RLE eligibility depends only on a run's own length, and non-RLE values are emitted in order in fixed-size chunks.

Time complexity: O(n). Space complexity: O(max_bp_block + output_size).

Hints

  1. Think in terms of maximal runs first. A long enough run becomes one R block; a short run is buffered into BP data.
  2. Flush the buffered non-RLE values before emitting an R block, and also whenever the BP buffer reaches max_bp_block.

Part 3: Decode RLE and bit-packed blocks

Decompress a sequence of encoded blocks back into the original list of integers. Implement: ```python def solution(blocks): ... ``` ## Input `blocks` is a list of block objects, processed **in order**. There are two block types, each given as a dictionary. **RLE (run-length) block** — `type == 'R'`: | Key | Meaning | |-----|---------| | `'type'` | the string `'R'` | | `'value'` | the integer that repeats | | `'count'` | how many times `value` repeats (`count >= 0`) | It decodes to `count` copies of `value`. **Bit-packed block** — `type == 'B'`: | Key | Meaning | |-----|---------| | `'type'` | the string `'B'` | | `'bitWidth'` | `w`, the number of bits per value | | `'count'` | `n`, how many values this block encodes (`n >= 0`) | | `'words'` | a list of **unsigned 32-bit** integers holding the packed bits | ## Output Return a single flat list of integers: the decoded values of every block, **concatenated in the order the blocks appear**. An empty `blocks` list returns an empty list. ## Decoding a bit-packed block The `n` values are packed **tightly** (no padding) into the `words` stream, **least-significant bit first**: - The **first** value occupies the lowest `w` bits of `words[0]`. - Each subsequent value starts in the bit position **immediately after** the previous one. - Within a word, lower bit positions hold earlier values. When a value would extend past bit 31 of the current word, it **continues into the next word**: the low bits come from the top of the current word and the remaining high bits come from the bottom of the next word. Each extracted `w`-bit pattern is a **signed two's-complement** integer and must be **sign-extended** to an ordinary Python `int`: - If the pattern's top bit (bit `w - 1`) is set, the value is negative — subtract `2^w` from the raw pattern. - Otherwise the raw pattern is the value as-is. A bit-packed block with `count == 0` contributes no values (the `'words'` list is present per the format above but no bits are read from it). ## Examples - `[]` → `[]` - `[{'type': 'R', 'value': 5, 'count': 4}]` → `[5, 5, 5, 5]` - `[{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]` → `[1, 2, 3]` (`209` is `0b11010001`; reading 3-bit fields from the lowest bits gives `001`, `010`, `011`.) - `[{'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483647, 2147483648]}, {'type': 'R', 'value': -1, 'count': 2}]` → `[2147483647, -2147483648, -1, -1]` (the unsigned word `2147483648` sign-extends to `-2147483648`.) - `[{'type': 'B', 'bitWidth': 5, 'count': 7, 'words': [4294967295, 7]}]` → `[-1, -1, -1, -1, -1, -1, -1]` (each 5-bit field is all ones, which sign-extends to `-1`; the last field straddles the word boundary.) ## Constraints - `0 <= len(blocks) <= 10^5` - Each block is valid and follows the canonical format described above. - For bit-packed blocks, `1 <= bitWidth <= 32` when `count > 0`. - All decoded values fit in the signed 32-bit range.

Constraints

  • 0 <= len(blocks) <= 10^5
  • Each block is valid and follows the canonical format.
  • For bit-packed blocks, 1 <= bitWidth <= 32 when count > 0.
  • All decoded values fit in the signed 32-bit range.

Examples

Input: []

Expected Output: []

Explanation: Edge case: no blocks means no output values.

Input: [{'type': 'R', 'value': 5, 'count': 4}]

Expected Output: [5, 5, 5, 5]

Explanation: An RLE block expands into repeated values.

Input: [{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]

Expected Output: [1, 2, 3]

Explanation: The 3-bit values are packed as 001, 010, 011.

Input: [{'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483647, 2147483648]}, {'type': 'R', 'value': -1, 'count': 2}]

Expected Output: [2147483647, -2147483648, -1, -1]

Explanation: This checks both 32-bit signed decoding and iteration across different block types.

Input: [{'type': 'B', 'bitWidth': 5, 'count': 7, 'words': [4294967295, 7]}]

Expected Output: [-1, -1, -1, -1, -1, -1, -1]

Explanation: Seven 5-bit values cross the 32-bit boundary; every 5-bit pattern is 11111, which is -1.

Solution

def solution(blocks):
    result = []

    for block in blocks:
        if block['type'] == 'R':
            result.extend([block['value']] * block['count'])
            continue

        w = block['bitWidth']
        count = block['count']
        words = block['words']
        if count == 0:
            continue

        mask = (1 << w) - 1
        bit_pos = 0
        for _ in range(count):
            word_index = bit_pos // 32
            offset = bit_pos % 32

            if offset + w <= 32:
                u = (words[word_index] >> offset) & mask
            else:
                low_bits = 32 - offset
                high_bits = w - low_bits
                low = (words[word_index] >> offset) & ((1 << low_bits) - 1)
                high = words[word_index + 1] & ((1 << high_bits) - 1)
                u = low | (high << low_bits)

            sign_bit = 1 << (w - 1)
            if u & sign_bit:
                value = u - (1 << w)
            else:
                value = u
            result.append(value)
            bit_pos += w

    return result
Explanation
The solution walks the blocks in order and appends decoded values to a single `result` list. **RLE blocks** (`type == 'R'`) are trivial: append `count` copies of `value` with `result.extend([value] * count)`. **Bit-packed blocks** (`type == 'B'`) are the interesting case. Values are stored back-to-back as `w`-bit fields inside a stream of unsigned 32-bit `words`, with no padding. A running cursor `bit_pos` tracks how many bits have been consumed; for each of the `count` values we compute: - `word_index = bit_pos // 32` — which word the field starts in, - `offset = bit_pos % 32` — the bit offset inside that word. To extract the raw `w`-bit pattern `u`: - If the field stays inside one word (`offset + w <= 32`), shift right by `offset` and mask with `(1 << w) - 1`. - If it straddles a word boundary, take the top `32 - offset` bits as the **low** part, take the remaining `w - low_bits` bits from the *start* of the next word as the **high** part, and stitch them as `low | (high << low_bits)`. This works because the stream is little-endian within each word (first value uses the lowest bits). **Sign extension:** each `u` is a two's-complement value. If its top bit (`1 << (w-1)`) is set, the true value is `u - (1 << w)`; otherwise it's `u` unchanged. Correctness hinges on advancing `bit_pos += w` after every value, so word boundaries are handled uniformly. Note when `w == 32`, `bit_pos` is always a multiple of 32, so `offset == 0` and the single-word path with a full 32-bit mask is taken — matching the test where `2147483648` decodes to `-2147483648`.

Time complexity: O(N). Space complexity: O(N).

Hints

  1. The i-th packed value starts at bit position i * bitWidth inside that block's bitstream.
  2. After extracting the unsigned w-bit pattern, check its highest bit to decide whether it represents a negative number.

Part 4: Validate codec unit-test coverage

Audit a proposed unit-test suite for an integer codec and report which coverage scenarios the suite already exercises. You are validating the test coverage of an integer codec. Each test case is a **sequence** of signed 32-bit integers, and the whole suite is a list of such sequences. ## What to implement Implement: ``` solution(test_sequences, rle_min_run=3) ``` - `test_sequences` is a list of sequences; each sequence is a list of signed 32-bit integers (a sequence may be empty). - `rle_min_run` is the run-length threshold used by the codec (defaults to `3`). Return a dictionary with the following **boolean** keys. ## Encoding model (run analysis) The codec encodes a sequence by splitting it into its **maximal equal-value runs** — maximal stretches of consecutive elements that are all equal. - A run whose length is **`>= rle_min_run`** is encoded as an **RLE block**. - A run whose length is **`< rle_min_run`** is encoded as a **bit-packed block**. A sequence "uses only bit-packed blocks" when it is **non-empty** and **none** of its maximal equal-value runs reaches `rle_min_run`. ## Output keys Each key is `true` if **at least one** sequence in the suite satisfies the condition (an OR across the whole suite): - **`simple_bp`** — there is a **non-empty** sequence whose encoding uses **only bit-packed blocks** (no run reaches `rle_min_run`). Empty sequences never count. - **`long_run_rle`** — there is a sequence containing a maximal run of length **`>= rle_min_run`** (i.e. its encoding has at least one RLE block). - **`alternating_switch`** — there is a **single** sequence whose encoding contains **both** an RLE block **and** a bit-packed block. - **`has_int_min`** — the value `-2147483648` appears anywhere in the suite. - **`has_int_max`** — the value `2147483647` appears anywhere in the suite. - **`has_negative`** — any negative value appears anywhere in the suite. - **`empty_input`** — at least one sequence is empty. - **`complete`** — `true` only if **all seven** of the keys above are `true`. Each flag is independent and is set as soon as one qualifying sequence is found. ## Example For `test_sequences = [[], [1, 2, 3], [7, 7, 7, 7], [5, 5, 5, 1, 2, 2, 3, 3, 3], [2147483647, -2147483648, -1]]` with `rle_min_run = 3`: - `[1, 2, 3]` → three runs of length 1 each (all bit-packed) → satisfies **`simple_bp`**. - `[7, 7, 7, 7]` → one run of length 4 (`>= 3`, an RLE block) → satisfies **`long_run_rle`**. - `[5, 5, 5, 1, 2, 2, 3, 3, 3]` → a length-3 run (RLE) plus shorter runs (bit-packed) → satisfies **`alternating_switch`**. - The last sequence supplies `has_int_max`, `has_int_min`, and `has_negative`; the empty sequence supplies `empty_input`. All seven conditions hold, so the result is: ``` {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True, 'has_int_min': True, 'has_int_max': True, 'has_negative': True, 'empty_input': True, 'complete': True} ``` ## Constraints - `0 <= number of sequences <= 10^4` - `0 <= total number of integers across all sequences <= 2 * 10^5` - `1 <= rle_min_run <= 10` - All integers are in the signed 32-bit range `[-2147483648, 2147483647]`.

Constraints

  • 0 <= number of sequences <= 10^4
  • 0 <= total number of integers across all sequences <= 2 * 10^5
  • 1 <= rle_min_run <= 10
  • All integers are in the signed 32-bit range [-2147483648, 2147483647].

Examples

Input: ([[], [1, 2, 3], [7, 7, 7, 7], [5, 5, 5, 1, 2, 2, 3, 3, 3], [2147483647, -2147483648, -1]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True, 'has_int_min': True, 'has_int_max': True, 'has_negative': True, 'empty_input': True, 'complete': True}

Explanation: This suite covers every required category.

Input: ([[1, 2], [3, 3, 3]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': False, 'has_int_min': False, 'has_int_max': False, 'has_negative': False, 'empty_input': False, 'complete': False}

Explanation: The suite has BP-only and RLE-only cases, but no empty case, no boundary values, and no single sequence that switches strategies.

Input: ([[]], 3)

Expected Output: {'simple_bp': False, 'long_run_rle': False, 'alternating_switch': False, 'has_int_min': False, 'has_int_max': False, 'has_negative': False, 'empty_input': True, 'complete': False}

Explanation: Edge case: only the empty-input requirement is satisfied.

Input: ([[-1, -2], [4, 4, 4, 5, 6, 6, 6], [2147483647]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True, 'has_int_min': False, 'has_int_max': True, 'has_negative': True, 'empty_input': False, 'complete': False}

Explanation: This suite is close, but it still misses INT_MIN and an empty test.

Input: ([[], [1, 2], [3, 3, 3], [2147483647, -2147483648, -1]], 3)

Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': False, 'has_int_min': True, 'has_int_max': True, 'has_negative': True, 'empty_input': True, 'complete': False}

Explanation: Even with all other categories present, alternating_switch is still missing because no single sequence uses both block types.

Solution

def solution(test_sequences, rle_min_run=3):
    INT_MIN = -2147483648
    INT_MAX = 2147483647

    def analyze_sequence(seq):
        has_r = False
        has_b = False
        i = 0
        n = len(seq)
        while i < n:
            j = i + 1
            while j < n and seq[j] == seq[i]:
                j += 1
            if j - i >= rle_min_run:
                has_r = True
            else:
                has_b = True
            i = j
        return has_r, has_b

    summary = {
        'simple_bp': False,
        'long_run_rle': False,
        'alternating_switch': False,
        'has_int_min': False,
        'has_int_max': False,
        'has_negative': False,
        'empty_input': False,
        'complete': False,
    }

    for seq in test_sequences:
        if not seq:
            summary['empty_input'] = True
        if any(v == INT_MIN for v in seq):
            summary['has_int_min'] = True
        if any(v == INT_MAX for v in seq):
            summary['has_int_max'] = True
        if any(v < 0 for v in seq):
            summary['has_negative'] = True

        has_r, has_b = analyze_sequence(seq)
        if seq and not has_r:
            summary['simple_bp'] = True
        if has_r:
            summary['long_run_rle'] = True
        if has_r and has_b:
            summary['alternating_switch'] = True

    summary['complete'] = all([
        summary['simple_bp'],
        summary['long_run_rle'],
        summary['alternating_switch'],
        summary['has_int_min'],
        summary['has_int_max'],
        summary['has_negative'],
        summary['empty_input'],
    ])
    return summary
Explanation
The function audits a test suite (a list of integer sequences) and returns seven boolean coverage flags plus a `complete` flag that is true only when all seven hold. **Core helper — `analyze_sequence`.** It walks each sequence with a two-pointer scan over *maximal equal-value runs*: pointer `i` marks a run start, and inner pointer `j` advances while `seq[j] == seq[i]`. The run length is `j - i`. If that length reaches `rle_min_run`, the run would be encoded as an RLE block, so `has_r = True`; otherwise it's a bit-packed block, so `has_b = True`. It returns `(has_r, has_b)` for the whole sequence. **Per-sequence flags.** For every sequence it independently sets: - `empty_input` if the sequence is empty; - `has_int_min` / `has_int_max` / `has_negative` via three `any(...)` scans against `-2147483648`, `2147483647`, and `< 0`. Then it consults the run analysis: - `simple_bp` — sequence is non-empty **and** has no RLE run (`seq and not has_r`), i.e. encodes with bit-packed blocks only; - `long_run_rle` — at least one RLE run exists (`has_r`); - `alternating_switch` — the sequence contains **both** an RLE and a bit-packed block (`has_r and has_b`), matching the "encoding has both block types" requirement. **Correctness.** Each flag is a logical OR accumulated across all sequences, so it becomes true as soon as one sequence satisfies it — exactly the "at least one" semantics the spec asks for. The `simple_bp` guard explicitly excludes empty sequences (which have neither block type). Finally `complete = all([...])` ANDs the seven flags. This directly mirrors the canonical-encoding rules without actually building encodings — it only needs run lengths, which fully determine block types.

Time complexity: O(N), where N is the total number of integers across all sequences. Each integer is touched a constant number of times: once in the run-scan and once per `any(...)` pass (which short-circuit, so at most three additional linear passes per sequence).. Space complexity: O(1) extra space (excluding the fixed-size 8-key result dictionary). The run scan uses only index pointers and the `any(...)` generators allocate nothing beyond constant state..

Hints

  1. A sequence has an RLE block iff at least one maximal run is long enough.
  2. alternating_switch is stronger than having some BP-only tests and some RLE-only tests; one single sequence must produce both block types.
Last updated: Apr 19, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Choose the Best Travel Mode - Databricks (medium)
  • Implement an Alternating Tic-Tac-Toe Game - Databricks (hard)
  • Implement a Snapshot Set Iterator - Databricks (medium)
  • Find the Best Commute Mode - Databricks (medium)
  • Partition a Target String by Source Substrings - Databricks (medium)