Implement streaming RLE and bit-packed codec
Company: Databricks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Onsite
You are implementing a simple compression scheme for sequences of 32‑bit signed integers. The codec should support **two encoding strategies**:
1. **Run‑Length Encoding (RLE)** for long runs of equal values.
2. **Bit‑Packed Encoding (BP)** for blocks of values that can be represented with a small, uniform bit‑width.
The codec is **streaming**: values arrive one by one, and the encoder should buffer them into blocks and choose an encoding strategy per block.
### Compressed representation
Design a compressed representation made of **blocks**. For concreteness, define:
- **RLE block**
- Represents a run of a single repeated integer value.
- Fields:
- `type = 'R'`
- `value` (int32)
- `count` (int, number of repetitions)
- Only use RLE blocks when `count >= RLE_MIN_RUN` (e.g., `RLE_MIN_RUN = 3`). Shorter runs should not be encoded as RLE.
- **Bit‑packed block**
- Represents a sequence of `count` possibly different integers, all of which fit into `bitWidth` bits in two's‑complement representation.
- Fields:
- `type = 'B'`
- `bitWidth` (1–32)
- `count` (number of values)
- A packed payload containing exactly `count` values, each stored using `bitWidth` bits.
- You may choose a **maximum block size** `MAX_BP_BLOCK` (e.g., 128 values) for simplicity.
You can decide the in‑memory representation of a bit‑packed payload (e.g., array of 32‑bit integers where bits are tightly packed), as long as the decoder can reconstruct the original sequence exactly.
### Encoder
Implement an `Encoder` class with the following behavior:
- It receives input values **one at a time** via a method like:
```text
void add(int value)
```
- Internally, the encoder may maintain a buffer of recent values to decide whether to form an RLE block or a bit‑packed block.
- At appropriate times (e.g., when a block is full, when the encoding strategy should change, or when `flush()` is called), it should **emit blocks**.
- Provide a method:
```text
List<Block> flush()
```
that finalizes the stream, closes any open block, and returns the list of compressed blocks.
**Encoding strategy constraints:**
- For a maximal run of the same value with length `L`:
- If `L >= RLE_MIN_RUN`, you should encode it as a single RLE block.
- Otherwise, those values should be part of a bit‑packed block.
- For bit‑packed blocks:
- You may group consecutive non‑RLE values into blocks up to size `MAX_BP_BLOCK`.
- Choose `bitWidth` for a block as the minimum number of bits needed to represent **all** its values.
- You do **not** need to prove global optimality of the compression, but your encoder must consistently follow the above rules.
The encoder must correctly handle:
- Negative values and the full range of 32‑bit signed integers (`Integer.MIN_VALUE` to `Integer.MAX_VALUE`).
- Transitions between RLE and bit‑packed segments in the stream.
### Decoder
Implement a corresponding `Decoder` that reconstructs the original integer sequence from a list of blocks.
- Its constructor receives the list of blocks produced by the encoder:
```text
class Decoder implements Iterator<Integer> {
Decoder(List<Block> blocks) { ... }
boolean hasNext();
int next();
}
```
- `hasNext()` / `next()` should expose the **original sequence of integers** in order, exactly as they were passed to `Encoder.add(...)`.
- The decoder must correctly iterate across both RLE and bit‑packed blocks.
### Tasks
1. Specify the exact in‑memory structure you will use for `Block` and the bit‑packed payload.
2. Implement `Encoder.add(value)` and `Encoder.flush()` to produce a valid block sequence.
3. Implement `Decoder` as an iterator over the decompressed values.
4. Write unit tests for cases including:
- Simple sequences without repeats (forces bit‑packing).
- Long runs of a single value (forces RLE).
- Alternating patterns (switching between RLE and BP).
- Values near `Integer.MAX_VALUE`, `Integer.MIN_VALUE`, and negative numbers.
- Empty input sequence.
Assume you are working in an object‑oriented language (e.g., Java, C++, or similar) and focus on clean class design and correctness of the codec pair.
Quick Answer: This question evaluates understanding of data compression algorithms, bit-level manipulation, two's‑complement integer representation, streaming algorithm design, and encoder/decoder state management for 32-bit signed integers.
Part 1: Build a canonical block representation
Build a single **compression block** in a canonical Python in-memory format.
Implement:
```python
def solution(block_type, data):
```
`block_type` is either `'R'` (run-length encoded) or `'B'` (bit-packed). The shape of `data` and the dictionary you return depend on `block_type`.
## RLE block (`block_type == 'R'`)
`data` is a `(value, count)` tuple. Return the value and count wrapped in the canonical RLE shape — no computation is needed:
```python
{'type': 'R', 'value': value, 'count': count}
```
- `count >= 1`.
## Bit-packed block (`block_type == 'B'`)
`data` is a list of signed 32-bit integers. Return:
```python
{'type': 'B', 'bitWidth': w, 'count': n, 'words': words}
```
where `n` is the number of input values and `words` is a list of **unsigned 32-bit** integers.
**Empty input.** If `data` is empty, return exactly:
```python
{'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []}
```
**Otherwise**, build the block in two steps:
1. **Choose the bit width `w`.** Compute, for each value, the minimum number of bits needed to represent it in **signed two's-complement** form (the encoding must include a sign bit, so non-negative values keep a leading `0`). The block's `bitWidth` is the **maximum** such width over all values, so every value fits in `w` bits. Notes:
- The value `0` needs **1** bit.
- A list containing both `-2147483648` and `2147483647` needs `32` bits.
2. **Pack the values tightly into 32-bit words.** Encode each value as its low `w` bits in two's-complement (for a negative value `v`, this is `v & ((1 << w) - 1)`), then lay these `w`-bit fields end to end with no gaps, in **input order**:
- The first value occupies the **lowest** `w` bits of `words[0]`, the second value starts immediately after it (at bit offset `w`), and so on. Within a word, earlier values occupy lower bits (**little-endian** bit cursor).
- When a value's field crosses a 32-bit boundary, its low bits finish the current word and its remaining high bits continue at the bottom of the next word.
- Each completed 32-bit word is emitted as an unsigned value in `[0, 2^32 − 1]`. After all values are packed, flush any partially filled final word as well.
**Example.** `('B', [0, 1, -1])` → `bitWidth` 2, encodings `00`, `01`, `11`, packed little-endian as `11 01 00` (binary `110100`) → `{'type': 'B', 'bitWidth': 2, 'count': 3, 'words': [52]}`.
## Constraints
- `block_type` is either `'R'` or `'B'`.
- All integers are in the signed 32-bit range `[-2147483648, 2147483647]`.
- For an RLE block, `count >= 1`.
- For a bit-packed block, `0 <= len(data) <= 10^4`.
Constraints
- block_type is either 'R' or 'B'.
- All integers are in the signed 32-bit range [-2147483648, 2147483647].
- For an RLE block, count >= 1.
- For a bit-packed block, 0 <= len(data) <= 10^4.
Examples
Input: ('R', (5, 4))
Expected Output: {'type': 'R', 'value': 5, 'count': 4}
Explanation: A run-length block is stored directly.
Input: ('B', [0, 1, -1])
Expected Output: {'type': 'B', 'bitWidth': 2, 'count': 3, 'words': [52]}
Explanation: The minimum common width is 2 bits: 0 -> 00, 1 -> 01, -1 -> 11. Packed word is 0b110100 = 52.
Input: ('B', [-2147483648, 2147483647])
Expected Output: {'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483648, 2147483647]}
Explanation: The full 32-bit signed range requires width 32, so each value occupies one full word.
Input: ('B', [])
Expected Output: {'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []}
Explanation: Edge case: empty bit-packed input.
Input: ('B', [7, 7, 7, 7, 7, 7, 7, 7, 7])
Expected Output: {'type': 'B', 'bitWidth': 4, 'count': 9, 'words': [2004318071, 7]}
Explanation: Nine 4-bit values need 36 bits, so the payload spills into a second word.
Solution
def solution(block_type, data):
def min_bits_signed(x):
if x >= 0:
return 1 if x == 0 else x.bit_length() + 1
return (~x).bit_length() + 1
def pack(values, w):
words = []
current = 0
used = 0
mask = (1 << w) - 1
for v in values:
u = v & mask
remaining = w
while remaining > 0:
space = 32 - used
take = remaining if remaining < space else space
part = u & ((1 << take) - 1)
current |= part << used
used += take
u >>= take
remaining -= take
if used == 32:
words.append(current & 0xFFFFFFFF)
current = 0
used = 0
if used > 0:
words.append(current & 0xFFFFFFFF)
return words
if block_type == 'R':
value, count = data
return {'type': 'R', 'value': value, 'count': count}
values = list(data)
if not values:
return {'type': 'B', 'bitWidth': 0, 'count': 0, 'words': []}
bit_width = max(min_bits_signed(v) for v in values)
return {'type': 'B', 'bitWidth': bit_width, 'count': len(values), 'words': pack(values, bit_width)}Explanation
The function builds one canonical compression block.
**RLE path (`block_type == 'R'`).** `data` is a `(value, count)` tuple; the code just wraps it into `{'type': 'R', 'value': value, 'count': count}`. Nothing to compute.
**Bit-packed path (`block_type == 'B'`).** Empty input short-circuits to the prescribed `bitWidth: 0` block. Otherwise it works in two phases:
1. **Choose the bit width.** `min_bits_signed(x)` returns the minimum *signed* two's-complement width:
- `x == 0` → `1`
- `x > 0` → `x.bit_length() + 1` (data bits plus a sign bit so the top bit stays `0`)
- `x < 0` → `(~x).bit_length() + 1` (since `~x == -x-1`, this is the width needed for the magnitude plus the sign bit)
`bit_width = max(min_bits_signed(v) for v in values)` makes one width fit every value, so e.g. `[-2147483648, 2147483647]` needs `32`.
2. **Pack tightly into 32-bit words.** `pack` keeps an accumulator `current` and a counter `used` (bits filled in the current word). Each value is first reduced to its low `w` bits via `u = v & mask` — this is exactly the two's-complement encoding for negatives. It then streams those `w` bits out: `take = min(remaining, 32 - used)` bits are OR'd in at offset `used` (`current |= part << used`), and when `used` hits 32 the word is flushed (`& 0xFFFFFFFF`) and the accumulator resets. A value that crosses a word boundary simply continues in the next word, matching the canonical "first value in the lowest bits of `words[0]`" rule. Any partial final word is flushed at the end.
This is correct because masking gives canonical signed encodings, and the little-endian bit cursor preserves order and boundary-crossing exactly as specified.
Time complexity: O(n). Space complexity: O(n).
Hints
- A signed w-bit two's-complement integer must lie in the range [-2^(w-1), 2^(w-1)-1].
- While packing, track the current 32-bit word and the bit offset already used inside it.
Part 2: Encode an integer stream into RLE and bit-packed blocks
Implement the **encoder** for a simple integer codec that compresses a sequence of signed integers into a list of *blocks*.
Implement the function:
```python
def solution(values, rle_min_run=3, max_bp_block=128):
```
- `values`: a list of signed 32-bit integers (may be empty).
- `rle_min_run`: minimum run length that qualifies for run-length encoding.
- `max_bp_block`: maximum number of values per bit-packed block.
Return the full list of encoded blocks (in order). For an empty input, return an empty list.
## Block formats
There are two kinds of blocks, each a dictionary:
- **RLE block** — `{'type': 'R', 'value': value, 'count': count}`
- **Bit-packed block** — `{'type': 'B', 'bitWidth': w, 'count': n, 'words': words}`
## Encoding rules
1. **Scan in order.** Process `values` left to right in a single pass.
2. **Detect maximal runs.** A *run* is a maximal stretch of consecutive **equal** values. For a run of length `L`:
- If `L >= rle_min_run`, emit exactly **one** RLE block `{'type': 'R', 'value': <the repeated value>, 'count': L}` for that run.
- Otherwise (`L < rle_min_run`), the values of that run are **non-RLE** and must be carried into bit-packed blocks.
3. **Order is preserved.** Non-RLE values are emitted in their original order. Whenever an RLE block is produced, all pending non-RLE values accumulated so far must be flushed (as bit-packed blocks) **before** that RLE block is appended.
4. **Group non-RLE values into bit-packed blocks.** Treat the running stream of non-RLE values as one continuous sequence (it may span across multiple short runs and across different values). Cut it into chunks of **at most `max_bp_block`** values each, in order; each chunk becomes one bit-packed block. A trailing chunk smaller than `max_bp_block` is allowed at a flush boundary (i.e. before an RLE block or at the end of input).
5. **Choose the bit width.** Each bit-packed block uses the **minimum signed two's-complement bit width** needed to represent **every** value in that block. Concretely, the block's `bitWidth` `w` is the maximum over its values of the minimum signed width of each value, where:
- `0` needs **1** bit,
- a positive value `x` needs `x.bit_length() + 1` bits (an extra bit for the sign),
- a negative value `x` needs `(~x).bit_length() + 1` bits.
(So `-1` → 1, `7` → 4, `2147483647` → 32, and `-2147483648` → 32.)
6. **Pack values tightly.** Within a bit-packed block, each value is reduced to its low `w` bits via its two's-complement representation (`value & ((1 << w) - 1)`) and packed **tightly, LSB-first**, into a list of **unsigned 32-bit** words (`words`):
- The first value occupies the lowest `w` bits of `words[0]`.
- The next value follows **immediately** in the next `w` bits, and so on.
- When a value straddles a 32-bit word boundary, its low bits go into the current word and the remaining high bits continue at the start of the next word.
- After all values are written, a final partial word (if any) is appended. Every entry in `words` is an unsigned 32-bit integer (`0 .. 2^32 - 1`).
`count` is the number of values packed into the block (`n`).
Return the complete ordered list of `R` and `B` blocks.
## Examples
- `solution([], 3, 128)` → `[]`
- `solution([1, 2, 3], 3, 128)` → `[{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]`
- `solution([7, 7, 7, 7], 3, 128)` → `[{'type': 'R', 'value': 7, 'count': 4}]`
- `solution([5, 5, 5, 1, 2, 2, 3, 3, 3, 4], 3, 128)` →
`[{'type': 'R', 'value': 5, 'count': 3}, {'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [145]}, {'type': 'R', 'value': 3, 'count': 3}, {'type': 'B', 'bitWidth': 4, 'count': 1, 'words': [4]}]`
- `solution([0, 0, 1, 1, 2, 2], 3, 4)` →
`[{'type': 'B', 'bitWidth': 2, 'count': 4, 'words': [80]}, {'type': 'B', 'bitWidth': 3, 'count': 2, 'words': [18]}]`
(No run reaches length 3, so all six values are non-RLE. They form one continuous stream that is cut into chunks of at most `max_bp_block = 4`: the first 4 values `[0, 0, 1, 1]` and then `[2, 2]`.)
## Constraints
- `0 <= len(values) <= 10^5`
- `1 <= rle_min_run <= 10`
- `1 <= max_bp_block <= 128`
- All input values are in the signed 32-bit range `[-2147483648, 2147483647]`.
Constraints
- 0 <= len(values) <= 10^5
- 1 <= rle_min_run <= 10
- 1 <= max_bp_block <= 128
- All input values are in the signed 32-bit range [-2147483648, 2147483647].
Examples
Input: ([], 3, 128)
Expected Output: []
Explanation: Edge case: empty input produces no blocks.
Input: ([1, 2, 3], 3, 128)
Expected Output: [{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]
Explanation: No run is long enough for RLE, so the whole sequence becomes one BP block.
Input: ([7, 7, 7, 7], 3, 128)
Expected Output: [{'type': 'R', 'value': 7, 'count': 4}]
Explanation: A maximal run of length 4 qualifies for RLE.
Input: ([5, 5, 5, 1, 2, 2, 3, 3, 3, 4], 3, 128)
Expected Output: [{'type': 'R', 'value': 5, 'count': 3}, {'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [145]}, {'type': 'R', 'value': 3, 'count': 3}, {'type': 'B', 'bitWidth': 4, 'count': 1, 'words': [4]}]
Explanation: The 5s and 3s become RLE blocks; the short middle segment and final 4 become BP blocks.
Input: ([0, 0, 1, 1, 2, 2], 3, 4)
Expected Output: [{'type': 'B', 'bitWidth': 2, 'count': 4, 'words': [80]}, {'type': 'B', 'bitWidth': 3, 'count': 2, 'words': [18]}]
Explanation: No run reaches length 3, so everything is BP. max_bp_block = 4 forces the data to split into two BP blocks.
Solution
def solution(values, rle_min_run=3, max_bp_block=128):
from collections import deque
def min_bits_signed(x):
if x >= 0:
return 1 if x == 0 else x.bit_length() + 1
return (~x).bit_length() + 1
def pack(vals, w):
words = []
current = 0
used = 0
mask = (1 << w) - 1
for v in vals:
u = v & mask
remaining = w
while remaining > 0:
space = 32 - used
take = remaining if remaining < space else space
part = u & ((1 << take) - 1)
current |= part << used
used += take
u >>= take
remaining -= take
if used == 32:
words.append(current & 0xFFFFFFFF)
current = 0
used = 0
if used > 0:
words.append(current & 0xFFFFFFFF)
return words
def make_bp(chunk):
bit_width = max(min_bits_signed(v) for v in chunk)
return {'type': 'B', 'bitWidth': bit_width, 'count': len(chunk), 'words': pack(chunk, bit_width)}
blocks = []
buffer = deque()
def flush_full_chunks():
while len(buffer) >= max_bp_block:
chunk = [buffer.popleft() for _ in range(max_bp_block)]
blocks.append(make_bp(chunk))
def flush_all():
while buffer:
size = min(len(buffer), max_bp_block)
chunk = [buffer.popleft() for _ in range(size)]
blocks.append(make_bp(chunk))
i = 0
n = len(values)
while i < n:
j = i + 1
while j < n and values[j] == values[i]:
j += 1
run_len = j - i
if run_len >= rle_min_run:
flush_all()
blocks.append({'type': 'R', 'value': values[i], 'count': run_len})
else:
for k in range(i, j):
buffer.append(values[k])
flush_full_chunks()
i = j
flush_all()
return blocksExplanation
The encoder makes a **single left-to-right pass**, greedily separating long equal-value runs (RLE) from everything else (bit-packed), while preserving original order.
**Run detection.** The outer `while` finds each maximal run: from index `i`, it advances `j` while `values[j] == values[i]`, giving `run_len = j - i`. Each element is visited once because `i` jumps straight to `j`.
**Dispatch per run.**
- If `run_len >= rle_min_run`, the run is RLE-worthy. We first `flush_all()` the pending bit-pack `buffer` (order must be preserved), then append `{'type':'R', 'value', 'count':run_len}`.
- Otherwise the (short) run's values are pushed into a `deque` `buffer`, and `flush_full_chunks()` emits any complete `max_bp_block`-sized chunk as a 'B' block.
**Building a 'B' block** (`make_bp`): `bitWidth` is the **max** of `min_bits_signed(v)` over the chunk — the minimum signed two's-complement width. `min_bits_signed` returns 1 for `0`, `bit_length()+1` for positives (room for the sign bit), and `(~x).bit_length()+1` for negatives — so `-1`→1, `7`→4, `±2^31`→32.
**Tight packing** (`pack`): each value's low `w` bits (`v & mask`, two's-complement) are written LSB-first into a rolling 32-bit `current`. When a value straddles a word boundary, the inner loop splits it: it writes `take` bits, flushes `current` to `words` at 32 bits, and continues with the remainder. A final partial word is flushed at the end.
After the loop, a closing `flush_all()` drains the remaining buffer. The greedy choice is correct because RLE eligibility depends only on a run's own length, and non-RLE values are emitted in order in fixed-size chunks.
Time complexity: O(n). Space complexity: O(max_bp_block + output_size).
Hints
- Think in terms of maximal runs first. A long enough run becomes one R block; a short run is buffered into BP data.
- Flush the buffered non-RLE values before emitting an R block, and also whenever the BP buffer reaches max_bp_block.
Part 3: Decode RLE and bit-packed blocks
Decompress a sequence of encoded blocks back into the original list of integers.
Implement:
```python
def solution(blocks):
...
```
## Input
`blocks` is a list of block objects, processed **in order**. There are two block types, each given as a dictionary.
**RLE (run-length) block** — `type == 'R'`:
| Key | Meaning |
|-----|---------|
| `'type'` | the string `'R'` |
| `'value'` | the integer that repeats |
| `'count'` | how many times `value` repeats (`count >= 0`) |
It decodes to `count` copies of `value`.
**Bit-packed block** — `type == 'B'`:
| Key | Meaning |
|-----|---------|
| `'type'` | the string `'B'` |
| `'bitWidth'` | `w`, the number of bits per value |
| `'count'` | `n`, how many values this block encodes (`n >= 0`) |
| `'words'` | a list of **unsigned 32-bit** integers holding the packed bits |
## Output
Return a single flat list of integers: the decoded values of every block, **concatenated in the order the blocks appear**. An empty `blocks` list returns an empty list.
## Decoding a bit-packed block
The `n` values are packed **tightly** (no padding) into the `words` stream, **least-significant bit first**:
- The **first** value occupies the lowest `w` bits of `words[0]`.
- Each subsequent value starts in the bit position **immediately after** the previous one.
- Within a word, lower bit positions hold earlier values. When a value would extend past bit 31 of the current word, it **continues into the next word**: the low bits come from the top of the current word and the remaining high bits come from the bottom of the next word.
Each extracted `w`-bit pattern is a **signed two's-complement** integer and must be **sign-extended** to an ordinary Python `int`:
- If the pattern's top bit (bit `w - 1`) is set, the value is negative — subtract `2^w` from the raw pattern.
- Otherwise the raw pattern is the value as-is.
A bit-packed block with `count == 0` contributes no values (the `'words'` list is present per the format above but no bits are read from it).
## Examples
- `[]` → `[]`
- `[{'type': 'R', 'value': 5, 'count': 4}]` → `[5, 5, 5, 5]`
- `[{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]` → `[1, 2, 3]`
(`209` is `0b11010001`; reading 3-bit fields from the lowest bits gives `001`, `010`, `011`.)
- `[{'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483647, 2147483648]}, {'type': 'R', 'value': -1, 'count': 2}]` → `[2147483647, -2147483648, -1, -1]`
(the unsigned word `2147483648` sign-extends to `-2147483648`.)
- `[{'type': 'B', 'bitWidth': 5, 'count': 7, 'words': [4294967295, 7]}]` → `[-1, -1, -1, -1, -1, -1, -1]`
(each 5-bit field is all ones, which sign-extends to `-1`; the last field straddles the word boundary.)
## Constraints
- `0 <= len(blocks) <= 10^5`
- Each block is valid and follows the canonical format described above.
- For bit-packed blocks, `1 <= bitWidth <= 32` when `count > 0`.
- All decoded values fit in the signed 32-bit range.
Constraints
- 0 <= len(blocks) <= 10^5
- Each block is valid and follows the canonical format.
- For bit-packed blocks, 1 <= bitWidth <= 32 when count > 0.
- All decoded values fit in the signed 32-bit range.
Examples
Input: []
Expected Output: []
Explanation: Edge case: no blocks means no output values.
Input: [{'type': 'R', 'value': 5, 'count': 4}]
Expected Output: [5, 5, 5, 5]
Explanation: An RLE block expands into repeated values.
Input: [{'type': 'B', 'bitWidth': 3, 'count': 3, 'words': [209]}]
Expected Output: [1, 2, 3]
Explanation: The 3-bit values are packed as 001, 010, 011.
Input: [{'type': 'B', 'bitWidth': 32, 'count': 2, 'words': [2147483647, 2147483648]}, {'type': 'R', 'value': -1, 'count': 2}]
Expected Output: [2147483647, -2147483648, -1, -1]
Explanation: This checks both 32-bit signed decoding and iteration across different block types.
Input: [{'type': 'B', 'bitWidth': 5, 'count': 7, 'words': [4294967295, 7]}]
Expected Output: [-1, -1, -1, -1, -1, -1, -1]
Explanation: Seven 5-bit values cross the 32-bit boundary; every 5-bit pattern is 11111, which is -1.
Solution
def solution(blocks):
result = []
for block in blocks:
if block['type'] == 'R':
result.extend([block['value']] * block['count'])
continue
w = block['bitWidth']
count = block['count']
words = block['words']
if count == 0:
continue
mask = (1 << w) - 1
bit_pos = 0
for _ in range(count):
word_index = bit_pos // 32
offset = bit_pos % 32
if offset + w <= 32:
u = (words[word_index] >> offset) & mask
else:
low_bits = 32 - offset
high_bits = w - low_bits
low = (words[word_index] >> offset) & ((1 << low_bits) - 1)
high = words[word_index + 1] & ((1 << high_bits) - 1)
u = low | (high << low_bits)
sign_bit = 1 << (w - 1)
if u & sign_bit:
value = u - (1 << w)
else:
value = u
result.append(value)
bit_pos += w
return resultExplanation
The solution walks the blocks in order and appends decoded values to a single `result` list.
**RLE blocks** (`type == 'R'`) are trivial: append `count` copies of `value` with `result.extend([value] * count)`.
**Bit-packed blocks** (`type == 'B'`) are the interesting case. Values are stored back-to-back as `w`-bit fields inside a stream of unsigned 32-bit `words`, with no padding. A running cursor `bit_pos` tracks how many bits have been consumed; for each of the `count` values we compute:
- `word_index = bit_pos // 32` — which word the field starts in,
- `offset = bit_pos % 32` — the bit offset inside that word.
To extract the raw `w`-bit pattern `u`:
- If the field stays inside one word (`offset + w <= 32`), shift right by `offset` and mask with `(1 << w) - 1`.
- If it straddles a word boundary, take the top `32 - offset` bits as the **low** part, take the remaining `w - low_bits` bits from the *start* of the next word as the **high** part, and stitch them as `low | (high << low_bits)`. This works because the stream is little-endian within each word (first value uses the lowest bits).
**Sign extension:** each `u` is a two's-complement value. If its top bit (`1 << (w-1)`) is set, the true value is `u - (1 << w)`; otherwise it's `u` unchanged.
Correctness hinges on advancing `bit_pos += w` after every value, so word boundaries are handled uniformly. Note when `w == 32`, `bit_pos` is always a multiple of 32, so `offset == 0` and the single-word path with a full 32-bit mask is taken — matching the test where `2147483648` decodes to `-2147483648`.
Time complexity: O(N). Space complexity: O(N).
Hints
- The i-th packed value starts at bit position i * bitWidth inside that block's bitstream.
- After extracting the unsigned w-bit pattern, check its highest bit to decide whether it represents a negative number.
Part 4: Validate codec unit-test coverage
Audit a proposed unit-test suite for an integer codec and report which coverage scenarios the suite already exercises.
You are validating the test coverage of an integer codec. Each test case is a **sequence** of signed 32-bit integers, and the whole suite is a list of such sequences.
## What to implement
Implement:
```
solution(test_sequences, rle_min_run=3)
```
- `test_sequences` is a list of sequences; each sequence is a list of signed 32-bit integers (a sequence may be empty).
- `rle_min_run` is the run-length threshold used by the codec (defaults to `3`).
Return a dictionary with the following **boolean** keys.
## Encoding model (run analysis)
The codec encodes a sequence by splitting it into its **maximal equal-value runs** — maximal stretches of consecutive elements that are all equal.
- A run whose length is **`>= rle_min_run`** is encoded as an **RLE block**.
- A run whose length is **`< rle_min_run`** is encoded as a **bit-packed block**.
A sequence "uses only bit-packed blocks" when it is **non-empty** and **none** of its maximal equal-value runs reaches `rle_min_run`.
## Output keys
Each key is `true` if **at least one** sequence in the suite satisfies the condition (an OR across the whole suite):
- **`simple_bp`** — there is a **non-empty** sequence whose encoding uses **only bit-packed blocks** (no run reaches `rle_min_run`). Empty sequences never count.
- **`long_run_rle`** — there is a sequence containing a maximal run of length **`>= rle_min_run`** (i.e. its encoding has at least one RLE block).
- **`alternating_switch`** — there is a **single** sequence whose encoding contains **both** an RLE block **and** a bit-packed block.
- **`has_int_min`** — the value `-2147483648` appears anywhere in the suite.
- **`has_int_max`** — the value `2147483647` appears anywhere in the suite.
- **`has_negative`** — any negative value appears anywhere in the suite.
- **`empty_input`** — at least one sequence is empty.
- **`complete`** — `true` only if **all seven** of the keys above are `true`.
Each flag is independent and is set as soon as one qualifying sequence is found.
## Example
For `test_sequences = [[], [1, 2, 3], [7, 7, 7, 7], [5, 5, 5, 1, 2, 2, 3, 3, 3], [2147483647, -2147483648, -1]]` with `rle_min_run = 3`:
- `[1, 2, 3]` → three runs of length 1 each (all bit-packed) → satisfies **`simple_bp`**.
- `[7, 7, 7, 7]` → one run of length 4 (`>= 3`, an RLE block) → satisfies **`long_run_rle`**.
- `[5, 5, 5, 1, 2, 2, 3, 3, 3]` → a length-3 run (RLE) plus shorter runs (bit-packed) → satisfies **`alternating_switch`**.
- The last sequence supplies `has_int_max`, `has_int_min`, and `has_negative`; the empty sequence supplies `empty_input`.
All seven conditions hold, so the result is:
```
{'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True,
'has_int_min': True, 'has_int_max': True, 'has_negative': True,
'empty_input': True, 'complete': True}
```
## Constraints
- `0 <= number of sequences <= 10^4`
- `0 <= total number of integers across all sequences <= 2 * 10^5`
- `1 <= rle_min_run <= 10`
- All integers are in the signed 32-bit range `[-2147483648, 2147483647]`.
Constraints
- 0 <= number of sequences <= 10^4
- 0 <= total number of integers across all sequences <= 2 * 10^5
- 1 <= rle_min_run <= 10
- All integers are in the signed 32-bit range [-2147483648, 2147483647].
Examples
Input: ([[], [1, 2, 3], [7, 7, 7, 7], [5, 5, 5, 1, 2, 2, 3, 3, 3], [2147483647, -2147483648, -1]], 3)
Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True, 'has_int_min': True, 'has_int_max': True, 'has_negative': True, 'empty_input': True, 'complete': True}
Explanation: This suite covers every required category.
Input: ([[1, 2], [3, 3, 3]], 3)
Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': False, 'has_int_min': False, 'has_int_max': False, 'has_negative': False, 'empty_input': False, 'complete': False}
Explanation: The suite has BP-only and RLE-only cases, but no empty case, no boundary values, and no single sequence that switches strategies.
Input: ([[]], 3)
Expected Output: {'simple_bp': False, 'long_run_rle': False, 'alternating_switch': False, 'has_int_min': False, 'has_int_max': False, 'has_negative': False, 'empty_input': True, 'complete': False}
Explanation: Edge case: only the empty-input requirement is satisfied.
Input: ([[-1, -2], [4, 4, 4, 5, 6, 6, 6], [2147483647]], 3)
Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': True, 'has_int_min': False, 'has_int_max': True, 'has_negative': True, 'empty_input': False, 'complete': False}
Explanation: This suite is close, but it still misses INT_MIN and an empty test.
Input: ([[], [1, 2], [3, 3, 3], [2147483647, -2147483648, -1]], 3)
Expected Output: {'simple_bp': True, 'long_run_rle': True, 'alternating_switch': False, 'has_int_min': True, 'has_int_max': True, 'has_negative': True, 'empty_input': True, 'complete': False}
Explanation: Even with all other categories present, alternating_switch is still missing because no single sequence uses both block types.
Solution
def solution(test_sequences, rle_min_run=3):
INT_MIN = -2147483648
INT_MAX = 2147483647
def analyze_sequence(seq):
has_r = False
has_b = False
i = 0
n = len(seq)
while i < n:
j = i + 1
while j < n and seq[j] == seq[i]:
j += 1
if j - i >= rle_min_run:
has_r = True
else:
has_b = True
i = j
return has_r, has_b
summary = {
'simple_bp': False,
'long_run_rle': False,
'alternating_switch': False,
'has_int_min': False,
'has_int_max': False,
'has_negative': False,
'empty_input': False,
'complete': False,
}
for seq in test_sequences:
if not seq:
summary['empty_input'] = True
if any(v == INT_MIN for v in seq):
summary['has_int_min'] = True
if any(v == INT_MAX for v in seq):
summary['has_int_max'] = True
if any(v < 0 for v in seq):
summary['has_negative'] = True
has_r, has_b = analyze_sequence(seq)
if seq and not has_r:
summary['simple_bp'] = True
if has_r:
summary['long_run_rle'] = True
if has_r and has_b:
summary['alternating_switch'] = True
summary['complete'] = all([
summary['simple_bp'],
summary['long_run_rle'],
summary['alternating_switch'],
summary['has_int_min'],
summary['has_int_max'],
summary['has_negative'],
summary['empty_input'],
])
return summaryExplanation
The function audits a test suite (a list of integer sequences) and returns seven boolean coverage flags plus a `complete` flag that is true only when all seven hold.
**Core helper — `analyze_sequence`.** It walks each sequence with a two-pointer scan over *maximal equal-value runs*: pointer `i` marks a run start, and inner pointer `j` advances while `seq[j] == seq[i]`. The run length is `j - i`. If that length reaches `rle_min_run`, the run would be encoded as an RLE block, so `has_r = True`; otherwise it's a bit-packed block, so `has_b = True`. It returns `(has_r, has_b)` for the whole sequence.
**Per-sequence flags.** For every sequence it independently sets:
- `empty_input` if the sequence is empty;
- `has_int_min` / `has_int_max` / `has_negative` via three `any(...)` scans against `-2147483648`, `2147483647`, and `< 0`.
Then it consults the run analysis:
- `simple_bp` — sequence is non-empty **and** has no RLE run (`seq and not has_r`), i.e. encodes with bit-packed blocks only;
- `long_run_rle` — at least one RLE run exists (`has_r`);
- `alternating_switch` — the sequence contains **both** an RLE and a bit-packed block (`has_r and has_b`), matching the "encoding has both block types" requirement.
**Correctness.** Each flag is a logical OR accumulated across all sequences, so it becomes true as soon as one sequence satisfies it — exactly the "at least one" semantics the spec asks for. The `simple_bp` guard explicitly excludes empty sequences (which have neither block type). Finally `complete = all([...])` ANDs the seven flags. This directly mirrors the canonical-encoding rules without actually building encodings — it only needs run lengths, which fully determine block types.
Time complexity: O(N), where N is the total number of integers across all sequences. Each integer is touched a constant number of times: once in the run-scan and once per `any(...)` pass (which short-circuit, so at most three additional linear passes per sequence).. Space complexity: O(1) extra space (excluding the fixed-size 8-key result dictionary). The run scan uses only index pointers and the `any(...)` generators allocate nothing beyond constant state..
Hints
- A sequence has an RLE block iff at least one maximal run is long enough.
- alternating_switch is stronger than having some BP-only tests and some RLE-only tests; one single sequence must produce both block types.