Implement streaming RLE and bit-packed codec

This question evaluates understanding of data compression algorithms, bit-level manipulation, two's‑complement integer representation, streaming algorithm design, and encoder/decoder state management for 32-bit signed integers.

Question

You are implementing a simple compression scheme for sequences of 32‑bit signed integers. The codec should support two encoding strategies:

- Run‑Length Encoding (RLE) for long runs of equal values.
- Bit‑Packed Encoding (BP) for blocks of values that can be represented with a small, uniform bit‑width.

The codec is streaming: values arrive one by one, and the encoder should buffer them into blocks and choose an encoding strategy per block.

Compressed representation

Design a compressed representation made of blocks. For concreteness, define:

RLE block
- Represents a run of a single repeated integer value.
- Fields:
  - type = 'R'
  - value (int32)
  - count (int, number of repetitions)
- Only use RLE blocks when count >= RLE_MIN_RUN (e.g., RLE_MIN_RUN = 3). Shorter runs should not be encoded as RLE.

Bit‑packed block
- Represents a sequence of count possibly different integers, all of which fit into bitWidth bits in two's‑complement representation.
- Fields:
  - type = 'B'
  - bitWidth (1–32)
  - count (number of values)
  - A packed payload containing exactly count values, each stored using bitWidth bits.
- You may choose a maximum block size MAX_BP_BLOCK (e.g., 128 values) for simplicity.
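
One possible shape for the two block kinds is sketched below. The class names (Block, RleBlock, BitPackedBlock) and the choice of an int[] payload are assumptions made for illustration; the problem statement leaves the concrete layout to you.

```java
// Hypothetical block hierarchy; field names follow the lists above.
abstract class Block {
    final char type;   // 'R' for run-length, 'B' for bit-packed
    final int count;   // number of decoded values this block represents

    Block(char type, int count) {
        if (count < 0) throw new IllegalArgumentException("count must be non-negative");
        this.type = type;
        this.count = count;
    }
}

final class RleBlock extends Block {
    final int value;   // the repeated int32 value

    RleBlock(int value, int count) {
        super('R', count);
        this.value = value;
    }
}

final class BitPackedBlock extends Block {
    final int bitWidth;   // 1..32 bits per value
    final int[] payload;  // assumed layout: values packed back to back in 32-bit words

    BitPackedBlock(int bitWidth, int count, int[] payload) {
        super('B', count);
        if (bitWidth < 1 || bitWidth > 32) {
            throw new IllegalArgumentException("bitWidth must be in 1..32");
        }
        // The payload must hold exactly count * bitWidth bits, rounded up to whole words.
        long words = ((long) count * bitWidth + 31) / 32;
        if (payload.length != words) {
            throw new IllegalArgumentException("payload length does not match count and bitWidth");
        }
        this.bitWidth = bitWidth;
        this.payload = payload;
    }
}
```

Validating bitWidth and the payload length in the constructor means a malformed block fails at construction time rather than in the middle of decoding.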

You can decide the in‑memory representation of a bit‑packed payload (e.g., array of 32‑bit integers where bits are tightly packed), as long as the decoder can reconstruct the original sequence exactly.
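
As a minimal sketch of one such payload layout, the helper below packs values least-significant-bit first into an int[] and sign-extends on the way out. BitPacker and its method names are hypothetical, and the LSB-first ordering is an assumption; any ordering works as long as the encoder and decoder agree.

```java
// Hypothetical helper: fixed-width two's-complement packing into 32-bit words.
final class BitPacker {
    // Mask with the low bitWidth bits set; valid for bitWidth in 1..32.
    static long mask(int bitWidth) {
        return (1L << bitWidth) - 1;
    }

    // Stores the low bitWidth bits of each value back to back, LSB first.
    static int[] pack(int[] values, int bitWidth) {
        long totalBits = (long) values.length * bitWidth;
        int[] words = new int[(int) ((totalBits + 31) / 32)];
        long bitPos = 0;
        for (int v : values) {
            long bits = v & mask(bitWidth);          // truncate to bitWidth bits
            int word = (int) (bitPos >>> 5);         // index of the word holding the first bit
            int offset = (int) (bitPos & 31);        // bit offset inside that word
            words[word] |= (int) (bits << offset);
            if (offset + bitWidth > 32) {            // value straddles a word boundary
                words[word + 1] |= (int) (bits >>> (32 - offset));
            }
            bitPos += bitWidth;
        }
        return words;
    }

    // Reads the index-th value and sign-extends it back to a full int.
    static int unpack(int[] words, int index, int bitWidth) {
        long bitPos = (long) index * bitWidth;
        int word = (int) (bitPos >>> 5);
        int offset = (int) (bitPos & 31);
        long raw = (words[word] & 0xFFFFFFFFL) >>> offset;
        if (offset + bitWidth > 32) {
            raw |= (words[word + 1] & 0xFFFFFFFFL) << (32 - offset);
        }
        raw &= mask(bitWidth);
        int shift = 64 - bitWidth;
        return (int) ((raw << shift) >> shift);      // arithmetic shift restores the sign
    }
}
```

Doing the arithmetic in long avoids the two classic int pitfalls here: 1 << 32 wrapping around to 1, and shifts by 32 being treated as shifts by 0.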

Encoder

Implement an Encoder class with the following behavior:

It receives input values one at a time via a method like:
```
void add(int value)
```
Internally, the encoder may maintain a buffer of recent values to decide whether to form an RLE block or a bit‑packed block.
At appropriate times (e.g., when a block is full, when the encoding strategy should change, or when flush() is called), it should emit blocks.
Provide a method:
```
List<Block> flush()
```
that finalizes the stream, closes any open block, and returns the list of compressed blocks.

Encoding strategy constraints:

For a maximal run of the same value with length L:
- If L >= RLE_MIN_RUN, you should encode it as a single RLE block.
- Otherwise, those values should be part of a bit‑packed block.
For bit‑packed blocks:
- You may group consecutive non‑RLE values into blocks up to size MAX_BP_BLOCK.
- Choose bitWidth for a block as the minimum number of bits needed to represent all its values.
You do not need to prove global optimality of the compression, but your encoder must consistently follow the above rules.
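
The minimum bit width rule can be sketched as follows. It assumes signed two's-complement storage, so a width of w bits covers the range -2^(w-1) to 2^(w-1) - 1; BitWidths and minBitWidth are hypothetical names.

```java
// Hypothetical helper for choosing the bitWidth of a bit-packed block.
final class BitWidths {
    // Smallest w in 1..32 such that v fits in w-bit two's complement.
    static int minBitWidth(int v) {
        // Fold negatives onto non-negatives: for v < 0 this yields ~v, which has
        // the same magnitude bits; for v >= 0 it leaves v unchanged.
        int folded = v ^ (v >> 31);
        // Magnitude bits (32 - leading zeros) plus one sign bit.
        return 33 - Integer.numberOfLeadingZeros(folded);
    }

    // Width of a block is the largest width any of its values needs (at least 1).
    static int minBitWidth(int[] values) {
        int width = 1;
        for (int v : values) {
            width = Math.max(width, minBitWidth(v));
        }
        return width;
    }
}
```

Under this convention 0 and -1 need 1 bit, 127 and -128 need 8 bits, and Integer.MIN_VALUE and Integer.MAX_VALUE both need the full 32.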

The encoder must correctly handle:

Negative values and the full range of 32‑bit signed integers (Integer.MIN_VALUE to Integer.MAX_VALUE).
Transitions between RLE and bit‑packed segments in the stream.
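
The transition logic can be sketched independently of the packing step. The class below tracks only the current run and a buffer of pending literals, and emits unpacked segments; RunSegmenter and Segment are hypothetical names, and a full encoder would turn each literal segment into a bit‑packed block. It also assumes no run exceeds Integer.MAX_VALUE values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the streaming state machine behind add()/flush().
final class RunSegmenter {
    static final int RLE_MIN_RUN = 3;
    static final int MAX_BP_BLOCK = 128;

    static final class Segment {
        final boolean rle;     // true: run of `value` repeated `count` times
        final int value;       // meaningful only when rle is true
        final int count;
        final int[] literals;  // raw values when rle is false, otherwise null
        private Segment(boolean rle, int value, int count, int[] literals) {
            this.rle = rle; this.value = value; this.count = count; this.literals = literals;
        }
    }

    private final List<Segment> out = new ArrayList<>();
    private final int[] literals = new int[MAX_BP_BLOCK];
    private int literalCount = 0;
    private int runValue;
    private int runLength = 0;   // 0 means no run is open

    void add(int value) {
        if (runLength > 0 && value == runValue) {
            runLength++;         // the run is still growing; decide later
            return;
        }
        closeRun();              // the previous run is now maximal
        runValue = value;
        runLength = 1;
    }

    List<Segment> flush() {
        closeRun();
        emitLiterals();
        List<Segment> result = new ArrayList<>(out);
        out.clear();
        return result;
    }

    // Classifies a finished run: long runs become RLE, short ones become literals.
    private void closeRun() {
        if (runLength == 0) return;
        if (runLength >= RLE_MIN_RUN) {
            emitLiterals();      // keep output in input order
            out.add(new Segment(true, runValue, runLength, null));
        } else {
            for (int i = 0; i < runLength; i++) {
                literals[literalCount++] = runValue;
                if (literalCount == MAX_BP_BLOCK) emitLiterals();
            }
        }
        runLength = 0;
    }

    private void emitLiterals() {
        if (literalCount == 0) return;
        int[] copy = Arrays.copyOf(literals, literalCount);
        out.add(new Segment(false, 0, copy.length, copy));
        literalCount = 0;
    }
}
```

Deferring the decision until a run ends is what guarantees that a maximal run of length L >= RLE_MIN_RUN lands in a single RLE block, at the cost of holding back at most one open run plus MAX_BP_BLOCK literals.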

Decoder

Implement a corresponding Decoder that reconstructs the original integer sequence from a list of blocks.

Its constructor receives the list of blocks produced by the encoder:

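```
class Decoder implements Iterator<Integer> {
    Decoder(List<Block> blocks) { ... }
    public boolean hasNext() { ... }
    public Integer next() { ... }   // Iterator<Integer> requires the boxed return type in Java
}
```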

hasNext() / next() should expose the original sequence of integers in order, exactly as they were passed to Encoder.add(...).
The decoder must correctly iterate across both RLE and bit‑packed blocks.
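
A minimal sketch of such an iterator is shown below. It keeps a cursor made of a block index and a position inside that block. BlockDecoder and its nested Block class are hypothetical stand-ins, and the payload is assumed to be packed LSB first into 32-bit words.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical decoder sketch; assumes an LSB-first int[] payload for 'B' blocks.
final class BlockDecoder implements Iterator<Integer> {
    // Minimal stand-in for a block: 'R' uses value/count, 'B' uses bitWidth/count/payload.
    static final class Block {
        final char type; final int value; final int bitWidth; final int count; final int[] payload;
        private Block(char type, int value, int bitWidth, int count, int[] payload) {
            this.type = type; this.value = value; this.bitWidth = bitWidth;
            this.count = count; this.payload = payload;
        }
        static Block rle(int value, int count) { return new Block('R', value, 0, count, null); }
        static Block bitPacked(int bitWidth, int count, int[] payload) {
            return new Block('B', 0, bitWidth, count, payload);
        }
    }

    private final List<Block> blocks;
    private int blockIndex = 0;   // block currently being read
    private int posInBlock = 0;   // next value index inside that block

    BlockDecoder(List<Block> blocks) { this.blocks = blocks; }

    @Override
    public boolean hasNext() {
        // Skip exhausted (or empty) blocks so the cursor always rests on a real value.
        while (blockIndex < blocks.size() && posInBlock >= blocks.get(blockIndex).count) {
            blockIndex++;
            posInBlock = 0;
        }
        return blockIndex < blocks.size();
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        Block b = blocks.get(blockIndex);
        int i = posInBlock++;
        return b.type == 'R' ? b.value : unpack(b.payload, i, b.bitWidth);
    }

    // Extracts the i-th bitWidth-bit field and sign-extends it to an int.
    private static int unpack(int[] words, int i, int bitWidth) {
        long bitPos = (long) i * bitWidth;
        int word = (int) (bitPos >>> 5);
        int offset = (int) (bitPos & 31);
        long raw = (words[word] & 0xFFFFFFFFL) >>> offset;
        if (offset + bitWidth > 32) {
            raw |= (words[word + 1] & 0xFFFFFFFFL) << (32 - offset);
        }
        int shift = 64 - bitWidth;
        return (int) ((raw << shift) >> shift);
    }
}
```

Advancing the cursor inside hasNext() keeps next() trivial and makes an empty block list, or empty blocks in the middle of the list, behave correctly without special cases.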

Tasks

Specify the exact in‑memory structure you will use for Block and the bit‑packed payload.
Implement Encoder.add(value) and Encoder.flush() to produce a valid block sequence.
Implement Decoder as an iterator over the decompressed values.
Write unit tests for cases including:
- Simple sequences without repeats (forces bit‑packing).
- Long runs of a single value (forces RLE).
- Alternating patterns (switching between RLE and BP).
- Values near Integer.MAX_VALUE, Integer.MIN_VALUE, and negative numbers.
- Empty input sequence.

Assume you are working in an object‑oriented language (e.g., Java, C++, or similar) and focus on clean class design and correctness of the codec pair.
