How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a easy difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Jane Street.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Jane Street during technical interviews.

Transform sparse time-code stream to dense rows

Company: Jane Street

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Technical Screen

## Problem You are given: - A fixed set of **M distinct codes** (strings). The **column order** of the output matrix is the **lexicographic order** of these codes. - A stream of input **batches**. Each batch is a `vector<tuple<int timestamp, string code, int value>>`. Conceptually, the ideal output is an **N × M** matrix where: - Each **row** corresponds to a timestamp (rows ordered by increasing timestamp). - Each **column** corresponds to a code (columns ordered lexicographically by code). - If a `(timestamp, code)` pair has no record, its cell value is `-1`. ### Input stream guarantee (base version) Across the entire stream (or equivalently within each batch, with batches emitted in order), records are produced in **non-decreasing timestamp order**, and for the **same timestamp**, in **lexicographic code order**. ### Task Implement a **stream transformer** that consumes the above input stream and produces an output stream of: - `pair<int timestamp, vector<int> row>` where: - `timestamp` is a timestamp present in the overall input (or in a given list of timestamps, if provided). - `row` has length **M**. - `row[i]` is the value for the `i`-th code (in lexicographic order) at that `timestamp`, or `-1` if missing. Your transformer should emit exactly one output row per timestamp (once all records for that timestamp have been consumed). ### Follow-ups 1. **Some batches have codes out of order** (i.e., within a fixed timestamp, the codes are not guaranteed to be lexicographically sorted). How would you adapt the transformer? 2. **A small fraction of records have timestamps out of order** (i.e., the global timestamp ordering is slightly violated). How would you adapt the transformer while keeping memory bounded and latency reasonable? ### Clarifications / assumptions you may choose (state them clearly) - Whether the list of all M codes is given up front, or must be discovered from the stream. - Whether multiple records for the same `(timestamp, code)` can appear (and if so, whether to take last-write-wins, sum, or treat as invalid). ## Output example (illustrative) If codes are `["A","B","C"]` and the stream contains: - (1,"A",10), (1,"C",30) - (2,"B",20) Then output rows: - (1, [10, -1, 30]) - (2, [-1, 20, -1])

Quick Answer: This question evaluates proficiency in streaming data transformation and time-series alignment, including handling sparse-to-dense mapping, ordering guarantees, and memory/latency trade-offs.

Part 1: Base transformer for ordered timestamps and ordered codes

You are given a fixed set of distinct code strings and a stream of input batches. Each record is a tuple (timestamp, code, value). Across the whole stream, timestamps are in non-decreasing order, and for the same timestamp the codes appear in lexicographic order. Build dense rows in lexicographic code-column order and return exactly one row for each timestamp that appears in the stream. Missing cells must be filled with -1. Assume the code list is given up front and each (timestamp, code) appears at most once.

Constraints

0 <= len(codes) <= 10^4 and all codes are distinct.
0 <= total number of records across all batches <= 2 * 10^5.
Every record code belongs to codes.
Timestamps are globally non-decreasing, and codes are lexicographically sorted within each timestamp.
Each (timestamp, code) appears at most once.

Examples

Input: (['C', 'A', 'B'], [[(1, 'A', 10), (1, 'C', 30)], [(2, 'B', 20)]])

Expected Output: [(1, [10, -1, 30]), (2, [-1, 20, -1])]

Explanation: The input code list is unsorted, but output columns must follow sorted order ['A', 'B', 'C'].

Input: (['A', 'B'], [[(1, 'A', 5)], [(1, 'B', 7), (2, 'A', 9)]])

Expected Output: [(1, [5, 7]), (2, [9, -1])]

Explanation: A timestamp can continue in the next batch; do not emit a row just because a batch ends.

Input: (['A', 'B'], [[], []])

Expected Output: []

Explanation: Edge case: no records at all, so no timestamps are emitted.

Input: (['Z'], [[(42, 'Z', 100)]])

Expected Output: [(42, [100])]

Explanation: Edge case: one code and one record.

Hints

You only need to keep one row in memory at a time: the current timestamp's row.
Because codes for the same timestamp are already sorted, you can walk through the sorted code list with a pointer and fill missing positions with -1.

Part 2: Dense rows when codes inside a timestamp are unsorted

You are given a fixed set of distinct code strings and a stream of input batches. Each record is a tuple (timestamp, code, value). Timestamps are still globally non-decreasing, so all records for the same timestamp remain contiguous in the stream, but the codes inside one timestamp may arrive in any order. Build dense rows in lexicographic code-column order and return one row per timestamp. Missing cells are -1. If the same (timestamp, code) appears multiple times, use last-write-wins.

Constraints

0 <= len(codes) <= 10^4 and all codes are distinct.
0 <= total number of records across all batches <= 2 * 10^5.
Every record code belongs to codes.
Timestamps are globally non-decreasing, so records for one timestamp stay contiguous.
Codes within a timestamp may appear in any order; repeated (timestamp, code) pairs use last-write-wins.

Examples

Input: (['C', 'A', 'B'], [[(1, 'C', 30), (1, 'A', 10), (1, 'B', 20)], [(2, 'B', 5), (2, 'B', 8)]])

Expected Output: [(1, [10, 20, 30]), (2, [-1, 8, -1])]

Explanation: Codes arrive out of order for timestamp 1, and timestamp 2 shows last-write-wins for duplicate code B.

Input: (['B', 'A'], [[(4, 'B', 2)], [(4, 'A', 1), (5, 'B', 3)]])

Expected Output: [(4, [1, 2]), (5, [-1, 3])]

Explanation: A single timestamp can span multiple batches even when code order is arbitrary.

Input: (['X', 'Y'], [[(7, 'Y', 11)]])

Expected Output: [(7, [-1, 11])]

Explanation: Edge case: a single timestamp with one missing code.

Input: (['A'], [])

Expected Output: []

Explanation: Edge case: no batches.

Hints

The timestamp order guarantee still lets you keep only one active row at a time.
Precompute a dictionary from code to column index so you can write directly into the correct cell even when codes arrive out of order.

Part 3: Bounded-disorder transformer for slightly out-of-order timestamps

You are given a fixed set of distinct code strings and a stream of input batches. Each record is a tuple (timestamp, code, value). Timestamps may arrive slightly out of order, but the disorder is bounded by a nonnegative integer max_lag: when a record arrives, its timestamp is always at least max_seen_so_far - max_lag. Codes inside the same timestamp may appear in any order, and repeated (timestamp, code) pairs use last-write-wins. For this problem, emission checkpoints are batch boundaries: after finishing a batch, emit every timestamp t that is guaranteed final, meaning t <= max_seen - max_lag. Also emit all remaining timestamps in order at end-of-stream. Return what is emitted after each batch, plus one extra final flush list.

Constraints

0 <= len(codes) <= 10^4 and all codes are distinct.
0 <= total number of records across all batches <= 2 * 10^5.
0 <= max_lag.
Every record code belongs to codes. Codes within a timestamp may be in any order, and repeated (timestamp, code) pairs use last-write-wins.
Bounded lateness guarantee: each arriving record satisfies timestamp >= max_seen_so_far - max_lag.

Examples

Input: (['A', 'B', 'C'], [[(3, 'A', 30)], [(1, 'B', 10), (4, 'C', 40)], [(2, 'A', 20), (5, 'B', 50)]], 2)

Expected Output: [[], [(1, [-1, 10, -1])], [(2, [20, -1, -1]), (3, [30, -1, -1])], [(4, [-1, -1, 40]), (5, [-1, 50, -1])]]

Explanation: After batch 2, timestamp 1 becomes safe. After batch 3, timestamps 2 and 3 become safe. The last list is the end-of-stream flush.

Input: (['A', 'B'], [[(1, 'A', 7)], [(2, 'B', 9)]], 0)

Expected Output: [[(1, [7, -1])], [(2, [-1, 9])], []]

Explanation: Edge case: with max_lag = 0, rows become safe immediately after their batch if the stream is ordered.

Input: (['A', 'B'], [[(2, 'A', 5)], [(1, 'B', 7), (2, 'B', 8)], [(2, 'A', 6), (3, 'A', 9)]], 1)

Expected Output: [[], [(1, [-1, 7])], [(2, [6, 8])], [(3, [9, -1])]]

Explanation: Timestamp 2 stays active long enough to receive late updates, and last-write-wins changes A from 5 to 6 before emission.

Input: (['A'], [], 3)

Expected Output: [[]]

Explanation: Edge case: no batches means only the final flush list exists, and it is empty.

Hints

Track the largest timestamp seen so far; then max_seen - max_lag acts like a watermark for what is safe to emit.
Use a dictionary for active rows and a min-heap of active timestamps so you can flush the smallest safe timestamps in order.

Transform sparse time-code stream to dense rows

Company: Jane Street

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Technical Screen

Part 1: Base transformer for ordered timestamps and ordered codes

Constraints

0 <= len(codes) <= 10^4 and all codes are distinct.
0 <= total number of records across all batches <= 2 * 10^5.
Every record code belongs to codes.
Timestamps are globally non-decreasing, and codes are lexicographically sorted within each timestamp.
Each (timestamp, code) appears at most once.

Examples

Input: (['C', 'A', 'B'], [[(1, 'A', 10), (1, 'C', 30)], [(2, 'B', 20)]])

Expected Output: [(1, [10, -1, 30]), (2, [-1, 20, -1])]

Explanation: The input code list is unsorted, but output columns must follow sorted order ['A', 'B', 'C'].

Input: (['A', 'B'], [[(1, 'A', 5)], [(1, 'B', 7), (2, 'A', 9)]])

Expected Output: [(1, [5, 7]), (2, [9, -1])]

Explanation: A timestamp can continue in the next batch; do not emit a row just because a batch ends.

Input: (['A', 'B'], [[], []])

Expected Output: []

Explanation: Edge case: no records at all, so no timestamps are emitted.

Input: (['Z'], [[(42, 'Z', 100)]])

Expected Output: [(42, [100])]

Explanation: Edge case: one code and one record.

Hints

You only need to keep one row in memory at a time: the current timestamp's row.
Because codes for the same timestamp are already sorted, you can walk through the sorted code list with a pointer and fill missing positions with -1.

Part 2: Dense rows when codes inside a timestamp are unsorted

Constraints

0 <= len(codes) <= 10^4 and all codes are distinct.
0 <= total number of records across all batches <= 2 * 10^5.
Every record code belongs to codes.
Timestamps are globally non-decreasing, so records for one timestamp stay contiguous.
Codes within a timestamp may appear in any order; repeated (timestamp, code) pairs use last-write-wins.

Examples

Input: (['C', 'A', 'B'], [[(1, 'C', 30), (1, 'A', 10), (1, 'B', 20)], [(2, 'B', 5), (2, 'B', 8)]])

Expected Output: [(1, [10, 20, 30]), (2, [-1, 8, -1])]

Explanation: Codes arrive out of order for timestamp 1, and timestamp 2 shows last-write-wins for duplicate code B.

Input: (['B', 'A'], [[(4, 'B', 2)], [(4, 'A', 1), (5, 'B', 3)]])

Expected Output: [(4, [1, 2]), (5, [-1, 3])]

Explanation: A single timestamp can span multiple batches even when code order is arbitrary.

Input: (['X', 'Y'], [[(7, 'Y', 11)]])

Expected Output: [(7, [-1, 11])]

Explanation: Edge case: a single timestamp with one missing code.

Input: (['A'], [])

Expected Output: []

Explanation: Edge case: no batches.

Hints

The timestamp order guarantee still lets you keep only one active row at a time.
Precompute a dictionary from code to column index so you can write directly into the correct cell even when codes arrive out of order.

Part 3: Bounded-disorder transformer for slightly out-of-order timestamps

Constraints

0 <= len(codes) <= 10^4 and all codes are distinct.
0 <= total number of records across all batches <= 2 * 10^5.
0 <= max_lag.
Every record code belongs to codes. Codes within a timestamp may be in any order, and repeated (timestamp, code) pairs use last-write-wins.
Bounded lateness guarantee: each arriving record satisfies timestamp >= max_seen_so_far - max_lag.

Examples

Input: (['A', 'B', 'C'], [[(3, 'A', 30)], [(1, 'B', 10), (4, 'C', 40)], [(2, 'A', 20), (5, 'B', 50)]], 2)

Expected Output: [[], [(1, [-1, 10, -1])], [(2, [20, -1, -1]), (3, [30, -1, -1])], [(4, [-1, -1, 40]), (5, [-1, 50, -1])]]

Explanation: After batch 2, timestamp 1 becomes safe. After batch 3, timestamps 2 and 3 become safe. The last list is the end-of-stream flush.

Input: (['A', 'B'], [[(1, 'A', 7)], [(2, 'B', 9)]], 0)

Expected Output: [[(1, [7, -1])], [(2, [-1, 9])], []]

Explanation: Edge case: with max_lag = 0, rows become safe immediately after their batch if the stream is ordered.

Input: (['A', 'B'], [[(2, 'A', 5)], [(1, 'B', 7), (2, 'B', 8)], [(2, 'A', 6), (3, 'A', 9)]], 1)

Expected Output: [[], [(1, [-1, 7])], [(2, [6, 8])], [(3, [9, -1])]]

Explanation: Timestamp 2 stays active long enough to receive late updates, and last-write-wins changes A from 5 to 6 before emission.

Input: (['A'], [], 3)

Expected Output: [[]]

Explanation: Edge case: no batches means only the final flush list exists, and it is empty.

Hints

Track the largest timestamp seen so far; then max_seen - max_lag acts like a watermark for what is safe to emit.
Use a dictionary for active rows and a min-heap of active timestamps so you can flush the smallest safe timestamps in order.

Quick Overview

Quick Overview