Convert State Stream to Events
Company: Anthropic
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates a candidate's ability to process sequential categorical data by identifying and summarizing consecutive runs, reason about time and space complexity, and apply length-based filtering of events.
Part 1: Convert State Stream into Events
Constraints
- 0 <= len(states) <= 100000
- Each element in `states` supports equality comparison
- Indices are 0-based
- Runs are maximal: two adjacent events must always have different values
Examples
Input: (['A', 'A', 'B', 'B', 'B', 'A'],)
Expected Output: [('A', 0, 1), ('B', 2, 4), ('A', 5, 5)]
Explanation: Three runs appear: A from 0 to 1, B from 2 to 4, and A from 5 to 5.
Input: ([],)
Expected Output: []
Explanation: An empty stream has no events.
Input: (['X'],)
Expected Output: [('X', 0, 0)]
Explanation: A single element forms one event of length 1.
Input: ([1, 2, 2, 3, 3, 3, 2],)
Expected Output: [(1, 0, 0), (2, 1, 2), (3, 3, 5), (2, 6, 6)]
Explanation: Each maximal run is converted into one tuple.
Input: (['Z', 'Z', 'Z'],)
Expected Output: [('Z', 0, 2)]
Explanation: All elements belong to one continuous run.
Hints
- Scan from left to right and keep track of where the current run started.
- Whenever the value changes, finish the previous run and start a new one.
Part 2: Convert State Stream into Events with Minimum Run Length
Constraints
- 0 <= len(states) <= 100000
- 1 <= k <= 100000
- Each element in `states` supports equality comparison
- Runs are determined from the original stream before filtering
Examples
Input: (['A', 'A', 'B', 'B', 'B', 'A'], 2)
Expected Output: [('A', 0, 1), ('B', 2, 4)]
Explanation: The final A run has length 1, so it is excluded.
Input: (['A', 'A', 'B', 'A', 'A'], 2)
Expected Output: [('A', 0, 1), ('A', 3, 4)]
Explanation: The middle B run is too short and is removed, but the two A runs are not merged because they were separated in the original stream.
Input: ([], 1)
Expected Output: []
Explanation: An empty stream produces no events.
Input: (['X'], 1)
Expected Output: [('X', 0, 0)]
Explanation: A single-element run is valid when k = 1.
Input: (['A', 'B', 'B', 'C', 'C', 'C', 'D'], 3)
Expected Output: [('C', 3, 5)]
Explanation: Only the C run has length at least 3.
Hints
- First identify each consecutive run exactly as in Part 1.
- Before appending a run, compute its length as `end_index - start_index + 1` and compare it with `k`.