Convert Samples into Event Intervals
Company: Anthropic
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of array and sequence processing, run-length encoding concepts, and interval representation for time-ordered traces, including handling boundary conditions and filtering runs by a minimum length k.
Part 1: Convert Samples into Maximal Contiguous Event Intervals
Constraints
- 0 <= len(samples) <= 200000
- Each `samples[i]` is a string
- Timestamps are the indices `0` through `len(samples) - 1`
- The output must use half-open intervals `[start_time, end_time)`
Examples
Input: (["A", "A", "B", "B", "B", "A"],)
Expected Output: [("A", 0, 2), ("B", 2, 5), ("A", 5, 6)]
Explanation: There are three maximal runs: A from 0 to 2, B from 2 to 5, and A from 5 to 6.
Input: ([],)
Expected Output: []
Explanation: Empty input produces no events.
Input: (["X"],)
Expected Output: [("X", 0, 1)]
Explanation: A single sample forms one run covering exactly one timestamp.
Input: (["A", "B", "A"],)
Expected Output: [("A", 0, 1), ("B", 1, 2), ("A", 2, 3)]
Explanation: The two A runs are separated by B, so they stay as separate events.
Input: (["Q", "Q", "Q", "Q"],)
Expected Output: [("Q", 0, 4)]
Explanation: All samples belong to one contiguous run.
Hints
- Scan from left to right and keep track of where the current run started.
- Whenever the function name changes, close the previous run and start a new one.
Part 2: Convert Samples into Event Intervals with Minimum Run Length k
Constraints
- 0 <= len(samples) <= 200000
- Each `samples[i]` is a string
- 1 <= k <= 1000000000
- Do not merge separate runs, even if short runs between them are discarded
- The output must use half-open intervals `[start_time, end_time)`
Examples
Input: (["A", "A", "B", "B", "B", "A"], 2)
Expected Output: [("A", 0, 2), ("B", 2, 5)]
Explanation: The last A run has length 1, so it is discarded.
Input: (["A", "A", "B", "A", "A"], 2)
Expected Output: [("A", 0, 2), ("A", 3, 5)]
Explanation: The middle B run is discarded, but the two A runs must remain separate because they were not contiguous in the original trace.
Input: (["A", "B", "A", "A", "C", "C", "C", "A"], 2)
Expected Output: [("A", 2, 4), ("C", 4, 7)]
Explanation: Only the A run from 2 to 4 and the C run from 4 to 7 meet the minimum length.
Input: ([], 3)
Expected Output: []
Explanation: Empty input produces no events.
Input: (["Z", "Z", "Z"], 1)
Expected Output: [("Z", 0, 3)]
Explanation: With k = 1, every run is valid, so the whole array becomes one event.
Hints
- First identify each maximal contiguous run exactly as in Part 1.
- When a run ends, compute its length as `end - start` and append it only if that length is at least `k`.