Convert Samples into Event Intervals
Company: Anthropic
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
You are given a time-ordered array `samples`, where `samples[i]` is the function name observed at integer timestamp `i`. Convert this trace into a list of events in the form `(function_name, start_time, end_time)`, where each event represents one maximal contiguous run of the same function.
Use half-open intervals `[start_time, end_time)`. For example, if `samples = ["A", "A", "B", "B", "B", "A"]`, the output should be `[("A", 0, 2), ("B", 2, 5), ("A", 5, 6)]`.
Follow-up: given an integer `k`, only treat a run as a valid event if the same function appears for at least `k` consecutive timestamps. Runs shorter than `k` should be discarded. Do not merge runs that were separated in the original trace, even if they have the same function name.
Implement the transformation and discuss the time and space complexity.
Quick Answer: This question evaluates understanding of array and sequence processing, run-length encoding concepts, and interval representation for time-ordered traces, including handling boundary conditions and filtering runs by a minimum length k.
Part 1: Convert Samples into Maximal Contiguous Event Intervals
You are given a time-ordered list `samples`, where `samples[i]` is the function name observed at integer timestamp `i`. Convert this trace into a list of events in the form `(function_name, start_time, end_time)`, where each event represents one maximal contiguous run of the same function.
Use half-open intervals `[start_time, end_time)`. That means an event starting at `s` and ending at `e` covers timestamps `s, s+1, ..., e-1`.
For example, if `samples = ["A", "A", "B", "B", "B", "A"]`, the output is `[("A", 0, 2), ("B", 2, 5), ("A", 5, 6)]`.
Two runs with the same function name should remain separate if they were separated anywhere in the original trace.
Constraints
- 0 <= len(samples) <= 200000
- Each `samples[i]` is a string
- Timestamps are the indices `0` through `len(samples) - 1`
- The output must use half-open intervals `[start_time, end_time)`
Examples
Input: (["A", "A", "B", "B", "B", "A"],)
Expected Output: [("A", 0, 2), ("B", 2, 5), ("A", 5, 6)]
Explanation: There are three maximal runs: A from 0 to 2, B from 2 to 5, and A from 5 to 6.
Input: ([],)
Expected Output: []
Explanation: Empty input produces no events.