Convert stack samples to trace events
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's ability to convert chronologically ordered stack samples into start/end trace events, testing stateful stream processing, stack diffing, correct temporal ordering, and frame identity handling in the presence of recursion.
Part 1: Convert Stack Samples to Trace Events
Constraints
- 0 <= len(samples) <= 200000
- Timestamps are nondecreasing integers
- Each stack is ordered from outermost to innermost
- Total number of function names across all samples is at most 200000
- Function names are non-empty strings
Examples
Input: [(1, ['A']), (2, ['A', 'B']), (3, ['A', 'C']), (4, ['A'])]
Expected Output: [('start', 1, 'A'), ('start', 2, 'B'), ('end', 3, 'B'), ('start', 3, 'C'), ('end', 4, 'C')]
Explanation: The first sample starts A. Then B starts. At timestamp 3, B disappears before C appears. A is still active in the last sample, so it has no end event.
Input: [(10, ['main', 'foo']), (20, ['main', 'foo']), (30, ['main'])]
Expected Output: [('start', 10, 'main'), ('start', 10, 'foo'), ('end', 30, 'foo')]
Explanation: The stack is unchanged between 10 and 20, so no new events are emitted there.
Input: [(1, ['A']), (2, ['A', 'A']), (3, ['A', 'A', 'B']), (4, ['A']), (5, [])]
Expected Output: [('start', 1, 'A'), ('start', 2, 'A'), ('start', 3, 'B'), ('end', 4, 'B'), ('end', 4, 'A'), ('end', 5, 'A')]
Explanation: Recursive calls are distinct by depth. When the stack shrinks from ['A', 'A', 'B'] to ['A'], B ends before the inner recursive A.
Input: []
Expected Output: []
Explanation: No samples means no events.
Solution
def solution(samples):
events = []
prev_stack = []
for timestamp, stack in samples:
stack = list(stack)
lcp = 0
limit = min(len(prev_stack), len(stack))
while lcp < limit and prev_stack[lcp] == stack[lcp]:
lcp += 1
for i in range(len(prev_stack) - 1, lcp - 1, -1):
events.append(('end', timestamp, prev_stack[i]))
for i in range(lcp, len(stack)):
events.append(('start', timestamp, stack[i]))
prev_stack = stack
return eventsTime complexity: O(S), where S is the total number of stack entries across all samples. Space complexity: O(D), where D is the maximum stack depth, not counting the output list.
Hints
- For each pair of consecutive samples, find the longest common prefix of their stacks.
- Everything removed from the old stack after that prefix ends first, from the back toward the front.
Part 2: Emit Trace Events Only for Frames Lasting at Least N Samples
Constraints
- 0 <= len(samples) <= 200000
- 1 <= n <= 200000
- Timestamps are nondecreasing integers
- Each stack is ordered from outermost to innermost
- Total number of function names across all samples is at most 200000
- Function names are non-empty strings
Examples
Input: ([(1, ['A']), (2, ['A']), (3, ['A', 'B']), (4, ['A', 'B']), (5, ['A']), (6, [])], 2)
Expected Output: [('start', 1, 'A'), ('start', 3, 'B'), ('end', 5, 'B'), ('end', 6, 'A')]
Explanation: A is present in samples 1 and 2, so it qualifies and gets start time 1. B is first seen at 3 and confirmed at 4, so its start time is backdated to 3.
Input: ([(1, ['A']), (2, ['A', 'B']), (3, ['A']), (4, [])], 2)
Expected Output: [('start', 1, 'A'), ('end', 4, 'A')]
Explanation: B appears in only one sample, so it never reaches the threshold and produces no events.
Input: ([(1, ['A']), (2, ['A', 'A']), (3, ['A', 'A']), (4, ['A']), (5, ['A']), (6, [])], 2)
Expected Output: [('start', 1, 'A'), ('start', 2, 'A'), ('end', 4, 'A'), ('end', 6, 'A')]
Explanation: The outer A qualifies starting at 1. The recursive inner A is a different frame at a deeper depth; it qualifies starting at 2 and ends earlier.
Input: ([(10, []), (20, ['main']), (30, ['main', 'x']), (40, ['main'])], 1)
Expected Output: [('start', 20, 'main'), ('start', 30, 'x'), ('end', 40, 'x')]
Explanation: With n = 1, every observed frame is emitted immediately, so this behaves like Part 1.
Input: ([], 3)
Expected Output: []
Explanation: No samples means no events.
Solution
def solution(samples, n):
events = []
active = [] # [name, first_seen_timestamp, streak_length, started]
for timestamp, stack in samples:
stack = list(stack)
lcp = 0
limit = min(len(active), len(stack))
while lcp < limit and active[lcp][0] == stack[lcp]:
lcp += 1
for depth in range(len(active) - 1, lcp - 1, -1):
name, first_ts, streak, started = active[depth]
if started:
events.append((timestamp, 0, -depth, 'end', name))
active = active[:lcp]
for depth in range(lcp):
active[depth][2] += 1
if not active[depth][3] and active[depth][2] >= n:
active[depth][3] = True
events.append((active[depth][1], 1, depth, 'start', active[depth][0]))
for depth in range(lcp, len(stack)):
started = (n == 1)
frame = [stack[depth], timestamp, 1, started]
active.append(frame)
if started:
events.append((timestamp, 1, depth, 'start', stack[depth]))
events.sort(key=lambda e: (e[0], e[1], e[2]))
return [(kind, timestamp, name) for timestamp, _, _, kind, name in events]Time complexity: O(S + E log E), where S is the total number of stack entries across all samples and E is the number of emitted events. Space complexity: O(D + E), where D is the maximum stack depth.
Hints
- Track each active depth as a frame record: function name, first seen timestamp, current streak length, and whether its start event has already been emitted.
- When two consecutive samples diverge, only the longest common prefix continues. Everything deeper is either ending now or starting a new pending streak.