Convert stack samples to execution trace
Company: Anthropic
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of sampled stack traces, event reconstruction, handling ambiguous or incomplete profiling data, and algorithmic reasoning about correctness and time/space complexity.
Constraints
- 0 <= len(samples) <= 200000
- 0 <= len(stack_i) <= 100000, and the sum of all stack lengths is at most 400000
- Timestamps must be non-decreasing; equal timestamps are allowed
- Each stack element must be a string; malformed input should raise ValueError
Examples
Input: [(0.0, ['a', 'b', 'a', 'c']), (1.0, ['a', 'a', 'b', 'c'])]
Expected Output: [(0.0, 's', 'a'), (0.0, 's', 'b'), (0.0, 's', 'a'), (0.0, 's', 'c'), (1.0, 'e', 'c'), (1.0, 'e', 'a'), (1.0, 'e', 'b'), (1.0, 's', 'a'), (1.0, 's', 'b'), (1.0, 's', 'c'), (1.0, 'e', 'c'), (1.0, 'e', 'b'), (1.0, 'e', 'a'), (1.0, 'e', 'a')]
Explanation: The common prefix is only the first 'a'. So at t=1.0, end c, a, b; then start a, b, c. After the last sample, close the remaining stack c, b, a, a at t=1.0.
Input: []
Expected Output: []
Explanation: No samples means no events.
Input: [(0.0, []), (1.0, ['x']), (2.0, [])]
Expected Output: [(1.0, 's', 'x'), (2.0, 'e', 'x')]
Explanation: The initial empty stack emits nothing. At t=1.0, x starts. At t=2.0, x ends. The final stack is empty, so there is nothing left to close.
Input: [(0.0, ['main']), (0.0, ['main', 'worker']), (0.0, ['main'])]
Expected Output: [(0.0, 's', 'main'), (0.0, 's', 'worker'), (0.0, 'e', 'worker'), (0.0, 'e', 'main')]
Explanation: Equal timestamps are allowed. Process transitions in the given order: start main, then start worker, then end worker, then finalize by ending main.
Input: [(0.0, ['a']), (1.0, ['a']), (2.0, ['a', 'b'])]
Expected Output: [(0.0, 's', 'a'), (2.0, 's', 'b'), (2.0, 'e', 'b'), (2.0, 'e', 'a')]
Explanation: The middle sample is identical to the first, so that transition emits nothing. At t=2.0, b starts. Then the capture ends, so b and a are closed at t=2.0.
Input: [(5.0, ['root', 'f', 'f'])]
Expected Output: [(5.0, 's', 'root'), (5.0, 's', 'f'), (5.0, 's', 'f'), (5.0, 'e', 'f'), (5.0, 'e', 'f'), (5.0, 'e', 'root')]
Explanation: Duplicate names at different depths are separate frames. The single sample opens the whole stack, and finalization closes it deepest-first at the same timestamp.
Solution
def solution(samples):
def parse_sample(sample):
if not isinstance(sample, (list, tuple)) or len(sample) != 2:
raise ValueError('Each sample must be a 2-item sequence: (timestamp, stack)')
t, stack = sample
if isinstance(t, bool) or not isinstance(t, (int, float)):
raise ValueError('Timestamp must be a number')
if not isinstance(stack, list):
raise ValueError('Stack must be a list of strings')
for name in stack:
if not isinstance(name, str):
raise ValueError('Each frame name must be a string')
return float(t), stack
if not isinstance(samples, (list, tuple)):
raise ValueError('samples must be a list or tuple')
events = []
if not samples:
return events
prev_t, prev_stack = parse_sample(samples[0])
for name in prev_stack:
events.append((prev_t, 's', name))
for i in range(1, len(samples)):
curr_t, curr_stack = parse_sample(samples[i])
if curr_t < prev_t:
raise ValueError('Timestamps must be non-decreasing')
lcp = 0
limit = min(len(prev_stack), len(curr_stack))
while lcp < limit and prev_stack[lcp] == curr_stack[lcp]:
lcp += 1
for j in range(len(prev_stack) - 1, lcp - 1, -1):
events.append((curr_t, 'e', prev_stack[j]))
for j in range(lcp, len(curr_stack)):
events.append((curr_t, 's', curr_stack[j]))
prev_t, prev_stack = curr_t, curr_stack
for j in range(len(prev_stack) - 1, -1, -1):
events.append((prev_t, 'e', prev_stack[j]))
return eventsTime complexity: O(n * d), where n is the number of samples and d is the maximum stack depth. Space complexity: O(1) auxiliary space, excluding the output list.
Hints
- For two adjacent samples, the frames before the first differing depth are the only ones guaranteed to still be running.
- Emit ends before starts at each transition, and treat duplicate function names as different frames if they occur at different depths.