PracHub
QuestionsPremiumLearningGuidesCheatsheetNEW

Quick Overview

This question evaluates understanding of sampled stack traces, event reconstruction, handling ambiguous or incomplete profiling data, and algorithmic reasoning about correctness and time/space complexity.

  • Medium
  • Anthropic
  • Coding & Algorithms
  • Machine Learning Engineer

Convert stack samples to execution trace

Company: Anthropic

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

You are given sampling-profiler output: a list of Sample objects ordered by timestamp ascending. Each Sample has (t: float, stack: list[str]) where stack[0] is the outermost currently executing function and stack[-1] is the innermost. An arbitrary number of calls/returns may occur between samples. Convert this into a trace of start/end events suitable for visualization. Define precise rules to infer events between consecutive samples using the longest common prefix of stacks: emit end events for frames present in the earlier stack but not in the later one (from deepest up to the divergence point), then emit start events for frames present in the later stack but not in the earlier one (from the divergence point down to deepest). Also handle the initial sample (emit starts for its whole stack) and the finalization at the end of the last sample if required. Implement a function samples_to_trace(samples) -> list[tuple[float, str, str]] that returns events as (timestamp, 's'|'e', function_name) in the correct temporal and nesting order. Discuss how you would: 1) treat duplicate function names appearing at different depths, 2) compress consecutive identical samples, 3) handle empty stacks, invalid inputs, or equal timestamps, and 4) optionally add a synthetic root to simplify logic. Provide time and space complexity and justify correctness. Example to ground the rules: for samples [Sample(0.0, ['a','b','a','c']), Sample(1.0, ['a','a','b','c'])], the emitted sequence should reflect at t=0.0: start a, start b, start a, start c; at t=1.0: end c, end a, end b; then start a, start b, start c (or an equivalent ordering consistent with your rules).

Quick Answer: This question evaluates understanding of sampled stack traces, event reconstruction, handling ambiguous or incomplete profiling data, and algorithmic reasoning about correctness and time/space complexity.

You are given sampling-profiler output as a sequence of samples ordered by timestamp ascending. For this platform, each sample is represented as a pair `(t, stack)` where `t` is a float timestamp and `stack` is a list of function names. `stack[0]` is the outermost active function and `stack[-1]` is the innermost. Implement `solution(samples)`, equivalent to `samples_to_trace(samples)`, that converts the samples into a trace of events suitable for visualization. Each event is a tuple `(timestamp, 's' | 'e', function_name)` where `'s'` means a function starts and `'e'` means a function ends. Use these exact rules: 1. For the first sample, emit start events for its entire stack from outermost to innermost. 2. For each consecutive pair of samples `(prev_t, prev_stack)` and `(curr_t, curr_stack)`, compute the longest common prefix length `k` of the two stacks. 3. Emit end events at `curr_t` for frames in `prev_stack[k:]`, from deepest frame up to the divergence point. 4. Then emit start events at `curr_t` for frames in `curr_stack[k:]`, from the divergence point down to the deepest frame. 5. Assume the capture ends at the last sample's timestamp, so after all transitions, emit end events for every frame still open in the last stack, deepest first, at the last sample's timestamp. Important details: - Duplicate function names at different depths are distinct frames. Match frames only by position while computing the longest common prefix. - Empty stacks are allowed and mean that no function is active. - Consecutive identical stacks produce no transition events. You may think of this as an implicit compression step; the reference solution simply emits nothing for such transitions. - Equal timestamps are allowed and should be processed in the given input order. - Raise `ValueError` for malformed input, non-list stacks, non-string frame names, or decreasing timestamps. - If it helps, you may imagine a synthetic root frame that is always present, but do not emit events for it. Why this works: the longest common prefix is exactly the part of the call stack that is still active in both samples. Everything below that prefix in the earlier sample must have ended, and everything below that prefix in the later sample must have started.

Constraints

  • 0 <= len(samples) <= 200000
  • 0 <= len(stack_i) <= 100000, and the sum of all stack lengths is at most 400000
  • Timestamps must be non-decreasing; equal timestamps are allowed
  • Each stack element must be a string; malformed input should raise ValueError

Examples

Input: [(0.0, ['a', 'b', 'a', 'c']), (1.0, ['a', 'a', 'b', 'c'])]

Expected Output: [(0.0, 's', 'a'), (0.0, 's', 'b'), (0.0, 's', 'a'), (0.0, 's', 'c'), (1.0, 'e', 'c'), (1.0, 'e', 'a'), (1.0, 'e', 'b'), (1.0, 's', 'a'), (1.0, 's', 'b'), (1.0, 's', 'c'), (1.0, 'e', 'c'), (1.0, 'e', 'b'), (1.0, 'e', 'a'), (1.0, 'e', 'a')]

Explanation: The common prefix is only the first 'a'. So at t=1.0, end c, a, b; then start a, b, c. After the last sample, close the remaining stack c, b, a, a at t=1.0.

Input: []

Expected Output: []

Explanation: No samples means no events.

Input: [(0.0, []), (1.0, ['x']), (2.0, [])]

Expected Output: [(1.0, 's', 'x'), (2.0, 'e', 'x')]

Explanation: The initial empty stack emits nothing. At t=1.0, x starts. At t=2.0, x ends. The final stack is empty, so there is nothing left to close.

Input: [(0.0, ['main']), (0.0, ['main', 'worker']), (0.0, ['main'])]

Expected Output: [(0.0, 's', 'main'), (0.0, 's', 'worker'), (0.0, 'e', 'worker'), (0.0, 'e', 'main')]

Explanation: Equal timestamps are allowed. Process transitions in the given order: start main, then start worker, then end worker, then finalize by ending main.

Input: [(0.0, ['a']), (1.0, ['a']), (2.0, ['a', 'b'])]

Expected Output: [(0.0, 's', 'a'), (2.0, 's', 'b'), (2.0, 'e', 'b'), (2.0, 'e', 'a')]

Explanation: The middle sample is identical to the first, so that transition emits nothing. At t=2.0, b starts. Then the capture ends, so b and a are closed at t=2.0.

Input: [(5.0, ['root', 'f', 'f'])]

Expected Output: [(5.0, 's', 'root'), (5.0, 's', 'f'), (5.0, 's', 'f'), (5.0, 'e', 'f'), (5.0, 'e', 'f'), (5.0, 'e', 'root')]

Explanation: Duplicate names at different depths are separate frames. The single sample opens the whole stack, and finalization closes it deepest-first at the same timestamp.

Solution

def solution(samples):
    def parse_sample(sample):
        if not isinstance(sample, (list, tuple)) or len(sample) != 2:
            raise ValueError('Each sample must be a 2-item sequence: (timestamp, stack)')
        t, stack = sample
        if isinstance(t, bool) or not isinstance(t, (int, float)):
            raise ValueError('Timestamp must be a number')
        if not isinstance(stack, list):
            raise ValueError('Stack must be a list of strings')
        for name in stack:
            if not isinstance(name, str):
                raise ValueError('Each frame name must be a string')
        return float(t), stack

    if not isinstance(samples, (list, tuple)):
        raise ValueError('samples must be a list or tuple')

    events = []
    if not samples:
        return events

    prev_t, prev_stack = parse_sample(samples[0])

    for name in prev_stack:
        events.append((prev_t, 's', name))

    for i in range(1, len(samples)):
        curr_t, curr_stack = parse_sample(samples[i])
        if curr_t < prev_t:
            raise ValueError('Timestamps must be non-decreasing')

        lcp = 0
        limit = min(len(prev_stack), len(curr_stack))
        while lcp < limit and prev_stack[lcp] == curr_stack[lcp]:
            lcp += 1

        for j in range(len(prev_stack) - 1, lcp - 1, -1):
            events.append((curr_t, 'e', prev_stack[j]))
        for j in range(lcp, len(curr_stack)):
            events.append((curr_t, 's', curr_stack[j]))

        prev_t, prev_stack = curr_t, curr_stack

    for j in range(len(prev_stack) - 1, -1, -1):
        events.append((prev_t, 'e', prev_stack[j]))

    return events

Time complexity: O(n * d), where n is the number of samples and d is the maximum stack depth. Space complexity: O(1) auxiliary space, excluding the output list.

Hints

  1. For two adjacent samples, the frames before the first differing depth are the only ones guaranteed to still be running.
  2. Emit ends before starts at each transition, and treat duplicate function names as different frames if they occur at different depths.
Last updated: May 8, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Convert Samples into Event Intervals - Anthropic (medium)
  • Convert State Stream to Events - Anthropic (medium)
  • Build a concurrent web crawler - Anthropic (medium)
  • Implement a Parallel Image Processor - Anthropic (medium)
  • Implement a Batch Image Processor - Anthropic (medium)