PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of sampled stack traces, event reconstruction, handling ambiguous or incomplete profiling data, and algorithmic reasoning about correctness and time/space complexity.

  • Medium
  • Anthropic
  • Coding & Algorithms
  • Machine Learning Engineer

Convert stack samples to execution trace

Company: Anthropic

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

You are given sampling-profiler output: a list of Sample objects ordered by timestamp ascending. Each Sample has (t: float, stack: list[str]) where stack[0] is the outermost currently executing function and stack[-1] is the innermost. An arbitrary number of calls/returns may occur between samples. Convert this into a trace of start/end events suitable for visualization. Define precise rules to infer events between consecutive samples using the longest common prefix of stacks: emit end events for frames present in the earlier stack but not in the later one (from deepest up to the divergence point), then emit start events for frames present in the later stack but not in the earlier one (from the divergence point down to deepest). Also handle the initial sample (emit starts for its whole stack) and the finalization at the end of the last sample if required. Implement a function samples_to_trace(samples) -> list[tuple[float, str, str]] that returns events as (timestamp, 's'|'e', function_name) in the correct temporal and nesting order. Discuss how you would: 1) treat duplicate function names appearing at different depths, 2) compress consecutive identical samples, 3) handle empty stacks, invalid inputs, or equal timestamps, and 4) optionally add a synthetic root to simplify logic. Provide time and space complexity and justify correctness. Example to ground the rules: for samples [Sample(0.0, ['a','b','a','c']), Sample(1.0, ['a','a','b','c'])], the emitted sequence should reflect at t=0.0: start a, start b, start a, start c; at t=1.0: end c, end a, end b; then start a, start b, start c (or an equivalent ordering consistent with your rules).

Quick Answer: This question evaluates understanding of sampled stack traces, event reconstruction, handling ambiguous or incomplete profiling data, and algorithmic reasoning about correctness and time/space complexity.

Convert sampling-profiler stack snapshots into a flat, properly nested trace of function **start** and **end** events suitable for a flame-graph–style visualization. ## What to implement Implement `solution(samples)` (equivalent to `samples_to_trace(samples)`). - **Input** — `samples` is a list (or tuple) of snapshots ordered by timestamp ascending. Each snapshot is a 2-item pair `(t, stack)`: - `t` is a numeric timestamp (`int` or `float`). - `stack` is a list of function names (strings). `stack[0]` is the **outermost** active function and `stack[-1]` is the **innermost**. - **Output** — a list of events. Each event is a tuple `(timestamp, kind, function_name)` where: - `timestamp` is the event's time as a **float** (each input `t` is coerced to `float` in the output, so an integer input timestamp appears as a float). - `kind` is `'s'` (the function starts) or `'e'` (the function ends). ## Rules Process the samples in input order and emit events using **exactly** these rules: 1. **First sample.** Emit a start event (`'s'`) for every frame in its stack, from **outermost to innermost**, all at that sample's timestamp. 2. **Each transition.** For each consecutive pair of samples `(prev_t, prev_stack)` → `(curr_t, curr_stack)`: - Compute the **longest common prefix** length `k`: scan the two stacks position by position and stop at the first index where the frames differ (or where one stack runs out). Frames are matched **only by position**. - Emit end events (`'e'`) at `curr_t` for the frames in `prev_stack[k:]`, ordered **deepest frame first**, up to the divergence point. - Then emit start events (`'s'`) at `curr_t` for the frames in `curr_stack[k:]`, ordered from the **divergence point down to the deepest frame**. 3. **End of capture.** The capture is assumed to end at the last sample's timestamp. After all transitions, close every frame still open in the **last** stack by emitting end events, **deepest first**, all at the last sample's timestamp. ## Important details - **Duplicate names are distinct frames.** The same function name at different depths is a different frame; match frames only by position when computing the longest common prefix. - **Empty stacks are allowed** and mean no function is active at that sample. - **Identical consecutive stacks produce no events.** When two adjacent stacks are equal, `k` equals their length, so both the end and start loops are empty (an implicit compression step). - **Equal timestamps are allowed** and must be processed in the given input order. - **Empty input** (`samples` is empty) returns an empty list. ## Validation Raise `ValueError` for malformed input, specifically when: - `samples` is not a list or tuple. - A sample is not a 2-item sequence `(timestamp, stack)`. - A timestamp is not a number (note: `bool` is **not** accepted as a number). - A stack is not a `list`. - Any frame name is not a string. - Timestamps decrease (each timestamp must be **≥** the previous one; equal timestamps are fine). ## Why this works The longest common prefix is exactly the portion of the call stack still active across both samples. Everything below that prefix in the earlier sample must have ended, and everything below that prefix in the later sample must have started. Emitting ends bottom-up and starts top-down keeps every frame properly nested. > You may imagine a synthetic root frame that is always present, but do **not** emit any events for it. ## Constraints - `0 <= len(samples) <= 200000` - `0 <= len(stack_i) <= 100000`, and the sum of all stack lengths is at most `400000` - Timestamps must be non-decreasing; equal timestamps are allowed - Each stack element must be a string; malformed input must raise `ValueError`

Constraints

  • 0 <= len(samples) <= 200000
  • 0 <= len(stack_i) <= 100000, and the sum of all stack lengths is at most 400000
  • Timestamps must be non-decreasing; equal timestamps are allowed
  • Each stack element must be a string; malformed input should raise ValueError

Examples

Input: [(0.0, ['a', 'b', 'a', 'c']), (1.0, ['a', 'a', 'b', 'c'])]

Expected Output: [(0.0, 's', 'a'), (0.0, 's', 'b'), (0.0, 's', 'a'), (0.0, 's', 'c'), (1.0, 'e', 'c'), (1.0, 'e', 'a'), (1.0, 'e', 'b'), (1.0, 's', 'a'), (1.0, 's', 'b'), (1.0, 's', 'c'), (1.0, 'e', 'c'), (1.0, 'e', 'b'), (1.0, 'e', 'a'), (1.0, 'e', 'a')]

Explanation: The common prefix is only the first 'a'. So at t=1.0, end c, a, b; then start a, b, c. After the last sample, close the remaining stack c, b, a, a at t=1.0.

Input: []

Expected Output: []

Explanation: No samples means no events.

Input: [(0.0, []), (1.0, ['x']), (2.0, [])]

Expected Output: [(1.0, 's', 'x'), (2.0, 'e', 'x')]

Explanation: The initial empty stack emits nothing. At t=1.0, x starts. At t=2.0, x ends. The final stack is empty, so there is nothing left to close.

Input: [(0.0, ['main']), (0.0, ['main', 'worker']), (0.0, ['main'])]

Expected Output: [(0.0, 's', 'main'), (0.0, 's', 'worker'), (0.0, 'e', 'worker'), (0.0, 'e', 'main')]

Explanation: Equal timestamps are allowed. Process transitions in the given order: start main, then start worker, then end worker, then finalize by ending main.

Input: [(0.0, ['a']), (1.0, ['a']), (2.0, ['a', 'b'])]

Expected Output: [(0.0, 's', 'a'), (2.0, 's', 'b'), (2.0, 'e', 'b'), (2.0, 'e', 'a')]

Explanation: The middle sample is identical to the first, so that transition emits nothing. At t=2.0, b starts. Then the capture ends, so b and a are closed at t=2.0.

Input: [(5.0, ['root', 'f', 'f'])]

Expected Output: [(5.0, 's', 'root'), (5.0, 's', 'f'), (5.0, 's', 'f'), (5.0, 'e', 'f'), (5.0, 'e', 'f'), (5.0, 'e', 'root')]

Explanation: Duplicate names at different depths are separate frames. The single sample opens the whole stack, and finalization closes it deepest-first at the same timestamp.

Solution

def solution(samples):
    def parse_sample(sample):
        if not isinstance(sample, (list, tuple)) or len(sample) != 2:
            raise ValueError('Each sample must be a 2-item sequence: (timestamp, stack)')
        t, stack = sample
        if isinstance(t, bool) or not isinstance(t, (int, float)):
            raise ValueError('Timestamp must be a number')
        if not isinstance(stack, list):
            raise ValueError('Stack must be a list of strings')
        for name in stack:
            if not isinstance(name, str):
                raise ValueError('Each frame name must be a string')
        return float(t), stack

    if not isinstance(samples, (list, tuple)):
        raise ValueError('samples must be a list or tuple')

    events = []
    if not samples:
        return events

    prev_t, prev_stack = parse_sample(samples[0])

    for name in prev_stack:
        events.append((prev_t, 's', name))

    for i in range(1, len(samples)):
        curr_t, curr_stack = parse_sample(samples[i])
        if curr_t < prev_t:
            raise ValueError('Timestamps must be non-decreasing')

        lcp = 0
        limit = min(len(prev_stack), len(curr_stack))
        while lcp < limit and prev_stack[lcp] == curr_stack[lcp]:
            lcp += 1

        for j in range(len(prev_stack) - 1, lcp - 1, -1):
            events.append((curr_t, 'e', prev_stack[j]))
        for j in range(lcp, len(curr_stack)):
            events.append((curr_t, 's', curr_stack[j]))

        prev_t, prev_stack = curr_t, curr_stack

    for j in range(len(prev_stack) - 1, -1, -1):
        events.append((prev_t, 'e', prev_stack[j]))

    return events
Explanation
**Idea.** A stack sample is a snapshot of which functions are active. Between two consecutive snapshots, frames that are shared (the *longest common prefix*, matched **by position**) stayed alive; everything below the prefix in the old stack ended, and everything below the prefix in the new stack began. The algorithm just walks the samples in order and emits those transitions. **Steps in the code:** - `parse_sample` validates each `(t, stack)` pair: the timestamp must be a real number (and `bool` is explicitly rejected since `True/False` are `int`), the stack must be a `list`, and every frame must be a `str`; otherwise `ValueError`. - **First sample:** emit a `'s'` (start) event per frame, outermost→innermost. - **Each transition** `prev_stack → curr_stack`: - Compute the LCP length `lcp` by scanning positions until frames differ (so duplicate names at different depths stay distinct). - Emit `'e'` (end) events at `curr_t` for `prev_stack[lcp:]`, iterating **deepest-first** (`len-1 … lcp`). - Emit `'s'` events at `curr_t` for `curr_stack[lcp:]`, iterating **divergence-down** (`lcp … len-1`). - Identical consecutive stacks give `lcp == len`, so both loops are empty — no events, the implicit compression. - A non-decreasing timestamp check (`curr_t < prev_t → ValueError`) runs each step. - **After the loop:** close everything still open in the last stack with `'e'` events, deepest-first, at the final timestamp. **Why correct:** the LCP is exactly the still-active portion; ends are emitted bottom-up and starts top-down so every frame is properly nested, matching the required visualization order.

Time complexity: O(S), where S is the total number of stack frames across all samples (sum of stack lengths, bounded by 400000). Each transition does work proportional to len(prev_stack) + len(curr_stack) for the LCP scan plus the end/start emissions, and these sum to a linear pass over all frames. Equivalently O(n·d) for n samples of max depth d, but O(S) is tighter.. Space complexity: O(1) auxiliary space, excluding the output list (which holds O(S) events). Only a few scalar variables and references to the current/previous stack are kept..

Hints

  1. For two adjacent samples, the frames before the first differing depth are the only ones guaranteed to still be running.
  2. Emit ends before starts at each transition, and treat duplicate function names as different frames if they occur at different depths.
Last updated: May 8, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a Banking System - Anthropic (medium)
  • Implement Persistent Memoization LRU Cache - Anthropic (hard)
  • Fix a Corrupted Bootloader Instruction - Anthropic (medium)
  • Implement a Time-Aware Task Manager - Anthropic (hard)
  • Implement Task Management and Duplicate Detection - Anthropic (medium)