Most Frequent Call Stack from Profiler Samples
Company: Roblox
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's ability to design an efficient hashing and frequency-counting solution over sequences, along with careful handling of custom tie-breaking rules. It falls under coding and algorithms, testing practical application of data structures for grouping composite keys (ordered sequences) at scale, plus precise attention to edge cases and multi-part follow-up requirements.
Most Frequent Call Stack from Profiler Samples
Constraints
- 0 <= len(samples) <= 10^5
- 1 <= len(samples[i]) <= 500 for every non-empty snapshot
- Total number of frames across all snapshots is at most 10^6
- Each frame is a non-empty string of up to 64 ASCII characters (letters, digits, '_', '.', ':')
Examples
Input: ([['main','handleRequest','parseJson'],['main','handleRequest','parseJson'],['main','handleRequest'],['main','render','drawText']],)
Expected Output: ['main', 'handleRequest', 'parseJson']
Explanation: The parseJson stack occurs twice; every other stack occurs once, so it wins outright.
Input: ([['a'],['a','b'],['c','d']],)
Expected Output: ['a', 'b']
Explanation: All three stacks occur once (a tie). The deepest stacks are ['a','b'] and ['c','d'] (depth 2); between them ['a','b'] is lexicographically smaller.
Input: ([['x','y'],['x','z']],)
Expected Output: ['x', 'y']
Explanation: Both occur once and both have depth 2, so the tie is broken lexicographically: 'y' < 'z'.
Input: ([],)
Expected Output: []
Explanation: Empty input returns an empty stack.
Input: ([['solo']],)
Expected Output: ['solo']
Explanation: A single snapshot is trivially the most frequent.
Input: ([['a','b','c'],['a','b','c'],['d'],['d']],)
Expected Output: ['a', 'b', 'c']
Explanation: Both stacks occur twice (count tie); the deeper ['a','b','c'] beats ['d'].
Input: ([['b','a'],['b','a'],['a','z'],['a','z']],)
Expected Output: ['a', 'z']
Explanation: Both occur twice with depth 2; lexicographic comparison of frame 0 gives 'a' < 'b', so ['a','z'] wins.
Hints
- A whole stack (an ordered list of frames) is the key you are counting. Convert each list to a hashable tuple and tally counts with a hash map.
- You need three ranking criteria at once: highest count, then greatest depth, then lexicographically smallest frames. Encode them in a single sort key.
- In Python, `min(counts, key=lambda st: (-counts[st], -len(st), st))` returns the winner: negating count and depth turns 'maximize' into 'minimize', and the tuple itself breaks the final lexicographic tie.
Most Frequent Call Stack per Thread (Follow-up)
Constraints
- 0 <= number of (thread_id, stack) pairs <= 10^5
- 1 <= len(stack) <= 500 for every snapshot
- Total number of frames across all snapshots is at most 10^6
- thread_id is any hashable value; each frame is a non-empty ASCII string of up to 64 characters
Examples
Input: ([('t1', ['main','a']), ('t1', ['main','a']), ('t2', ['x']), ('t2', ['x','y'])],)
Expected Output: {'t1': ['main', 'a'], 't2': ['x', 'y']}
Explanation: Thread t1 sees ['main','a'] twice (clear winner). Thread t2 sees two distinct stacks each once, so the deeper ['x','y'] wins.
Input: ([],)
Expected Output: {}
Explanation: No samples means an empty mapping.
Input: ([('t', ['f'])],)
Expected Output: {'t': ['f']}
Explanation: A single tagged snapshot maps its thread to that lone stack.
Input: ([('A', ['m','p']), ('A', ['m','p']), ('A', ['m','q']), ('B', ['b','a']), ('B', ['a','z'])],)
Expected Output: {'A': ['m', 'p'], 'B': ['a', 'z']}
Explanation: Thread A: ['m','p'] occurs twice vs ['m','q'] once. Thread B: two stacks tie at count 1 and depth 2, broken lexicographically ('a' < 'b') to ['a','z'].
Input: ([(1, ['x']), (1, ['y','z']), (1, ['x'])],)
Expected Output: {1: ['x']}
Explanation: Non-string thread ids work too; thread 1 sees ['x'] twice, beating ['y','z'].
Hints
- First partition the pairs by thread id into buckets, each bucket holding just that thread's list of stacks.
- Run the exact single-thread routine from Part 1 on each bucket independently — do not share counts across threads.
- Return a dict/map from thread id to the winning stack. An empty input yields an empty map.