Find Top K Largest Elements
Company: Microsoft
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's skill in algorithmic selection and data-structure usage for extracting top-k elements from large arrays, emphasizing time and space complexity considerations and correct handling of duplicate values.
Part 1: Return the K Largest Elements from an Unsorted Array
Constraints
- 1 <= len(nums) <= 100000
- 1 <= k <= len(nums)
- -1000000000 <= nums[i] <= 1000000000
- Duplicate values are treated as separate occurrences.
Examples
Input: ([7, 10, 4, 3, 20, 15], 3)
Expected Output: [10, 15, 20]
Explanation: The three largest values are 20, 15, and 10; returned sorted gives [10, 15, 20].
Input: ([5, 1, 5, 3, 5, 2], 4)
Expected Output: [3, 5, 5, 5]
Explanation: All three occurrences of 5 count separately, and 3 is the next largest value.
Input: ([-10, -3, -5, -1], 2)
Expected Output: [-3, -1]
Explanation: Among negative numbers, -1 and -3 are the largest.
Input: ([42], 1)
Expected Output: [42]
Explanation: Edge case: the array has one element and k is 1.
Input: ([2, 9, 2], 3)
Expected Output: [2, 2, 9]
Explanation: When k equals N, all elements are returned sorted.
Hints
- You do not need to keep every element seen so far; only the current best k candidates matter.
- A min-heap of size k lets you quickly identify the smallest element among the current top k.
Part 2: Choose the Better Heap Strategy for Top-K Largest
Constraints
- 1 <= len(queries) <= 100000
- Each query has exactly three integers: [n, k, memory_limit].
- 1 <= k <= n <= 1000000000000000000
- 0 <= memory_limit <= 1000000000000000000
- ceil(log2(x)) should be computed exactly using integer arithmetic, not floating-point arithmetic.
Examples
Input: ([(6, 3, 3)],)
Expected Output: ['min-heap']
Explanation: The min-heap of size 3 fits, but the max-heap of size 6 does not fit in memory_limit 3.
Input: ([(1000000, 2, 1000000)],)
Expected Output: ['max-heap']
Explanation: Both strategies fit. The min-heap score is 1000000 * ceil(log2(3)) = 2000000. The max-heap score is 1000000 + 2 * ceil(log2(1000001)) = 1000040, so max-heap has the lower estimated work score.
Input: ([(1000000, 2, 10)],)
Expected Output: ['min-heap']
Explanation: The max-heap needs to store all 1000000 elements and is not feasible. The min-heap needs only 2 slots.
Input: ([(5, 3, 2)],)
Expected Output: ['impossible']
Explanation: The min-heap needs 3 slots and the max-heap needs 5 slots, but memory_limit is only 2.
Input: ([(1, 1, 1), (6, 2, 6), (10, 10, 10)],)
Expected Output: ['min-heap', 'min-heap', 'min-heap']
Explanation: This covers a single-element edge case, an equal-score case where min-heap wins the tie, and a k = n case.
Hints
- Think about how many elements each heap strategy must store: k for the min-heap approach and n for the max-heap approach.
- In Python, (x - 1).bit_length() gives ceil(log2(x)) for positive integer x.