How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Amazon.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Amazon during technical interviews.

Implement list overlap and dense-ranked word frequencies

Company: Amazon

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

Part A — Multiset overlap count Implement count_overlap(a: List[int], b: List[int]) -> int that returns the size of the multiset intersection of a and b (order doesn’t matter; duplicates count). Constraints: 0 ≤ len(a), len(b) ≤ 1e7; values fit 32-bit signed int. Time O(n + m), space O(min(n, m)). Disallow sorting if it would exceed O((n+m) log(n+m)). Edge cases: empty lists, all duplicates, heavy skew. Examples: • a=[1,1,2,3], b=[1,2,2,4] → overlap elements {1×1,2×1} so return 2 • a=[5,5,5], b=[5,5] → 2 Also implement overlap_elements(a,b) returning the actual multiset intersection as a generator without materializing more than O(min(n,m)) extra memory. Part B — Top-K words with dense ranks Implement top_k_words(words: Iterable[str], k: int=5) -> List[Tuple[str,int,int]] that returns (word, dense_rank, freq) sorted by descending freq then lexicographically for ties. Dense ranks compress gaps: if freqs are [10,10,8,7,7,7,5], ranks are [1,1,2,3,3,3,4]. Complexity: single pass over input (streaming-friendly), O(U) memory for U unique words. Handle Unicode and case-folding option via a flag. Example input: ["a","b","b","c","c","c","d","d","d","e"] Expected output (k=5): [("c",1,3),("d",1,3),("b",2,2),("a",3,1),("e",3,1)] Follow-ups: analyze time/space; discuss alternatives for 10^9 tokens (approximate top-k via count-min sketch + heap) and stability considerations.

Quick Answer: This question evaluates a data scientist's ability to implement efficient multiset intersection and streaming top-k word-frequency computations, covering duplicate-aware counting, dense-ranking, Unicode handling, and strict time/space constraints.

Multiset Overlap Count

Return the size of the multiset intersection, or the sorted overlap elements when requested.

Constraints

Values fit in 32-bit signed integers

Examples

Input: ([1, 1, 2, 3], [1, 2, 2, 4], False)

Expected Output: 2

Explanation: Counts duplicate-limited overlap.

Input: ([5, 5, 5], [5, 5], False)

Expected Output: 2

Explanation: All duplicates.

Input: ([], [1], False)

Expected Output: 0

Explanation: Empty input.

Input: ([3, 1, 1], [1, 1, 1, 3], True)

Expected Output: [1, 1, 3]

Explanation: Return actual overlap elements.

Hints

Count the smaller list, then consume counts while scanning the larger list.

Top-K Words with Dense Ranks

Return (word, dense_rank, freq) sorted by descending frequency then lexicographically.

Constraints

Dense ranks compress frequency gaps

Examples

Input: (['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e'], 5, False)

Expected Output: [('c', 1, 3), ('d', 1, 3), ('b', 2, 2), ('a', 3, 1), ('e', 3, 1)]

Explanation: Prompt example.

Input: (['Apple', 'apple', 'banana'], 2, True)

Expected Output: [('apple', 1, 2), ('banana', 2, 1)]

Explanation: Case-folding option.

Input: ([], 5, False)

Expected Output: []

Explanation: No words.

Hints

Sort unique words by (-freq, word), then increment rank only when frequency changes.

Quick Overview

This question evaluates a data scientist's ability to implement efficient multiset intersection and streaming top-k word-frequency computations, covering duplicate-aware counting, dense-ranking, Unicode handling, and strict time/space constraints.