Implement list overlap and dense-ranked word frequencies

Q: Implement list overlap and dense-ranked word frequencies

This is a Coding & Algorithms interview question from Amazon for Data Scientist roles. View the full question and solution on PracHub.

Q: How do I approach Coding & Algorithms interview questions?

Coding & Algorithms questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master coding & algorithms interviews.

Question

Part A — Multiset overlap count Implement count_overlap(a: List[int], b: List[int]) -> int that returns the size of the multiset intersection of a and b (order doesn’t matter; duplicates count). Constraints: 0 ≤ len(a), len(b) ≤ 1e7; values fit 32-bit signed int. Time O(n + m), space O(min(n, m)). Disallow sorting if it would exceed O((n+m) log(n+m)). Edge cases: empty lists, all duplicates, heavy skew. Examples: • a=[1,1,2,3], b=[1,2,2,4] → overlap elements {1×1,2×1} so return 2 • a=[5,5,5], b=[5,5] → 2 Also implement overlap_elements(a,b) returning the actual multiset intersection as a generator without materializing more than O(min(n,m)) extra memory.

Part B — Top-K words with dense ranks Implement top_k_words(words: Iterable[str], k: int=5) -> List[Tuple[str,int,int]] that returns (word, dense_rank, freq) sorted by descending freq then lexicographically for ties. Dense ranks compress gaps: if freqs are [10,10,8,7,7,7,5], ranks are [1,1,2,3,3,3,4]. Complexity: single pass over input (streaming-friendly), O(U) memory for U unique words. Handle Unicode and case-folding option via a flag. Example input: ["a","b","b","c","c","c","d","d","d","e"] Expected output (k=5): [("c",1,3),("d",1,3),("b",2,2),("a",3,1),("e",3,1)] Follow-ups: analyze time/space; discuss alternatives for 10^9 tokens (approximate top-k via count-min sketch + heap) and stability considerations.

Implement list overlap and dense-ranked word frequencies

Comments (0)