How do I approach Coding & Algorithms interview questions?

Coding & Algorithms questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master coding & algorithms interviews.

What difficulty level is this interview question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Amazon.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Amazon during technical interviews.

Implement list overlap and dense-ranked word frequencies

Q: Implement list overlap and dense-ranked word frequencies

This question evaluates a data scientist's ability to implement efficient multiset intersection and streaming top-k word-frequency computations, covering duplicate-aware counting, dense-ranking, Unicode handling, and strict time/space constraints.

Amazon

Oct 13, 2025, 9:49 PM

Data Scientist

Onsite

Coding & Algorithms

3

0

Part A — Multiset overlap count Implement count_overlap(a: List[int], b: List[int]) -> int that returns the size of the multiset intersection of a and b (order doesn’t matter; duplicates count). Constraints: 0 ≤ len(a), len(b) ≤ 1e7; values fit 32-bit signed int. Time O(n + m), space O(min(n, m)). Disallow sorting if it would exceed O((n+m) log(n+m)). Edge cases: empty lists, all duplicates, heavy skew. Examples: • a=[1,1,2,3], b=[1,2,2,4] → overlap elements {1×1,2×1} so return 2 • a=[5,5,5], b=[5,5] → 2 Also implement overlap_elements(a,b) returning the actual multiset intersection as a generator without materializing more than O(min(n,m)) extra memory.

Part B — Top-K words with dense ranks Implement top_k_words(words: Iterable[str], k: int=5) -> List[Tuple[str,int,int]] that returns (word, dense_rank, freq) sorted by descending freq then lexicographically for ties. Dense ranks compress gaps: if freqs are [10,10,8,7,7,7,5], ranks are [1,1,2,3,3,3,4]. Complexity: single pass over input (streaming-friendly), O(U) memory for U unique words. Handle Unicode and case-folding option via a flag. Example input: ["a","b","b","c","c","c","d","d","d","e"] Expected output (k=5): [("c",1,3),("d",1,3),("b",2,2),("a",3,1),("e",3,1)] Follow-ups: analyze time/space; discuss alternatives for 10^9 tokens (approximate top-k via count-min sketch + heap) and stability considerations.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Coding & Algorithms•More Amazon•More Data Scientist•Amazon Data Scientist•Amazon Coding & Algorithms•Data Scientist Coding & Algorithms

Implement list overlap and dense-ranked word frequencies

Quick Overview

Comments (0)

Implement list overlap and dense-ranked word frequencies

Quick Overview

Comments (0)