PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Coding & Algorithms/Amazon

Implement list overlap and dense-ranked word frequencies

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's ability to implement efficient multiset intersection and streaming top-k word-frequency computations, covering duplicate-aware counting, dense-ranking, Unicode handling, and strict time/space constraints.

  • Medium
  • Amazon
  • Coding & Algorithms
  • Data Scientist

Implement list overlap and dense-ranked word frequencies

Company: Amazon

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

Part A — Multiset overlap count Implement count_overlap(a: List[int], b: List[int]) -> int that returns the size of the multiset intersection of a and b (order doesn’t matter; duplicates count). Constraints: 0 ≤ len(a), len(b) ≤ 1e7; values fit 32-bit signed int. Time O(n + m), space O(min(n, m)). Disallow sorting if it would exceed O((n+m) log(n+m)). Edge cases: empty lists, all duplicates, heavy skew. Examples: • a=[1,1,2,3], b=[1,2,2,4] → overlap elements {1×1,2×1} so return 2 • a=[5,5,5], b=[5,5] → 2 Also implement overlap_elements(a,b) returning the actual multiset intersection as a generator without materializing more than O(min(n,m)) extra memory. Part B — Top-K words with dense ranks Implement top_k_words(words: Iterable[str], k: int=5) -> List[Tuple[str,int,int]] that returns (word, dense_rank, freq) sorted by descending freq then lexicographically for ties. Dense ranks compress gaps: if freqs are [10,10,8,7,7,7,5], ranks are [1,1,2,3,3,3,4]. Complexity: single pass over input (streaming-friendly), O(U) memory for U unique words. Handle Unicode and case-folding option via a flag. Example input: ["a","b","b","c","c","c","d","d","d","e"] Expected output (k=5): [("c",1,3),("d",1,3),("b",2,2),("a",3,1),("e",3,1)] Follow-ups: analyze time/space; discuss alternatives for 10^9 tokens (approximate top-k via count-min sketch + heap) and stability considerations.

Quick Answer: This question evaluates a data scientist's ability to implement efficient multiset intersection and streaming top-k word-frequency computations, covering duplicate-aware counting, dense-ranking, Unicode handling, and strict time/space constraints.

Related Interview Questions

  • Find Unique Target-Sum Pairs - Amazon (easy)
  • Find Valid IP Addresses in Files - Amazon (medium)
  • Implement Optimal Bucket Batching - Amazon (hard)
  • Implement Cache and Rotate Matrix - Amazon (medium)
  • Find Longest Activatable Server Streak - Amazon (hard)
Amazon logo
Amazon
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Coding & Algorithms
3
0

Part A — Multiset overlap count Implement count_overlap(a: List[int], b: List[int]) -> int that returns the size of the multiset intersection of a and b (order doesn’t matter; duplicates count). Constraints: 0 ≤ len(a), len(b) ≤ 1e7; values fit 32-bit signed int. Time O(n + m), space O(min(n, m)). Disallow sorting if it would exceed O((n+m) log(n+m)). Edge cases: empty lists, all duplicates, heavy skew. Examples: • a=[1,1,2,3], b=[1,2,2,4] → overlap elements {1×1,2×1} so return 2 • a=[5,5,5], b=[5,5] → 2 Also implement overlap_elements(a,b) returning the actual multiset intersection as a generator without materializing more than O(min(n,m)) extra memory.

Part B — Top-K words with dense ranks Implement top_k_words(words: Iterable[str], k: int=5) -> List[Tuple[str,int,int]] that returns (word, dense_rank, freq) sorted by descending freq then lexicographically for ties. Dense ranks compress gaps: if freqs are [10,10,8,7,7,7,5], ranks are [1,1,2,3,3,3,4]. Complexity: single pass over input (streaming-friendly), O(U) memory for U unique words. Handle Unicode and case-folding option via a flag. Example input: ["a","b","b","c","c","c","d","d","d","e"] Expected output (k=5): [("c",1,3),("d",1,3),("b",2,2),("a",3,1),("e",3,1)] Follow-ups: analyze time/space; discuss alternatives for 10^9 tokens (approximate top-k via count-min sketch + heap) and stability considerations.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Coding & Algorithms•More Amazon•More Data Scientist•Amazon Data Scientist•Amazon Coding & Algorithms•Data Scientist Coding & Algorithms
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.