PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Coding & Algorithms/Pinterest

Find list pair with maximum overlap

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency with set operations and similarity metrics (Jaccard), scalable algorithm design and complexity analysis for large datasets, and the ability to handle streaming updates and deterministic tie-breaking.

  • Medium
  • Pinterest
  • Coding & Algorithms
  • Data Scientist

Find list pair with maximum overlap

Company: Pinterest

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

You are given N labeled lists of items as a Python dict mapping list_name -> iterable of strings. Example input: {'L1': ['A','B','C'], 'L2': ['A','C','D'], 'L3': ['B','C','E'], 'L4': []}. Treat each list as a set (ignore duplicates). Write a function that returns the single unordered pair of list names with the largest overlap count (size of intersection), along with that overlap count and the Jaccard similarity (|A∩B| / |A∪B|). Tie-breakers: 1) higher Jaccard, 2) lexicographically smaller pair (by list name). For the example above, the expected result is ('L1','L2', overlap=2, jaccard=0.5). Constraints: up to 200,000 lists, 5,000,000 total items; items are case-sensitive strings; memory is limited so an O(N^2) all-pairs comparison is not acceptable. 1) Describe your algorithm at a high level (e.g., an inverted index over items) and analyze time/space complexity. 2) Implement it in Python. 3) Extend to return the top-k pairs efficiently. 4) Explain how you would adapt your solution if the input is a stream of (list_name, item) updates where lists can grow over time.

Quick Answer: This question evaluates proficiency with set operations and similarity metrics (Jaccard), scalable algorithm design and complexity analysis for large datasets, and the ability to handle streaming updates and deterministic tie-breaking.

Related Interview Questions

  • Design Hierarchical Permission Checks - Pinterest (medium)
  • Implement weighted random choice - Pinterest (medium)
  • Solve five hard algorithm problems - Pinterest
  • Sample a string by real-valued scores - Pinterest (hard)
  • Find First Prefix-Matching Word - Pinterest (medium)
Pinterest logo
Pinterest
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Coding & Algorithms
3
0

You are given N labeled lists of items as a Python dict mapping list_name -> iterable of strings. Example input: {'L1': ['A','B','C'], 'L2': ['A','C','D'], 'L3': ['B','C','E'], 'L4': []}. Treat each list as a set (ignore duplicates). Write a function that returns the single unordered pair of list names with the largest overlap count (size of intersection), along with that overlap count and the Jaccard similarity (|A∩B| / |A∪B|). Tie-breakers: 1) higher Jaccard, 2) lexicographically smaller pair (by list name). For the example above, the expected result is ('L1','L2', overlap=2, jaccard=0.5). Constraints: up to 200,000 lists, 5,000,000 total items; items are case-sensitive strings; memory is limited so an O(N^2) all-pairs comparison is not acceptable. 1) Describe your algorithm at a high level (e.g., an inverted index over items) and analyze time/space complexity. 2) Implement it in Python. 3) Extend to return the top-k pairs efficiently. 4) Explain how you would adapt your solution if the input is a stream of (list_name, item) updates where lists can grow over time.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Coding & Algorithms•More Pinterest•More Data Scientist•Pinterest Data Scientist•Pinterest Coding & Algorithms•Data Scientist Coding & Algorithms
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.