PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches

Quick Overview

This question evaluates proficiency in SQL data manipulation and set-based operations, focusing on deduplication, pairwise aggregation, and similarity metric computation (Jaccard) over relational tables.

  • Medium
  • Pinterest
  • Data Manipulation (SQL/Python)
  • Data Scientist

Write SQL to compute max-overlap lists

Company: Pinterest

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

Invented schema and sample data below. Assume 'today' is 2025-09-01 and 'last 7 days' means 2025-08-26 through 2025-09-01 inclusive. Only consider lists whose created_at falls in that window. Schema: - lists(list_id INT PRIMARY KEY, owner_id INT, created_at DATE) - list_items(list_id INT, item VARCHAR) Sample tables: lists +---------+----------+------------+ | list_id | owner_id | created_at | +---------+----------+------------+ | 1 | 10 | 2025-08-30 | | 2 | 11 | 2025-08-31 | | 3 | 12 | 2025-09-01 | +---------+----------+------------+ list_items +---------+------+ | list_id | item | +---------+------+ | 1 | A | | 1 | B | | 1 | C | | 2 | A | | 2 | C | | 2 | D | | 3 | B | | 3 | C | | 3 | E | +---------+------+ Task: Write a single SQL query that returns exactly one row: (list_id_small, list_id_large, overlap_count, jaccard) for the unordered pair of lists with the maximum item overlap among eligible lists. Define overlap_count = |items(Li) ∩ items(Lj)| and jaccard = overlap_count / |items(Li) ∪ items(Lj)|. Break ties by higher jaccard, then by (list_id_small, list_id_large) ascending. For the sample data above, the expected row is (1, 2, 2, 0.5). Requirements: 1) Do not count duplicate items within a list more than once. 2) Use standard SQL constructs (CTEs/window functions allowed). 3) Explain indexes you would add to make this fast at scale. 4) Provide a variant that returns the top-5 pairs.

Quick Answer: This question evaluates proficiency in SQL data manipulation and set-based operations, focusing on deduplication, pairwise aggregation, and similarity metric computation (Jaccard) over relational tables.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Write Queries for Pinterest Engagement Tasks - Pinterest (medium)
  • Write SQL for top categories and highly active users - Pinterest (easy)
  • Compute percent of first-cancelled users who never rebook - Pinterest (medium)
  • Compute percent of first-cancel users who never return - Pinterest (easy)
  • Find top category by video time spent - Pinterest (Medium)