PracHub

Find most co‑purchased product pairs in SQL

Last updated: Mar 29, 2026

Quick Overview

This question evaluates SQL data manipulation and performance skills, specifically the ability to generate unordered product pairs, deduplicate pairs per order, compute the pair_count metric, and propose indexes to optimize the query.


Company: Google

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Related Interview Questions

  • Generate binomial matrix and column-normalize - Google (Medium)
  • Analyze video flags and reviews with SQL - Google (Medium)
  • Write SQL/Python for messy event data - Google (Medium)
  • Add a conditional column in Python - Google (Medium)
  • Design a scalable video platform database - Google (Medium)
Oct 13, 2025, 9:49 PM

Given the schema and sample data below, write ANSI-SQL to return the top 5 unordered product pairs most frequently purchased together across distinct orders.

Requirements:

  • Count each pair at most once per order.
  • Exclude self-pairs.
  • Represent each pair with the smaller product_id first (p1_id < p2_id).
  • Order by pair_count descending, breaking ties lexicographically by (p1_id, p2_id).
  • Include pair_count in the output.

Also propose any indexes that would speed up this query on large data.

Schema:

  • orders(order_id INT PRIMARY KEY, user_id INT, order_ts TIMESTAMP)
  • order_items(order_id INT REFERENCES orders(order_id), product_id INT REFERENCES products(product_id), qty INT)
  • products(product_id INT PRIMARY KEY, name TEXT)

Sample tables:

orders
+----------+---------+---------------------+
| order_id | user_id | order_ts            |
+----------+---------+---------------------+
|        1 |     101 | 2025-08-30 10:05:00 |
|        2 |     202 | 2025-08-30 12:30:00 |
|        3 |     101 | 2025-08-31 09:00:00 |
+----------+---------+---------------------+

order_items
+----------+------------+-----+
| order_id | product_id | qty |
+----------+------------+-----+
|        1 |         10 |   1 |
|        1 |         20 |   2 |
|        1 |         30 |   1 |
|        2 |         10 |   1 |
|        2 |         20 |   1 |
|        3 |         20 |   1 |
|        3 |         40 |   1 |
+----------+------------+-----+

products
+------------+------+
| product_id | name |
+------------+------+
|         10 | 'A'  |
|         20 | 'B'  |
|         30 | 'C'  |
|         40 | 'D'  |
+------------+------+
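As a sanity check on the requirements, the expected pair counts for the sample order_items rows can be worked out in a few lines of plain Python. This is only an illustrative cross-check, not part of the requested SQL answer; the helper name top_pairs and the constant ORDER_ITEMS are hypothetical names chosen for this sketch.

```python
from collections import Counter
from itertools import combinations

# Sample order_items rows (order_id, product_id, qty), copied from the sample table.
ORDER_ITEMS = [
    (1, 10, 1), (1, 20, 2), (1, 30, 1),
    (2, 10, 1), (2, 20, 1),
    (3, 20, 1), (3, 40, 1),
]


def top_pairs(rows, limit=5):
    """Return up to `limit` tuples (p1_id, p2_id, pair_count), best pairs first."""
    # Collapse each order to a set of products so a repeated line counts once.
    baskets = {}
    for order_id, product_id, _qty in rows:
        baskets.setdefault(order_id, set()).add(product_id)

    counts = Counter()
    for products in baskets.values():
        # combinations() over the sorted set yields every unordered pair once,
        # smaller product_id first, and never pairs a product with itself.
        counts.update(combinations(sorted(products), 2))

    # Sort by pair_count descending, then by (p1_id, p2_id) ascending.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(p1, p2, n) for (p1, p2), n in ranked[:limit]]


EXPECTED = top_pairs(ORDER_ITEMS)
print(EXPECTED)
# [(10, 20, 2), (10, 30, 1), (20, 30, 1), (20, 40, 1)]
```

Only four distinct pairs exist in the sample data, so a "top 5" query returns four rows here. The qty column is ignored on purpose: a pair is counted per order, not per unit.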

Return columns: p1_id, p2_id, pair_count.
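One possible answer is sketched below, under stated assumptions; it is not an official solution from the question's source. The query is run through Python's built-in sqlite3 module purely as a test harness. SQLite spells the row limit as LIMIT 5, whereas the standard SQL form is FETCH FIRST 5 ROWS ONLY. The index name idx_order_items_order_product is hypothetical, and whether an index like it helps on a real system depends on the engine and the data distribution.

```python
import sqlite3

# Self-join order_items on order_id. The DISTINCT CTE guards against the same
# product appearing on several lines of one order (the schema declares no
# unique constraint), and a.product_id < b.product_id both removes self-pairs
# and puts the smaller product_id first.
TOP_PAIRS_SQL = """
WITH items AS (
    SELECT DISTINCT order_id, product_id
    FROM order_items
)
SELECT a.product_id AS p1_id,
       b.product_id AS p2_id,
       COUNT(*)     AS pair_count
FROM items AS a
JOIN items AS b
  ON a.order_id = b.order_id
 AND a.product_id < b.product_id
GROUP BY a.product_id, b.product_id
ORDER BY pair_count DESC, p1_id, p2_id
LIMIT 5
"""

# Assumed index proposal: a composite index on (order_id, product_id) lets the
# self-join look up the other items of an order without reading qty.
INDEX_SQL = (
    "CREATE INDEX idx_order_items_order_product "
    "ON order_items (order_id, product_id)"
)

SAMPLE_ROWS = [
    (1, 10, 1), (1, 20, 2), (1, 30, 1),
    (2, 10, 1), (2, 20, 1),
    (3, 20, 1), (3, 40, 1),
]


def run_demo():
    """Load the sample order_items rows into an in-memory DB and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        # Only order_items is needed; the query never reads orders or products.
        conn.execute(
            "CREATE TABLE order_items (order_id INT, product_id INT, qty INT)"
        )
        conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", SAMPLE_ROWS)
        conn.execute(INDEX_SQL)
        return conn.execute(TOP_PAIRS_SQL).fetchall()
    finally:
        conn.close()


RESULT = run_demo()
print(RESULT)
# [(10, 20, 2), (10, 30, 1), (20, 30, 1), (20, 40, 1)]
```

Joining to orders or products is unnecessary here because the foreign keys already tie every order_items row to a valid order and product. If order_items were guaranteed unique on (order_id, product_id), the DISTINCT step could be dropped.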
