Find most co‑purchased product pairs in SQL

Q: Find most co‑purchased product pairs in SQL

This is a Data Manipulation (SQL/Python) interview question from Google for Data Scientist roles. View the full question and solution on PracHub.

Question

Given the schema and sample data below, write ANSI SQL to return the top 5 unordered product pairs most frequently purchased together across distinct orders. Requirements: count each pair at most once per order; exclude self-pairs; represent each pair with the smaller product_id first; order results by pair_count descending, breaking ties lexicographically by (p1_id, p2_id); and include pair_count. Also propose any indexes that would speed up this query on large data.

Schema:

orders(order_id INT PRIMARY KEY, user_id INT, order_ts TIMESTAMP)
order_items(order_id INT REFERENCES orders(order_id), product_id INT REFERENCES products(product_id), qty INT)
products(product_id INT PRIMARY KEY, name TEXT)

Sample tables:

orders
+----------+---------+---------------------+
| order_id | user_id | order_ts            |
+----------+---------+---------------------+
| 1        | 101     | 2025-08-30 10:05:00 |
| 2        | 202     | 2025-08-30 12:30:00 |
| 3        | 101     | 2025-08-31 09:00:00 |
+----------+---------+---------------------+

order_items
+----------+------------+-----+
| order_id | product_id | qty |
+----------+------------+-----+
| 1        | 10         | 1   |
| 1        | 20         | 2   |
| 1        | 30         | 1   |
| 2        | 10         | 1   |
| 2        | 20         | 1   |
| 3        | 20         | 1   |
| 3        | 40         | 1   |
+----------+------------+-----+

products
+------------+-------------+
| product_id | name        |
+------------+-------------+
| 10         | 'A'         |
| 20         | 'B'         |
| 30         | 'C'         |
| 40         | 'D'         |
+------------+-------------+

Return columns: p1_id, p2_id, pair_count.
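
One possible approach, sketched here rather than taken from the official solution: deduplicate (order_id, product_id) first, self-join the result on order_id with a strict product_id inequality, then group and sort. The inequality removes self-pairs and puts the smaller id first in one step. The sketch below runs the query against the sample data using Python's built-in sqlite3 module so it can be checked end to end. The function names and the index name are hypothetical. SQLite needs LIMIT 5, whereas the ANSI form would be FETCH FIRST 5 ROWS ONLY. The composite index is an assumption about what would help on large data, not a measured result.

```python
import sqlite3

# Schema and sample rows copied from the question.
SCHEMA_AND_DATA = """
CREATE TABLE orders (order_id INT PRIMARY KEY, user_id INT, order_ts TIMESTAMP);
CREATE TABLE products (product_id INT PRIMARY KEY, name TEXT);
CREATE TABLE order_items (
    order_id   INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    qty        INT
);
INSERT INTO orders VALUES
    (1, 101, '2025-08-30 10:05:00'),
    (2, 202, '2025-08-30 12:30:00'),
    (3, 101, '2025-08-31 09:00:00');
INSERT INTO products VALUES (10, 'A'), (20, 'B'), (30, 'C'), (40, 'D');
INSERT INTO order_items VALUES
    (1, 10, 1), (1, 20, 2), (1, 30, 1),
    (2, 10, 1), (2, 20, 1),
    (3, 20, 1), (3, 40, 1);
-- Assumed index (hypothetical name): covers the self-join on order_id
-- and the product_id comparison without touching the base table.
CREATE INDEX idx_order_items_order_product
    ON order_items (order_id, product_id);
"""

PAIR_QUERY = """
WITH distinct_items AS (
    -- The schema does not declare (order_id, product_id) unique, so
    -- deduplicate to count each pair at most once per order.
    SELECT DISTINCT order_id, product_id
    FROM order_items
)
SELECT a.product_id AS p1_id,
       b.product_id AS p2_id,
       COUNT(*)     AS pair_count
FROM distinct_items AS a
JOIN distinct_items AS b
  ON a.order_id = b.order_id
 AND a.product_id < b.product_id   -- no self-pairs, smaller id first
GROUP BY a.product_id, b.product_id
ORDER BY pair_count DESC, p1_id, p2_id
LIMIT 5                            -- ANSI: FETCH FIRST 5 ROWS ONLY
"""


def build_sample_db():
    """Create an in-memory SQLite database holding the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def top_co_purchased_pairs(conn):
    """Return up to five (p1_id, p2_id, pair_count) tuples."""
    return conn.execute(PAIR_QUERY).fetchall()


if __name__ == "__main__":
    for row in top_co_purchased_pairs(build_sample_db()):
        print(row)
```

On the sample data this yields four pairs: (10, 20) with a count of 2, followed by (10, 30), (20, 30), and (20, 40) with a count of 1 each. The products and orders tables are not needed for the result because order_items already carries both ids.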
