Find most co‑purchased product pairs in SQL
Company: Google
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Technical Screen
Given the schema and sample data below, write ANSI SQL to return the top 5 unordered product pairs most frequently purchased together across distinct orders. Requirements: count each pair at most once per order; exclude self-pairs; represent each pair with the smaller product_id first; sort by pair_count descending and break ties lexicographically by (p1_id, p2_id); and include pair_count in the output. Also propose any indexes that would speed up this query on large data.
Schema:
- orders(order_id INT PRIMARY KEY, user_id INT, order_ts TIMESTAMP)
- order_items(order_id INT REFERENCES orders(order_id), product_id INT REFERENCES products(product_id), qty INT)
- products(product_id INT PRIMARY KEY, name TEXT)
Sample tables:
orders
+----------+---------+---------------------+
| order_id | user_id | order_ts            |
+----------+---------+---------------------+
| 1        | 101     | 2025-08-30 10:05:00 |
| 2        | 202     | 2025-08-30 12:30:00 |
| 3        | 101     | 2025-08-31 09:00:00 |
+----------+---------+---------------------+
order_items
+----------+------------+-----+
| order_id | product_id | qty |
+----------+------------+-----+
| 1        | 10         | 1   |
| 1        | 20         | 2   |
| 1        | 30         | 1   |
| 2        | 10         | 1   |
| 2        | 20         | 1   |
| 3        | 20         | 1   |
| 3        | 40         | 1   |
+----------+------------+-----+
products
+------------+-------------+
| product_id | name        |
+------------+-------------+
| 10         | 'A'         |
| 20         | 'B'         |
| 30         | 'C'         |
| 40         | 'D'         |
+------------+-------------+
Return columns: p1_id, p2_id, pair_count.
Quick Answer: This question evaluates SQL data manipulation and performance skills, specifically the ability to generate unordered product pairs, deduplicate pairs per order, compute the pair_count metric, and propose indexes to optimize the query.
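One possible approach, sketched below under stated assumptions: deduplicate order_items to distinct (order_id, product_id) rows, self-join on order_id with a strict a.product_id < b.product_id condition, then group and sort. The SQL is wrapped in Python's sqlite3 module only so it can be run against the sample data. The names PAIR_QUERY, INDEX_DDL, top_pairs, and idx_order_items_order_product are illustrative and not part of the original question. SQLite accepts LIMIT 5; in strict ANSI SQL the equivalent clause would be FETCH FIRST 5 ROWS ONLY.

```python
import sqlite3

# Sketch query (an assumption about one reasonable solution, not the only one).
# - SELECT DISTINCT makes each pair count at most once per order, even if a
#   product appears on several lines of the same order.
# - a.product_id < b.product_id removes self-pairs and puts the smaller id first.
# - LIMIT 5 is the SQLite spelling; ANSI SQL would use FETCH FIRST 5 ROWS ONLY.
PAIR_QUERY = """
WITH distinct_items AS (
    SELECT DISTINCT order_id, product_id
    FROM order_items
)
SELECT a.product_id AS p1_id,
       b.product_id AS p2_id,
       COUNT(*)     AS pair_count
FROM distinct_items AS a
JOIN distinct_items AS b
  ON a.order_id = b.order_id
 AND a.product_id < b.product_id
GROUP BY a.product_id, b.product_id
ORDER BY pair_count DESC, p1_id, p2_id
LIMIT 5
"""

# Hypothetical index: a composite (order_id, product_id) index lets the
# self-join seek by order_id and range-scan product_id without touching qty.
INDEX_DDL = """
CREATE INDEX idx_order_items_order_product
    ON order_items (order_id, product_id)
"""


def top_pairs(conn):
    """Run the pair query and return a list of (p1_id, p2_id, pair_count)."""
    return conn.execute(PAIR_QUERY).fetchall()


# Load the sample order_items rows from the question into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INT, product_id INT, qty INT)")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [
        (1, 10, 1), (1, 20, 2), (1, 30, 1),
        (2, 10, 1), (2, 20, 1),
        (3, 20, 1), (3, 40, 1),
    ],
)
conn.execute(INDEX_DDL)

rows = top_pairs(conn)
for row in rows:
    print(row)
```

On the sample data this returns four rows, since only four distinct pairs exist: (10, 20, 2), (10, 30, 1), (20, 30, 1), (20, 40, 1). If (order_id, product_id) is already declared unique or is the primary key of order_items, the DISTINCT step and the extra index are likely redundant, because the key's own index serves the same purpose. Restricting the result to a time window would additionally require joining orders, where an index on order_ts may help.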