This question evaluates SQL data manipulation and performance skills, specifically the ability to generate unordered product pairs, deduplicate pairs per order, compute the pair_count metric, and propose indexes to optimize the query.
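Before writing the SQL, it can help to see the three core steps in plain Python: collapse each order to its distinct products, enumerate unordered pairs, and count them. This is a minimal sketch that assumes `order_items` rows arrive as `(order_id, product_id, qty)` tuples. The function name `top_copurchased_pairs` is hypothetical. It is meant as a small reference for checking a SQL answer, not as a way to process large data.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Sample order_items rows as (order_id, product_id, qty), copied from the question.
SAMPLE_ORDER_ITEMS = [
    (1, 10, 1), (1, 20, 2), (1, 30, 1),
    (2, 10, 1), (2, 20, 1),
    (3, 20, 1), (3, 40, 1),
]


def top_copurchased_pairs(order_items, k=5):
    """Return up to k (p1_id, p2_id, pair_count) tuples (hypothetical helper)."""
    # Deduplicate per order: a set means a repeated product row counts once.
    products_by_order = defaultdict(set)
    for order_id, product_id, _qty in order_items:
        products_by_order[order_id].add(product_id)

    pair_counts = Counter()
    for products in products_by_order.values():
        # combinations() over a sorted list yields each unordered pair once,
        # smaller id first, and never pairs a product with itself.
        for p1, p2 in combinations(sorted(products), 2):
            pair_counts[(p1, p2)] += 1

    # pair_count descending, then (p1_id, p2_id) ascending.
    ranked = sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(p1, p2, count) for (p1, p2), count in ranked[:k]]


if __name__ == "__main__":
    print(top_copurchased_pairs(SAMPLE_ORDER_ITEMS))
    # → [(10, 20, 2), (10, 30, 1), (20, 30, 1), (20, 40, 1)]
```

The sort key `(-count, (p1, p2))` encodes the required ranking in one pass: higher `pair_count` first, then lexicographic order on the pair.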

How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require both an understanding of core concepts and hands-on practice. PracHub provides solutions with explanations to help you master Data Manipulation (SQL/Python) interviews.

What difficulty level is this interview question?

This is a Medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Google.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Google during technical interviews.

Find most co‑purchased product pairs in SQL

Given the schema and sample data below, write ANSI SQL that returns the top 5 unordered product pairs most frequently purchased together across distinct orders. Requirements: count each pair at most once per order; exclude self-pairs; represent each pair with the smaller product_id first; order by pair_count descending and break ties lexicographically by (p1_id, p2_id); and include pair_count in the output. Also propose indexes that would speed up this query on large data.

Schema:

orders(order_id INT PRIMARY KEY, user_id INT, order_ts TIMESTAMP)
order_items(order_id INT REFERENCES orders(order_id), product_id INT REFERENCES products(product_id), qty INT)
products(product_id INT PRIMARY KEY, name TEXT)
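For the index part of the question, a plausible first candidate is a composite index on order_items (order_id, product_id). It starts with the self-join key and also contains the pair column, so the per-order deduplication and the join can, in principle, be answered from the index alone. The sketch below creates the schema in an in-memory SQLite database and adds that index plus two optional ones. The index names and helper functions are hypothetical. CREATE INDEX is not defined by the SQL standard itself, although this form is accepted by most engines. If the business rule guarantees one row per product per order, declaring UNIQUE (order_id, product_id) instead would typically be backed by an equivalent index and would also enforce the rule.

```python
import sqlite3

SCHEMA_DDL = """
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, order_ts TIMESTAMP);
CREATE TABLE order_items (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    qty        INTEGER
);
"""

# Hypothetical index names.
INDEX_DDL = """
-- Main candidate: join key first, pair column second. The pair query reads
-- only these two columns from order_items, so this index covers it.
CREATE INDEX idx_order_items_order_product ON order_items (order_id, product_id);

-- Optional: helps variants that start from one product
-- (e.g. "what is bought together with product 10?").
CREATE INDEX idx_order_items_product_order ON order_items (product_id, order_id);

-- Optional: only useful if the pair query is restricted to an order_ts window.
CREATE INDEX idx_orders_order_ts ON orders (order_ts);
"""


def create_schema_with_indexes(conn):
    """Create the three tables and the proposed indexes."""
    conn.executescript(SCHEMA_DDL + INDEX_DDL)


def index_names(conn):
    """List the user-created indexes via SQLite's catalog table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND name LIKE 'idx_%' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema_with_indexes(conn)
    print(index_names(conn))
    # Show how SQLite plans the per-order dedup step. Plan text varies between
    # SQLite versions, so it is printed for inspection rather than checked.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT DISTINCT order_id, product_id FROM order_items"
    ).fetchall()
    for row in plan:
        print(row[-1])
```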

Sample tables:

orders
+----------+---------+---------------------+
| order_id | user_id | order_ts            |
+----------+---------+---------------------+
| 1        | 101     | 2025-08-30 10:05:00 |
| 2        | 202     | 2025-08-30 12:30:00 |
| 3        | 101     | 2025-08-31 09:00:00 |
+----------+---------+---------------------+

order_items
+----------+------------+-----+
| order_id | product_id | qty |
+----------+------------+-----+
| 1        | 10         | 1   |
| 1        | 20         | 2   |
| 1        | 30         | 1   |
| 2        | 10         | 1   |
| 2        | 20         | 1   |
| 3        | 20         | 1   |
| 3        | 40         | 1   |
+----------+------------+-----+

products
+------------+-------------+
| product_id | name        |
+------------+-------------+
| 10         | 'A'         |
| 20         | 'B'         |
| 30         | 'C'         |
| 40         | 'D'         |
+------------+-------------+

Return columns: p1_id, p2_id, pair_count.
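One way to meet every requirement is to self-join a deduplicated (order_id, product_id) set on order_id, keeping only rows where the left product_id is strictly smaller than the right. The sketch below runs such a query against the sample tables in an in-memory SQLite database so the result can be checked by hand. It assumes SQLite through Python's `sqlite3` module, which accepts LIMIT but not the SQL:2008 `FETCH FIRST 5 ROWS ONLY` clause. In engines that support the standard form, such as PostgreSQL, that clause can replace the LIMIT line. The helper names `build_sample_db` and `top_pairs` are hypothetical.

```python
import sqlite3

# Sample data copied from the tables above.
ORDERS = [
    (1, 101, "2025-08-30 10:05:00"),
    (2, 202, "2025-08-30 12:30:00"),
    (3, 101, "2025-08-31 09:00:00"),
]
ORDER_ITEMS = [
    (1, 10, 1), (1, 20, 2), (1, 30, 1),
    (2, 10, 1), (2, 20, 1),
    (3, 20, 1), (3, 40, 1),
]
PRODUCTS = [(10, "A"), (20, "B"), (30, "C"), (40, "D")]

# DISTINCT collapses repeated (order_id, product_id) rows, so each pair is
# counted at most once per order. The strict < drops self-pairs and emits each
# unordered pair exactly once, with the smaller product_id first.
TOP_PAIRS_SQL = """
WITH distinct_items AS (
    SELECT DISTINCT order_id, product_id
    FROM order_items
)
SELECT a.product_id AS p1_id,
       b.product_id AS p2_id,
       COUNT(*)     AS pair_count
FROM distinct_items a
JOIN distinct_items b
  ON a.order_id = b.order_id
 AND a.product_id < b.product_id
GROUP BY a.product_id, b.product_id
ORDER BY pair_count DESC, p1_id ASC, p2_id ASC
-- Standard SQL:2008 equivalent: FETCH FIRST 5 ROWS ONLY
LIMIT 5
"""


def build_sample_db():
    """Load the sample tables into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, order_ts TIMESTAMP);
        CREATE TABLE order_items (
            order_id   INTEGER REFERENCES orders(order_id),
            product_id INTEGER REFERENCES products(product_id),
            qty        INTEGER
        );
    """)
    conn.executemany("INSERT INTO products VALUES (?, ?)", PRODUCTS)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", ORDER_ITEMS)
    return conn


def top_pairs(conn):
    """Run the top-5 pair query and return (p1_id, p2_id, pair_count) rows."""
    return conn.execute(TOP_PAIRS_SQL).fetchall()


if __name__ == "__main__":
    for row in top_pairs(build_sample_db()):
        print(row)
```

On the sample data this prints (10, 20, 2), (10, 30, 1), (20, 30, 1) and (20, 40, 1). Only four distinct pairs exist, so the top 5 returns all of them. The pair (10, 20) appears in orders 1 and 2, and every other pair appears once. Counting COUNT(*) over the deduplicated set is equivalent to counting distinct orders per pair. Because an order with n distinct products expands to n(n-1)/2 rows in the self-join, very large baskets dominate the cost. That makes an index on order_items (order_id, product_id) the natural first candidate for large data.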
