Find recommended friend pairs by shared listening

Q: Find recommended friend pairs by shared listening

This question evaluates the ability to perform time-based deduplication, aggregation and pairwise matching using SQL (joins, grouping, and anti-joins), along with handling unordered user pairs and extracting calendar dates from timestamps, and falls under the Data Manipulation (SQL/Python) domain.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Question

Problem (SQL)

You work on a music app and want to recommend new friend connections based on listening similarity.

Tables

Assume the following schemas:

listens

user_id (BIGINT)
song_id (BIGINT)
listened_at (TIMESTAMP)

friendships (undirected friendship; may be stored as one row per pair)

user_id_1 (BIGINT)
user_id_2 (BIGINT)
created_at (TIMESTAMP)

Definitions / assumptions

Two users have an overlapping song on a given day if both listened to the same song_id on that same calendar date .
Count distinct overlapping songs per pair per day (ignore repeat listens of the same song).
Treat pairs as unordered (e.g., output (min_user, max_user) ).
A pair should be recommended only if they are not already friends (no matching pair in friendships , regardless of ordering).
Unless stated otherwise, treat listened_at as UTC when deriving the calendar date .

Task

Write a SQL query to return all recommended pairs of users (user_id_a, user_id_b) such that there exists at least one day where they listened to more than 3 of the same songs on that day.

Output

Return:

user_id_a
user_id_b
listen_date
overlap_song_count

Only include rows where overlap_song_count > 3 and the pair is not already in friendships.