This question evaluates the ability to perform time-based deduplication, aggregation and pairwise matching using SQL (joins, grouping, and anti-joins), along with handling unordered user pairs and extracting calendar dates from timestamps, and falls under the Data Manipulation (SQL/Python) domain.
You work on a music app and want to recommend new friend connections based on listening similarity.
Assume the following schemas:
listens
user_id
(BIGINT)
song_id
(BIGINT)
listened_at
(TIMESTAMP)
friendships (undirected friendship; may be stored as one row per pair)
user_id_1
(BIGINT)
user_id_2
(BIGINT)
created_at
(TIMESTAMP)
song_id
on that
same calendar date
.
(min_user, max_user)
).
friendships
, regardless of ordering).
listened_at
as UTC when deriving the calendar
date
.
Write a SQL query to return all recommended pairs of users (user_id_a, user_id_b) such that there exists at least one day where they listened to more than 3 of the same songs on that day.
Return:
user_id_a
user_id_b
listen_date
overlap_song_count
Only include rows where overlap_song_count > 3 and the pair is not already in friendships.