This question evaluates data manipulation skills—specifically SQL joins and anti-joins, deduplication, date/time handling, aggregation, and pairwise overlap computation—within the Data Manipulation (SQL/Python) domain at a practical implementation level requiring query-writing to produce per-day overlap counts.
You work on a music app and want to recommend “friend” connections based on listening similarity.
Assume the following tables (all timestamps are in UTC):
listens
user_id
(INT)
song_id
(INT)
listened_at
(TIMESTAMP)
friendships (undirected; each friendship appears once)
user_id_1
(INT)
user_id_2
(INT)
A pair of users (a, b) is a recommended friend pair on a given calendar date d if:
friendships
(treat friendships as undirected), and
d
, they listened to
more than 3
of the
same songs
(i.e., at least 4 distinct overlapping
song_id
s that both users listened to that day).
Write a SQL query to output all recommended pairs and the date(s) on which they qualify.
Output columns:
user_id_1
(INT) — the smaller user id in the pair
user_id_2
(INT) — the larger user id in the pair
listen_date
(DATE)
overlap_songs
(INT) — number of distinct overlapping songs on that date
Notes/assumptions: