Find recommended friend pairs by shared songs
Company: Amazon
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Technical Screen
You work on a music app and want to recommend “friend” connections based on listening similarity.
Assume the following tables (all timestamps are in UTC):
**listens**
- `user_id` (INT)
- `song_id` (INT)
- `listened_at` (TIMESTAMP)
**friendships** (undirected; each friendship appears once)
- `user_id_1` (INT)
- `user_id_2` (INT)
A pair of users `(a, b)` is a **recommended friend pair** on a given calendar date `d` if:
1) They are **not already friends** in `friendships` (treat friendships as undirected), and
2) On date `d`, they listened to **more than 3** of the **same songs** (i.e., at least 4 distinct overlapping `song_id`s that both users listened to that day).
Write a SQL query to output all recommended pairs and the date(s) on which they qualify.
**Output columns**:
- `user_id_1` (INT) — the smaller user id in the pair
- `user_id_2` (INT) — the larger user id in the pair
- `listen_date` (DATE)
- `overlap_songs` (INT) — number of distinct overlapping songs on that date
Notes/assumptions:
- If a user listens to the same song multiple times in a day, it should count as **1** toward overlap for that day.
- A pair can appear on multiple dates if they qualify on multiple dates.
Quick Answer: This question evaluates data manipulation skills—specifically SQL joins and anti-joins, deduplication, date/time handling, aggregation, and pairwise overlap computation—within the Data Manipulation (SQL/Python) domain at a practical implementation level requiring query-writing to produce per-day overlap counts.