Find recommended friend pairs by shared listening
Company: Amazon
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Technical Screen
## Problem (SQL)
You work on a music app and want to recommend new friend connections based on listening similarity.
### Tables
Assume the following schemas:
**listens**
- `user_id` (BIGINT)
- `song_id` (BIGINT)
- `listened_at` (TIMESTAMP)
**friendships** (undirected friendship; may be stored as one row per pair)
- `user_id_1` (BIGINT)
- `user_id_2` (BIGINT)
- `created_at` (TIMESTAMP)
### Definitions / assumptions
- Two users have an *overlapping song* on a given day if **both** listened to the **same `song_id`** on that **same calendar date**.
- Count **distinct** overlapping songs per pair per day (ignore repeat listens of the same song).
- Treat pairs as **unordered** (e.g., output `(min_user, max_user)`).
- A pair should be recommended only if they are **not already friends** (no matching pair in `friendships`, regardless of ordering).
- Unless stated otherwise, treat `listened_at` as UTC when deriving the calendar `date`.
### Task
Write a SQL query to return all recommended pairs of users `(user_id_a, user_id_b)` such that there exists at least one day where they listened to **more than 3** of the **same songs** on that day.
### Output
Return:
- `user_id_a`
- `user_id_b`
- `listen_date`
- `overlap_song_count`
Only include rows where `overlap_song_count > 3` and the pair is not already in `friendships`.
Quick Answer: This question evaluates the ability to perform time-based deduplication, aggregation and pairwise matching using SQL (joins, grouping, and anti-joins), along with handling unordered user pairs and extracting calendar dates from timestamps, and falls under the Data Manipulation (SQL/Python) domain.