Match requests and accepts into friendships in SQL
Company: Roblox
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
Given two event tables, write a single SQL query (PostgreSQL) to output unique undirected friendships with the earliest valid friendship_date. Schema and sample data: Tables: friend_requests(req_id INT, requester_id INT, requestee_id INT, request_date DATE); friend_accepts(accepter_id INT, requester_id INT, accept_date DATE). Sample friend_requests:
+--------+--------------+-------------+-------------+
| req_id | requester_id | requestee_id| request_date|
+--------+--------------+-------------+-------------+
| 1 | 101 | 102 | 2025-08-01 |
| 2 | 102 | 101 | 2025-08-02 |
| 3 | 103 | 104 | 2025-08-05 |
| 4 | 105 | 106 | 2025-08-10 |
| 5 | 105 | 106 | 2025-08-12 |
| 6 | 107 | 108 | 2025-08-15 |
+--------+--------------+-------------+-------------+
Sample friend_accepts:
+-------------+--------------+------------+
| accepter_id | requester_id | accept_date|
+-------------+--------------+------------+
| 102 | 101 | 2025-08-03 |
| 104 | 103 | 2025-08-06 |
| 106 | 105 | 2025-08-11 |
| 108 | 107 | 2025-08-14 | -- invalid (accept before request)
| 106 | 105 | 2025-08-13 |
+-------------+--------------+------------+
Rules: a) A valid acceptance must match a prior request in the opposite direction (accepter_id = requestee_id AND requester_id = requester_id) with accept_date >= request_date. b) Output one row per undirected pair (user_a, user_b) where user_a = LEAST(requester_id, requestee_id) and user_b = GREATEST(requester_id, requestee_id). c) If multiple valid accepts exist for a pair, keep the earliest accept_date. d) Ignore accepts with no prior matching request or that occur before any matching request. Return columns: user_a, user_b, friendship_date. Also explain how your query avoids double-counting when both sides sent requests, and how you would index these tables to make the query fast on 1B-row logs.
Quick Answer: This question evaluates a candidate's competency in SQL data manipulation, focusing on temporal joins, deduplication of bidirectional relationships, selecting the earliest valid event, and reasoning about event-order constraints.