Write SQL to infer group-call demand
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You are given only 1:1 call logs and a user table. Use SQL to estimate latent demand for a 'Group Call' feature by detecting 10-minute 'call loops' where ≥3 distinct users are connected via overlapping or back-to-back 1:1 calls. Compute results for the last 7 days ending today (define today as 2025-09-01). Tasks: (a) Build an undirected edge view of calls (treat caller/callee symmetrically). Explain precisely when to use UNION vs UNION ALL and the pitfalls of deduplication. (b) Sessionize calls into rolling 10-minute windows and use a recursive CTE to find connected components ("loop sessions"). (c) Output, per day, the number of loop sessions, unique users in loops, and an 'unmet connectivity' metric per session = n*(n-1)/2 − observed_unique_pairs, then aggregate the metric per day. (d) Ensure calls with connected=0 are excluded from edges but may indicate failed attempts in a sensitivity variant—briefly describe how you would incorporate them. Required output columns: event_date, loop_sessions, loop_users, unmet_connectivity_edges. Schema and small sample data you can assume:
users
user_id | country | signup_date
1 | US | 2025-08-15
2 | US | 2025-08-16
3 | US | 2025-08-17
4 | US | 2025-08-18
5 | US | 2025-08-19
6 | US | 2025-08-20
calls
call_id | caller_id | callee_id | start_ts | end_ts | connected
101 | 1 | 2 | 2025-08-31 09:00:00 | 2025-08-31 09:04:00 | 1
102 | 1 | 3 | 2025-08-31 09:05:00 | 2025-08-31 09:08:00 | 1
103 | 2 | 3 | 2025-08-31 09:06:00 | 2025-08-31 09:07:00 | 0
104 | 3 | 1 | 2025-08-31 09:09:00 | 2025-08-31 09:12:00 | 1
105 | 4 | 5 | 2025-08-31 21:00:00 | 2025-08-31 21:03:00 | 1
106 | 4 | 6 | 2025-08-31 21:04:00 | 2025-08-31 21:07:00 | 1
107 | 5 | 6 | 2025-08-31 21:06:00 | 2025-08-31 21:08:00 | 1
108 | 5 | 4 | 2025-08-31 21:09:00 | 2025-08-31 21:15:00 | 0
Quick Answer: This question evaluates SQL-based data manipulation and product-analytics competencies, including temporal sessionization, graph connectivity inference for detecting overlapping 1:1 call loops, deduplication trade-offs (UNION vs UNION ALL), and computation of derived metrics to estimate latent feature demand within the Data Manipulation (SQL/Python) domain. It is commonly asked because it measures practical application of event-time windowing, connected-component reasoning and edge-case handling (e.g., excluding failed connections) for producing daily summary metrics, emphasizing hands-on SQL proficiency over purely conceptual understanding.