Write SQL to analyze group-call concurrency
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You are given call data and must compute group-call metrics. Schema (timestamps are UTC):
Tables:
- calls(call_id INT PRIMARY KEY, host_user_id INT, start_ts TIMESTAMP, end_ts TIMESTAMP, is_group_enabled BOOLEAN)
- call_participants(call_id INT, user_id INT, join_ts TIMESTAMP, leave_ts TIMESTAMP NULL)
- users(user_id INT PRIMARY KEY, email TEXT, country TEXT, is_test BOOLEAN)
Sample rows:
users
user_id | email | country | is_test
10 | a@alpha.com | US | false
11 | b@example.com | US | true
12 | c@alpha.com | US | false
13 | d@beta.com | CA | false
14 | e@alpha.com | US | false
15 | f@alpha.com | US | false
calls
call_id | host_user_id | start_ts | end_ts | is_group_enabled
1 | 10 | 2025-08-31 09:00:00 | 2025-08-31 09:30:00 | true
2 | 11 | 2025-09-01 10:00:00 | 2025-09-01 10:45:00 | true
3 | 10 | 2025-09-01 11:00:00 | 2025-09-01 11:07:00 | true
call_participants
call_id | user_id | join_ts | leave_ts
1 | 10 | 2025-08-31 09:00:00 | 2025-08-31 09:30:00
1 | 12 | 2025-08-31 09:02:00 | 2025-08-31 09:15:00
1 | 13 | 2025-08-31 09:04:00 | 2025-08-31 09:20:00
1 | 14 | 2025-08-31 09:05:00 | NULL
2 | 11 | 2025-09-01 10:00:00 | 2025-09-01 10:45:00
2 | 12 | 2025-09-01 10:02:00 | 2025-09-01 10:10:00
2 | 13 | 2025-09-01 10:02:00 | 2025-09-01 10:40:00
2 | 14 | 2025-09-01 10:15:00 | 2025-09-01 10:30:00
2 | 15 | 2025-09-01 10:33:00 | 2025-09-01 10:42:00
3 | 10 | 2025-09-01 11:00:00 | 2025-09-01 11:07:00
3 | 12 | 2025-09-01 11:06:00 | 2025-09-01 11:07:00
Tasks (write SQL; one query if possible, CTEs allowed):
1) Define a call’s peak concurrent participants as the maximum number of overlapping participant intervals within [start_ts, end_ts], where each participant interval is [join_ts, COALESCE(leave_ts, end_ts)]. Exclude test users (users.is_test = true) and any user whose email domain is 'example.com' from both host and participant counts. A call is a "group call" if its peak concurrency ≥ 3.
2) For each calendar day in 2025-08-26 through 2025-09-01 (inclusive; treat "today" as 2025-09-01), return: day, total calls started that day, number of group calls started that day, and the 90th percentile (P90) of peak concurrency among calls started that day. Only include calls where is_group_enabled = true.
3) Additionally, return the top 3 calls (by peak concurrency) that started on 2025-09-01 with: call_id, host_user_id, start_ts, peak_concurrency, and the earliest timestamp at which concurrency reached 3 within the first 10 minutes of the call (NULL if it never did).
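To make the overlap definition in task 1 and the "reached 3" timestamp in task 3 concrete, here is a minimal Python sketch of a +1/-1 sweep over one call. It reads the closed-interval notation literally, so a join and a leave at the same instant still overlap (joins are processed first); that tie rule is an assumption, as are the function name `peak_and_first_hit` and its parameters. Excluded users are assumed to be filtered out before the function is called.

```python
from datetime import datetime, timedelta

def peak_and_first_hit(intervals, call_start, call_end,
                       threshold=3, window=timedelta(minutes=10)):
    """Sweep +1/-1 events for a single call.

    intervals: list of (join_ts, leave_ts or None) for non-excluded users.
    Returns (peak concurrency, earliest ts at which concurrency >= threshold
    within the first `window` of the call, or None if that never happens).
    """
    events = []
    for join_ts, leave_ts in intervals:
        start = max(join_ts, call_start)                      # clip to call window
        raw_end = call_end if leave_ts is None else leave_ts  # NULL leave -> call end
        end = min(raw_end, call_end)
        if start <= end:
            events.append((start, 0, +1))  # tag 0 sorts joins before leaves on ties
            events.append((end, 1, -1))
    events.sort()

    running = peak = 0
    first_hit = None
    deadline = call_start + window
    for ts, _, delta in events:
        running += delta
        peak = max(peak, running)
        if first_hit is None and running >= threshold and ts <= deadline:
            first_hit = ts
    return peak, first_hit


# Call 1 from the sample rows (none of its users are excluded).
t = lambda hhmm: datetime.strptime("2025-08-31 " + hhmm, "%Y-%m-%d %H:%M")
call1 = [(t("09:00"), t("09:30")), (t("09:02"), t("09:15")),
         (t("09:04"), t("09:20")), (t("09:05"), None)]
peak1, hit1 = peak_and_first_hit(call1, t("09:00"), t("09:30"))
print(peak1, hit1)  # 4 2025-08-31 09:04:00
```

For the top-3 ranking, the same per-call results could then be sorted by descending peak, then earlier start_ts, then smaller call_id.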
Constraints/edge cases to handle explicitly in SQL: overlapping intervals, NULL leave_ts, hosts or participants filtered by test/email-domain rules, and ties in top-3 broken by earlier start_ts then smaller call_id. Explain your approach to computing overlaps (e.g., +1/-1 event expansion with running SUM) and to computing P90 in ANSI SQL.
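One possible shape for the task 2 query is sketched below, run against the sample rows in an in-memory SQLite database so it can be checked end to end. Several choices here are assumptions rather than part of the question: the SQLite dialect (two-argument scalar `MAX`/`MIN` and `date()`; other engines would typically use `GREATEST`/`LEAST` and a date cast), booleans stored as 0/1, excluded users being dropped from the counts while their calls are still counted, and P90 computed by nearest rank (the row at position ceil(0.9 * n)), which should match `PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY ...)` on engines that provide it.

```python
import sqlite3

SETUP = """
CREATE TABLE users(user_id INT PRIMARY KEY, email TEXT, country TEXT, is_test INT);
CREATE TABLE calls(call_id INT PRIMARY KEY, host_user_id INT, start_ts TEXT,
                   end_ts TEXT, is_group_enabled INT);
CREATE TABLE call_participants(call_id INT, user_id INT, join_ts TEXT, leave_ts TEXT);
INSERT INTO users VALUES
 (10,'a@alpha.com','US',0),(11,'b@example.com','US',1),(12,'c@alpha.com','US',0),
 (13,'d@beta.com','CA',0),(14,'e@alpha.com','US',0),(15,'f@alpha.com','US',0);
INSERT INTO calls VALUES
 (1,10,'2025-08-31 09:00:00','2025-08-31 09:30:00',1),
 (2,11,'2025-09-01 10:00:00','2025-09-01 10:45:00',1),
 (3,10,'2025-09-01 11:00:00','2025-09-01 11:07:00',1);
INSERT INTO call_participants VALUES
 (1,10,'2025-08-31 09:00:00','2025-08-31 09:30:00'),
 (1,12,'2025-08-31 09:02:00','2025-08-31 09:15:00'),
 (1,13,'2025-08-31 09:04:00','2025-08-31 09:20:00'),
 (1,14,'2025-08-31 09:05:00',NULL),
 (2,11,'2025-09-01 10:00:00','2025-09-01 10:45:00'),
 (2,12,'2025-09-01 10:02:00','2025-09-01 10:10:00'),
 (2,13,'2025-09-01 10:02:00','2025-09-01 10:40:00'),
 (2,14,'2025-09-01 10:15:00','2025-09-01 10:30:00'),
 (2,15,'2025-09-01 10:33:00','2025-09-01 10:42:00'),
 (3,10,'2025-09-01 11:00:00','2025-09-01 11:07:00'),
 (3,12,'2025-09-01 11:06:00','2025-09-01 11:07:00');
"""

DAILY_SQL = """
WITH RECURSIVE days(day) AS (            -- calendar spine, so empty days still appear
    SELECT '2025-08-26'
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < '2025-09-01'
),
eligible AS (                            -- users surviving the test/domain filter
    SELECT user_id FROM users
    WHERE is_test = 0 AND lower(email) NOT LIKE '%@example.com'
),
intervals AS (                           -- clip each interval to the call window
    SELECT c.call_id,
           MAX(p.join_ts, c.start_ts) AS s,
           MIN(COALESCE(p.leave_ts, c.end_ts), c.end_ts) AS e
    FROM calls c
    JOIN call_participants p ON p.call_id = c.call_id
    JOIN eligible u ON u.user_id = p.user_id
    WHERE c.is_group_enabled = 1
),
events AS (                              -- +1 at join, -1 at leave
    SELECT call_id, s AS ts, 1 AS delta FROM intervals WHERE s <= e
    UNION ALL
    SELECT call_id, e, -1 FROM intervals WHERE s <= e
),
running AS (                             -- joins sort before leaves on equal ts
    SELECT call_id,
           SUM(delta) OVER (PARTITION BY call_id ORDER BY ts, delta DESC
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS conc
    FROM events
),
peaks AS (                               -- one row per call, peak 0 if nobody counted
    SELECT c.call_id, date(c.start_ts) AS day, COALESCE(MAX(r.conc), 0) AS peak
    FROM calls c LEFT JOIN running r ON r.call_id = c.call_id
    WHERE c.is_group_enabled = 1
    GROUP BY c.call_id, date(c.start_ts)
),
ranked AS (
    SELECT day, peak,
           ROW_NUMBER() OVER (PARTITION BY day ORDER BY peak) AS rn,
           COUNT(*)     OVER (PARTITION BY day)               AS n
    FROM peaks
)
SELECT d.day,
       COUNT(r.rn)                                        AS total_calls,
       SUM(CASE WHEN r.peak >= 3 THEN 1 ELSE 0 END)       AS group_calls,
       MAX(CASE WHEN r.rn = (9 * r.n + 9) / 10 THEN r.peak END) AS p90_peak
FROM days d LEFT JOIN ranked r ON r.day = d.day
GROUP BY d.day
ORDER BY d.day;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(DAILY_SQL).fetchall()
for row in rows:
    print(row)
```

The expression `(9 * n + 9) / 10` is integer division standing in for ceil(0.9 * n), which avoids relying on a `CEIL` function. On the sample rows, 2025-08-31 comes out as one call, one group call, P90 of 4, while 2025-09-01 has two calls, no group calls, and a P90 of 2: call 2 peaks at only 2 once its test host (user 11) is filtered out.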
Quick Answer: This question tests advanced SQL: temporal interval analysis, concurrency counting with window functions, percentile aggregation, and user-filtering logic, with attention to query performance.