Write SQL to compare exclusive category engagement
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
You are given session-level data and must compare engagement between users who exclusively used the 'social' category versus those who exclusively used the 'game' category in a fixed window. Use the 28-day window from 2025-08-04 to 2025-09-01 inclusive. Define two mutually exclusive cohorts: 'social_only' users had ≥1 'social' session and 0 'game' sessions in the window; 'game_only' users had ≥1 'game' session and 0 'social' sessions. Exclude everyone else (including users with both categories or only 'other'). Compute, for each cohort: (1) users_count; (2) median_active_days (median over users of distinct calendar days with ≥1 session); (3) pct_users_active_10_plus_days (share of users with active_days ≥ 10); and (4) avg_minutes_per_active_day (for each user, total duration_minutes in the window divided by that user's active_days; then average across users). Return one row per cohort with columns: cohort, users_count, median_active_days, pct_users_active_10_plus_days, avg_minutes_per_active_day. Assume standard SQL (e.g., PostgreSQL). Handle data quality by ignoring sessions with NULL category or duration_minutes ≤ 0. Schema and small sample data:
Schema:
users(user_id INT, signup_date DATE)
sessions(session_id INT, user_id INT, session_start TIMESTAMP, session_end TIMESTAMP, category VARCHAR, duration_minutes INT)
Sample tables (ASCII):
users
+---------+-------------+
| user_id | signup_date |
+---------+-------------+
| 1 | 2025-07-15 |
| 2 | 2025-08-10 |
| 3 | 2025-08-20 |
| 4 | 2025-08-25 |
+---------+-------------+
sessions
+------------+---------+---------------------+---------------------+----------+-----------------+
| session_id | user_id | session_start | session_end | category | duration_minutes|
+------------+---------+---------------------+---------------------+----------+-----------------+
| 101 | 1 | 2025-08-05 10:00:00 | 2025-08-05 10:30:00 | social | 30 |
| 102 | 1 | 2025-08-06 10:00:00 | 2025-08-06 10:20:00 | social | 20 |
| 103 | 2 | 2025-08-07 09:00:00 | 2025-08-07 10:00:00 | game | 60 |
| 104 | 2 | 2025-08-08 09:00:00 | 2025-08-08 09:45:00 | game | 45 |
| 105 | 3 | 2025-08-09 12:00:00 | 2025-08-09 12:10:00 | social | 10 |
| 106 | 3 | 2025-08-13 12:00:00 | 2025-08-13 12:10:00 | game | 10 |
| 107 | 4 | 2025-08-11 14:00:00 | 2025-08-11 14:30:00 | game | 30 |
+------------+---------+---------------------+---------------------+----------+-----------------+
Write a single SQL query (CTEs allowed) that produces the required output for the given window, correctly excluding mixed-category users and handling the data quality constraints.
Quick Answer: This question evaluates SQL-based data manipulation and cohort analysis competencies, including cohort definition, session-level filtering and aggregation for metrics such as user counts, median active days, proportions, and per-user average minutes while applying data-quality constraints over a fixed date window.