Compute multi-account activity and unread percentages in SQL
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You are given two tables. Use them as the source of truth and do not assume any other data.
Table: notifications
+--------+------------+------------+-------------------+--------+
| userid | ds | event_time | notification_type | status |
+--------+------------+------------+-------------------+--------+
| 111 | 2025-08-20 | 1766270400 | direct | unread |
| 112 | 2025-08-15 | 1765845600 | group | read |
| 114 | 2025-08-05 | 1764981600 | direct | unread |
| 115 | 2025-08-28 | 1766896800 | mention | read |
| 117 | 2025-08-09 | 1765327200 | direct | unread |
| 118 | 2025-08-30 | 1767069600 | group | unread |
| 119 | 2025-08-31 | 1767156000 | direct | read |
| 120 | 2025-08-31 | 1767159600 | mention | unread |
+--------+------------+------------+-------------------+--------+
Columns:
- userid INT: account identifier.
- ds DATE: event date (UTC).
- event_time BIGINT: Unix epoch seconds for the event.
- notification_type STRING: e.g., 'direct', 'group', 'mention'.
- status STRING: 'unread' or 'read'.
Table: dim_circle_people (daily snapshot; one row per (person_id, userid, ds))
+-----------+--------+------------------+---------------+------------+
| person_id | userid | last_action_date | creation_date | ds |
+-----------+--------+------------------+---------------+------------+
| 1 | 111 | 2025-08-20 | 2024-01-01 | 2025-09-01 |
| 1 | 112 | 2025-08-15 | 2024-03-01 | 2025-09-01 |
| 2 | 113 | 2025-07-30 | 2022-10-10 | 2025-09-01 |
| 3 | 114 | 2025-08-05 | 2023-04-05 | 2025-09-01 |
| 3 | 115 | 2025-08-28 | 2023-05-10 | 2025-09-01 |
| 3 | 116 | 2025-06-10 | 2023-06-12 | 2025-09-01 |
| 4 | 117 | 2025-08-09 | 2025-08-01 | 2025-09-01 |
| 4 | 118 | 2025-08-30 | 2025-08-15 | 2025-09-01 |
| 4 | 119 | 2025-08-31 | 2025-08-20 | 2025-09-01 |
| 4 | 120 | 2025-08-31 | 2025-08-25 | 2025-09-01 |
+-----------+--------+------------------+---------------+------------+
Columns:
- person_id INT: real person cluster id (one person may own multiple accounts/userids).
- userid INT: account id belonging to that person.
- last_action_date DATE: date of the account's most recent action.
- creation_date DATE: account creation date.
- ds DATE: snapshot date; use the latest snapshot.
Definitions for this problem:
- Treat 'last month' as 2025-08-01 to 2025-08-31 inclusive.
- Consider an account 'active during last month' if last_action_date is between 2025-08-01 and 2025-08-31 (inclusive) in the dim_circle_people snapshot where ds = '2025-09-01'.
- A person is 'multi-account active' if they have >=2 accounts that meet the above activity rule.
- An 'unread notification' is any notifications row with status = 'unread' and ds between 2025-08-01 and 2025-08-31 (inclusive).
Tasks:
(a) Write a single SQL query that returns exactly three numbers in one row: the count of people with exactly two active accounts, exactly three active accounts, and more than three active accounts, all based on the above definitions. Name the output columns two_accts, three_accts, gt_three_accts.
(b) Among people who are 'multi-account active' (>=2 active accounts by the above rule), write a SQL query that returns one row with a single column percent_unread: the percentage of those people who have at least one unread notification on any of their accounts during 2025-08-01..2025-08-31. Compute as 100.0 * numerator / denominator and round to two decimals. Ensure each person is counted at most once in the numerator even if they have multiple unread notifications across accounts.
Edge cases to handle explicitly: do not count accounts with last_action_date outside August 2025; only use ds='2025-09-01' rows from dim_circle_people; ignore notifications outside August 2025; avoid double-counting per person; return 0.00 if the denominator is zero.
Quick Answer: This question evaluates SQL-based data manipulation and analytics skills, including joining daily snapshots to event tables, applying time-window filters, mapping multiple accounts to a person, and computing aggregate counts and unread percentages.