Compute multi-account actives and unread coverage
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You have two tables.
Table: notifications
+--------+------------+------------+-------------------+--------+
| userid | ds | time | notification_type | status |
+--------+------------+------------+-------------------+--------+
| 101 | 2025-08-12 | 1755283200 | direct | unread |
| 102 | 2025-08-25 | 1756416000 | circle | read |
| 201 | 2025-08-05 | 1754524800 | direct | unread |
| 202 | 2025-08-29 | 1756761600 | direct | unread |
| 302 | 2025-08-03 | 1754352000 | direct | read |
| 303 | 2025-08-18 | 1755542400 | direct | unread |
| 304 | 2025-08-22 | 1755888000 | direct | unread |
| 401 | 2025-08-04 | 1754438400 | direct | read |
| 402 | 2025-08-05 | 1754524800 | circle | read |
| 403 | 2025-08-06 | 1754611200 | direct | read |
| 404 | 2025-08-27 | 1756588800 | direct | unread |
+--------+------------+------------+-------------------+--------+
Table: dim_circle_people
+-----------+--------+------------------+---------------+------------+
| person_id | userid | last_action_date | creation_date | ds |
+-----------+--------+------------------+---------------+------------+
| 1 | 101 | 2025-08-12 | 2024-05-01 | 2025-08-12 |
| 1 | 102 | 2025-08-25 | 2025-01-10 | 2025-08-25 |
| 1 | 103 | 2025-06-20 | 2025-06-01 | 2025-08-25 |
| 2 | 201 | 2025-08-05 | 2025-07-15 | 2025-08-05 |
| 2 | 202 | 2025-08-29 | 2025-08-01 | 2025-08-29 |
| 3 | 301 | 2025-07-31 | 2025-07-01 | 2025-08-01 |
| 3 | 302 | 2025-08-02 | 2025-08-02 | 2025-08-02 |
| 3 | 303 | 2025-08-18 | 2025-08-10 | 2025-08-18 |
| 3 | 304 | 2025-08-21 | 2025-08-21 | 2025-08-21 |
| 4 | 401 | 2025-08-03 | 2025-07-20 | 2025-08-03 |
| 4 | 402 | 2025-08-10 | 2025-07-20 | 2025-08-10 |
| 4 | 403 | 2025-08-15 | 2025-07-20 | 2025-08-15 |
| 4 | 404 | 2025-08-28 | 2025-07-20 | 2025-08-28 |
+-----------+--------+------------------+---------------+------------+
Assumptions:
- "Last month" means 2025-08-01 through 2025-08-31 (inclusive). Treat all timestamps as UTC. If both ds and time exist, use ds to filter month; time is auxiliary.
- A "person" may own multiple accounts (distinct userids map to the same person_id). An account is "active during last month" if its last_action_date is between 2025-08-01 and 2025-08-31 inclusive.
- Unread notifications are rows in notifications with status = 'unread' and ds in 2025-08-01..2025-08-31.
Tasks (write ANSI SQL; CTEs allowed):
(a) Return a single row with three columns: two_accounts, three_accounts, gt_three_accounts — the counts of distinct person_id who had exactly 2, exactly 3, and more than 3 active accounts during 2025-08.
(b) Among people with >=2 active accounts in 2025-08, compute the percentage (0–100 with two decimals) who had at least one unread notification on any of their accounts in 2025-08. Return numerator, denominator, and pct. Ensure a person is counted once even if multiple accounts had unread. Handle persons with zero notifications correctly (they should be in the denominator but not the numerator).
Edge cases to handle explicitly in your SQL: duplicate rows per userid/day in notifications; persons whose some accounts were created during 2025-08; userids present in notifications but missing in dim_circle_people (exclude those from person-level stats).
Quick Answer: This question evaluates proficiency in SQL data manipulation and analytics, including per-person aggregation across multiple accounts, deduplication, date-based filtering, join logic, and cohort-level percentage calculations.