Write SQL/pandas for KPI anomaly
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
Write SQL (and outline equivalent pandas) for a KPI anomaly investigation. Assume today = '2025-09-01'.
Schema:
Users(user_id INT, country TEXT, signup_date DATE)
Posts(post_id INT, user_id INT, post_type TEXT /* 'friend','page','event' */, created_at DATE)
Likes(like_id INT, user_id INT, post_id INT, created_at DATE)
FriendRequests(sender_id INT, receiver_id INT, created_at DATE, status TEXT /* 'sent','accepted','declined' */)
DailyActiveUsers(date DATE, country TEXT, dau INT)
Events(date DATE, outage_flag INT)
Sample rows:
Users
user_id | country | signup_date
1 | US | 2025-07-15
2 | US | 2025-08-20
3 | IN | 2025-08-01
4 | BR | 2025-07-01
5 | US | 2025-08-30
Posts
post_id | user_id | post_type | created_at
10 | 1 | friend | 2025-08-31
11 | 2 | page | 2025-08-31
12 | 3 | friend | 2025-09-01
13 | 4 | event | 2025-09-01
14 | 2 | friend | 2025-09-01
Likes
like_id | user_id | post_id | created_at
100 | 1 | 12 | 2025-09-01
101 | 2 | 10 | 2025-09-01
102 | 3 | 14 | 2025-09-01
103 | 4 | 10 | 2025-08-25
104 | 5 | 12 | 2025-09-01
FriendRequests
sender_id | receiver_id | created_at | status
1 | 2 | 2025-08-20 | accepted
2 | 3 | 2025-08-28 | sent
3 | 4 | 2025-08-29 | accepted
2 | 5 | 2025-08-31 | accepted
5 | 1 | 2025-09-01 | sent
DailyActiveUsers
date | country | dau
2025-08-25 | US | 3
2025-08-25 | IN | 1
2025-08-25 | BR | 1
2025-09-01 | US | 3
2025-09-01 | IN | 1
2025-09-01 | BR | 1
Events
date | outage_flag
2025-09-01 | 0
Tasks:
(a) For each country and post_type on 2025-09-01, compute Likes-per-DAU and its percent change versus the median of the same weekday over the previous 8 weeks, excluding dates where outage_flag=1. Use window functions (e.g., PERCENTILE_CONT) and ensure users are counted once per day for DAU via DailyActiveUsers.
(b) Flag the top 3 countries with the largest declines (≤ −10%) and, for each, attribute the decline between new users (signup_date ≥ '2025-08-02') and existing users. Return country, post_type, pct_change, share_of_decline_from_new_users.
(c) For those flagged countries, compute the Friend Request acceptance rate in the last 14 days (2025-08-19 to 2025-09-01) and compare with the prior 14 days. Output the absolute and relative change, using window functions rather than correlated subqueries.
(d) Provide a high-level pandas approach (groupby, merge, rolling/expanding, quantile) mirroring your SQL.
Edge cases to handle explicitly: users with multiple Likes on the same day, countries with sparse DAU, missing DAU rows, and time zones (assume UTC).
Quick Answer: This question evaluates proficiency with SQL window functions and equivalent pandas operations for KPI anomaly detection, covering baseline computation using same-weekday medians over eight weeks, per-country and per-post_type Likes-per-DAU metrics with deduplication of users for DAU, outage filtering, attribution of declines between new and existing users, and time-series comparisons like friend-request acceptance rates while explicitly handling duplicates, sparse or missing DAU rows, and timezone issues. It is commonly asked in Data Manipulation (SQL/Python) interviews to assess practical data engineering and analytical skills, primarily testing practical application with an underlying need for conceptual understanding of baselining, attribution, and data-quality edge cases.