Diagnose DAU drop with SQL by country
Company: Yahoo
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Write a single SQL query to diagnose a country-level DAU drop for an email product. Assume today is 2025-09-01. DAU is the count of distinct user_id with at least one of event_type IN ('login','open_mail','send_mail','push_click') on a date. Exclude is_bot = TRUE events. Also exclude users who are in experiments where exp_name = 'mail_redesign' AND exp_group = 'treatment' if the experiment is active on the date (start_date <= date <= end_date). Compare 2025-09-01 to the average DAU over the prior 7 days (2025-08-25..2025-08-31). Output the top 3 countries by absolute DAU drop with columns: country, dau_prev7d_avg, dau_today, drop_abs, drop_pct. Additionally, for each of those countries, also output dau_today_returning and dau_today_new, where returning users have created_at < '2025-08-25' and new users have created_at BETWEEN '2025-08-25' AND '2025-09-01'. Break ties in drop_abs by larger drop_pct. Use only standard SQL constructs (CTEs, window functions allowed). Schema and tiny sample data:
users
| user_id | country | created_at |
|--------:|---------|-------------|
| 1 | US | 2025-08-10 |
| 2 | US | 2025-08-26 |
| 3 | IN | 2025-08-01 |
| 4 | IN | 2025-08-30 |
| 5 | US | 2025-07-15 |
| 6 | IN | 2025-08-27 |
events
| event_date | user_id | event_type | device | app_version | is_bot |
|------------|---------|-------------|---------|-------------|--------|
| 2025-08-25 | 1 | login | ios | 10.2 | 0 |
| 2025-08-31 | 1 | open_mail | ios | 10.2 | 0 |
| 2025-09-01 | 1 | open_mail | ios | 10.2 | 0 |
| 2025-08-28 | 2 | login | android | 10.3 | 0 |
| 2025-09-01 | 2 | login | android | 10.3 | 0 |
| 2025-08-27 | 3 | login | web | - | 0 |
| 2025-08-30 | 3 | open_mail | web | - | 0 |
| 2025-09-01 | 3 | open_mail | web | - | 0 |
| 2025-08-31 | 4 | login | android | 10.1 | 0 |
| 2025-09-01 | 4 | login | android | 10.1 | 0 |
| 2025-08-25 | 5 | login | ios | 10.0 | 0 |
| 2025-08-31 | 5 | login | ios | 10.0 | 0 |
| 2025-09-01 | 5 | login | ios | 10.1 | 0 |
| 2025-09-01 | 6 | login | web | - | 1 |
experiments
| user_id | exp_name | exp_group | start_date | end_date |
|---------|-----------------|-----------|------------|------------|
| 2 | mail_redesign | treatment | 2025-08-29 | 2025-09-10 |
| 3 | mail_redesign | control | 2025-08-29 | 2025-09-10 |
Be careful to: (a) de-duplicate users per day across multiple events/devices, (b) filter experiment treatment users only for dates when the experiment is active, (c) compute the 7-day average correctly even if a country has zero DAU on some baseline days.
Quick Answer: This question evaluates SQL data-manipulation and analytical skills, including DAU aggregation with per-day deduplication, date-range baseline averaging, experiment-treatment exclusion, country-level cohorting, and new vs returning user segmentation.