Produce dating profile funnel report by cohort
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
You work on a dating app. Produce a daily profile-funnel report for 2025-08-25 through 2025-09-01 inclusive, with one row per day, gender, and age_bucket (18–24, 25–34, 35–44, 45+). For each group, compute:
- profile_creation_cnt
- photo_present_rate
- avg_completion_rate (average of filled_fields / total_fields)
- profile_viewers_cnt (distinct viewers)
- likes_sent_cnt
- match_rate (matches / views)
Also output a separate list of the user_ids whose profiles are missing a photo as of 2025-09-01. Compute each user's age from birthdate as of the report date. Return two result sets (or two CTE outputs).
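The age rule in the prompt (age from birthdate at the report date, then one of four buckets) can be sketched in Python as follows. The helper names `age_on` and `age_bucket` are hypothetical, and returning `None` for users under 18 is an assumption, since the prompt defines no bucket below 18.

```python
from datetime import date
from typing import Optional


def age_on(birthdate: date, report_date: date) -> int:
    """Whole years completed by report_date."""
    years = report_date.year - birthdate.year
    # Subtract one year if the birthday has not yet occurred in the report year.
    if (report_date.month, report_date.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def age_bucket(age: int) -> Optional[str]:
    """Map an age to the prompt's bucket labels.

    Assumption: users under 18 fall outside every bucket and get None.
    """
    if age < 18:
        return None
    if age <= 24:
        return "18–24"
    if age <= 34:
        return "25–34"
    if age <= 44:
        return "35–44"
    return "45+"


# Sample user 12 (born 1985-11-20) has not yet had a 2025 birthday on 2025-09-01.
print(age_bucket(age_on(date(1985, 11, 20), date(2025, 9, 1))))
```

Because age depends on the report date, a user whose birthday falls inside the 2025-08-25 to 2025-09-01 window can change buckets from one daily row to the next; whether that is acceptable is worth stating as an assumption.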
Schemas:
- users(user_id INT, gender TEXT, birthdate DATE, country_code CHAR(2), created_at TIMESTAMP)
- profiles(user_id INT, has_photo BOOLEAN, filled_fields INT, total_fields INT, updated_at TIMESTAMP)
- profile_events(event_id INT, event_date DATE, actor_user_id INT, target_user_id INT, event_type TEXT CHECK (event_type IN ('view','like','match')))
Sample rows (subsets):
users
+---------+--------+------------+--------------+
| user_id | gender | birthdate  | country_code |
+---------+--------+------------+--------------+
| 10      | F      | 1998-05-01 | FR           |
| 11      | M      | 1990-02-10 | US           |
| 12      | F      | 1985-11-20 | US           |
+---------+--------+------------+--------------+
profiles
+---------+-----------+---------------+--------------+---------------------+
| user_id | has_photo | filled_fields | total_fields | updated_at          |
+---------+-----------+---------------+--------------+---------------------+
| 10      | true      | 8             | 10           | 2025-08-31 10:00:00 |
| 11      | false     | 5             | 10           | 2025-08-31 12:00:00 |
| 12      | true      | 10            | 10           | 2025-09-01 09:00:00 |
+---------+-----------+---------------+--------------+---------------------+
profile_events
+----------+------------+---------------+----------------+------------+
| event_id | event_date | actor_user_id | target_user_id | event_type |
+----------+------------+---------------+----------------+------------+
| 9001     | 2025-08-31 | 11            | 10             | view       |
| 9002     | 2025-08-31 | 11            | 10             | like       |
| 9003     | 2025-08-31 | 10            | 11             | match      |
+----------+------------+---------------+----------------+------------+
Clearly state any assumptions (e.g., how to bucket ages, whether a match requires a prior like).
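As one way to make such assumptions explicit, here is a minimal sketch of the event-side metrics (profile_viewers_cnt, likes_sent_cnt, match_rate), run against the sample rows with Python's built-in `sqlite3`. It is not a reference solution. It assumes that every event is attributed to the cohort of its actor_user_id, that age is measured on event_date, that users under 18 are excluded, and that a match does not require a prior like. The date arithmetic uses SQLite's `strftime`; other SQL dialects would need their own age expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INT, gender TEXT, birthdate DATE);
CREATE TABLE profile_events (event_id INT, event_date DATE, actor_user_id INT,
                             target_user_id INT, event_type TEXT);
INSERT INTO users VALUES
  (10, 'F', '1998-05-01'), (11, 'M', '1990-02-10'), (12, 'F', '1985-11-20');
INSERT INTO profile_events VALUES
  (9001, '2025-08-31', 11, 10, 'view'),
  (9002, '2025-08-31', 11, 10, 'like'),
  (9003, '2025-08-31', 10, 11, 'match');
""")

EVENT_FUNNEL_SQL = """
WITH actor_events AS (
  SELECT e.event_date AS day, u.gender, e.actor_user_id, e.event_type,
         -- Whole years between birthdate and the event day.
         (strftime('%Y', e.event_date) - strftime('%Y', u.birthdate))
           - (strftime('%m-%d', e.event_date) < strftime('%m-%d', u.birthdate)) AS age
  FROM profile_events e
  JOIN users u ON u.user_id = e.actor_user_id   -- assumption: attribute to the actor
  WHERE e.event_date BETWEEN '2025-08-25' AND '2025-09-01'
)
SELECT day, gender,
       CASE WHEN age BETWEEN 18 AND 24 THEN '18–24'
            WHEN age BETWEEN 25 AND 34 THEN '25–34'
            WHEN age BETWEEN 35 AND 44 THEN '35–44'
            ELSE '45+' END AS age_bucket,
       COUNT(DISTINCT CASE WHEN event_type = 'view' THEN actor_user_id END)
         AS profile_viewers_cnt,
       SUM(event_type = 'like') AS likes_sent_cnt,
       -- NULL (None in Python) when the cohort has no views that day.
       1.0 * SUM(event_type = 'match') / NULLIF(SUM(event_type = 'view'), 0)
         AS match_rate
FROM actor_events
WHERE age >= 18                                  -- assumption: drop under-18 users
GROUP BY day, gender, age_bucket
ORDER BY day, gender, age_bucket
"""

rows = conn.execute(EVENT_FUNNEL_SQL).fetchall()
for row in rows:
    print(row)
```

On the sample rows, the view and like land in user 11's cohort while the match lands in user 10's cohort, so the two never meet in a single match_rate. Attributing events to the target, or to both sides of a match, would change the result, which is why the attribution rule is worth stating up front.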
Quick Answer: This question evaluates a data scientist's competency in cohort-based funnel analysis, metric computation (counts and rates), temporal cohorting, deduplication, and aggregation of relational event and profile data.
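The profile-side metrics and the second result set (user_ids missing a photo) could be sketched the same way. This is a sketch under stated assumptions, not a definitive answer: the `created_at` values below are hypothetical because the sample rows omit them, the report day for a profile is taken to be the date of `users.created_at`, and `profiles` is treated as a current snapshot that stands in for the state as of 2025-09-01.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INT, gender TEXT, birthdate DATE, created_at TIMESTAMP);
CREATE TABLE profiles (user_id INT, has_photo BOOLEAN, filled_fields INT,
                       total_fields INT, updated_at TIMESTAMP);
-- created_at values are hypothetical; the prompt's sample rows do not show them.
INSERT INTO users VALUES
  (10, 'F', '1998-05-01', '2025-08-30 08:00:00'),
  (11, 'M', '1990-02-10', '2025-08-30 09:00:00'),
  (12, 'F', '1985-11-20', '2025-08-31 07:00:00');
INSERT INTO profiles VALUES
  (10, 1, 8, 10, '2025-08-31 10:00:00'),
  (11, 0, 5, 10, '2025-08-31 12:00:00'),
  (12, 1, 10, 10, '2025-09-01 09:00:00');
""")

PROFILE_METRICS_SQL = """
WITH created AS (
  SELECT date(u.created_at) AS day, u.gender, p.has_photo,
         1.0 * p.filled_fields / NULLIF(p.total_fields, 0) AS completion,
         -- Whole years between birthdate and the creation day.
         (strftime('%Y', u.created_at) - strftime('%Y', u.birthdate))
           - (strftime('%m-%d', u.created_at) < strftime('%m-%d', u.birthdate)) AS age
  FROM users u
  JOIN profiles p ON p.user_id = u.user_id
  WHERE date(u.created_at) BETWEEN '2025-08-25' AND '2025-09-01'
)
SELECT day, gender,
       CASE WHEN age BETWEEN 18 AND 24 THEN '18–24'
            WHEN age BETWEEN 25 AND 34 THEN '25–34'
            WHEN age BETWEEN 35 AND 44 THEN '35–44'
            ELSE '45+' END AS age_bucket,
       COUNT(*)        AS profile_creation_cnt,
       AVG(has_photo)  AS photo_present_rate,
       AVG(completion) AS avg_completion_rate
FROM created
WHERE age >= 18                      -- assumption: drop under-18 users
GROUP BY day, gender, age_bucket
ORDER BY day, gender, age_bucket
"""

MISSING_PHOTO_SQL = """
SELECT p.user_id
FROM profiles p
JOIN users u ON u.user_id = p.user_id
WHERE p.has_photo = 0
  AND date(u.created_at) <= '2025-09-01'   -- assumption: snapshot approximates "as of"
ORDER BY p.user_id
"""

profile_rows = conn.execute(PROFILE_METRICS_SQL).fetchall()
missing_photo_ids = [r[0] for r in conn.execute(MISSING_PHOTO_SQL)]
print(profile_rows)
print(missing_photo_ids)
```

Since `profiles` carries only the latest `updated_at`, a photo added or removed after 2025-09-01 would not be visible to this query; a true as-of answer would need a profile history table, which the given schema does not include.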