Compute User Retention and Analyze Event Data
Company: Snapchat
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
user_events
+---------+---------------------+------------+-------+
| user_id | event_time | event_type | page |
+---------+---------------------+------------+-------+
| 101 | 2023-04-01 10:00:00 | click | home |
| 101 | 2023-04-02 11:00:00 | purchase | cart |
| 102 | 2023-04-01 12:30:00 | click | home |
| 103 | 2023-04-03 09:15:00 | impression | deal |
| 102 | 2023-04-04 14:20:00 | purchase | cart |
+---------+---------------------+------------+-------+
##### Scenario
Data team maintains a user_events fact table tracking every page view, click and purchase. Leadership wants ad-hoc SQL insights.
##### Question
Write a query to compute 7-day user retention (inclusive) for users who first visited on '2023-04-01'. Return the top 3 pages by distinct purchasing users last month. Find the median number of daily events per active user in April 2023. In Python, given a Pandas DataFrame identical to user_events, produce an hourly time-series of clicks per page for the past 24 hours.
##### Hints
Window functions, CTEs, DATE_DIFF and groupby should all be considered.
Quick Answer: This question evaluates a candidate's ability to perform SQL and Python data manipulation for event analytics, including cohort retention, aggregation of distinct purchasing users, median-of-daily-events calculations, and hourly time-series grouping.