Calculate Cohort Retention
Company: Intuit
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Technical Screen
You are given two tables:
`users`
- `user_id` BIGINT PRIMARY KEY
- `signup_ts` TIMESTAMP
`user_events`
- `user_id` BIGINT
- `event_ts` TIMESTAMP
- `event_name` VARCHAR
`user_events.user_id` references `users.user_id`. Assume all timestamps are stored in UTC.
Define a user's **cohort month** as `DATE_TRUNC('month', signup_ts)`. A user is considered **retained** in month `n` if they generate at least one event in the calendar month that is `n` months after their cohort month. Ignore any events that occur before the user's `signup_ts`.
Write SQL to compute **monthly cohort-based retention**. Return one row per `cohort_month` and `month_number` with the following columns:
- `cohort_month`
- `month_number`
- `cohort_size`: number of distinct users who signed up in that cohort month
- `retained_users`: number of distinct users from that cohort with at least one event in that month number
- `retention_rate`: `retained_users / cohort_size`
Your result should support month 0, 1, 2, ... retention analysis.
Quick Answer: This question evaluates cohort analysis, time-based aggregation, event-time filtering, and retention metric computation skills tied to joins and timestamp handling.