Compute Cohort Retention Rate
Company: Intuit
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Technical Screen
You are given two tables:
- `users(user_id BIGINT, signup_ts TIMESTAMP)` — one row per user.
- `user_activity(user_id BIGINT, activity_ts TIMESTAMP, event_name VARCHAR)` — zero or more activity rows per user.
Assume all timestamps are in UTC. Define a user's cohort as the calendar month of `signup_ts`. A user is retained in month `k` if they have at least one row in `user_activity` during the calendar month that is `k` months after their cohort month. Month `0` is the signup month.
Write a SQL query to compute cohort-based monthly retention for `k = 0, 1, 2, 3, 4, 5, 6`.
Return these columns:
- `cohort_month` — first day of the signup month
- `months_since_signup` — integer from 0 to 6
- `cohort_size` — number of users in the cohort
- `retained_users` — number of distinct users active in that relative month
- `retention_rate` — `retained_users / cohort_size` as a decimal
Make sure each user is counted at most once per cohort-month and relative month.
Quick Answer: This question evaluates a data scientist's competency in cohort-based retention analysis, time-based data manipulation, and aggregation using Data Manipulation (SQL/Python) techniques.