You are given two tables:
-
users(user_id BIGINT, signup_ts TIMESTAMP)
— one row per user.
-
user_activity(user_id BIGINT, activity_ts TIMESTAMP, event_name VARCHAR)
— zero or more activity rows per user.
Assume all timestamps are in UTC. Define a user's cohort as the calendar month of signup_ts. A user is retained in month k if they have at least one row in user_activity during the calendar month that is k months after their cohort month. Month 0 is the signup month.
Write a SQL query to compute cohort-based monthly retention for k = 0, 1, 2, 3, 4, 5, 6.
Return these columns:
-
cohort_month
— first day of the signup month
-
months_since_signup
— integer from 0 to 6
-
cohort_size
— number of users in the cohort
-
retained_users
— number of distinct users active in that relative month
-
retention_rate
—
retained_users / cohort_size
as a decimal
Make sure each user is counted at most once per cohort-month and relative month.