This question evaluates cohort analysis, time-based aggregation, event-time filtering, and retention metric computation skills tied to joins and timestamp handling.
You are given two tables:
users
user_id
BIGINT PRIMARY KEY
signup_ts
TIMESTAMP
user_events
user_id
BIGINT
event_ts
TIMESTAMP
event_name
VARCHAR
user_events.user_id references users.user_id. Assume all timestamps are stored in UTC.
Define a user's cohort month as DATE_TRUNC('month', signup_ts). A user is considered retained in month n if they generate at least one event in the calendar month that is n months after their cohort month. Ignore any events that occur before the user's signup_ts.
Write SQL to compute monthly cohort-based retention. Return one row per cohort_month and month_number with the following columns:
cohort_month
month_number
cohort_size
: number of distinct users who signed up in that cohort month
retained_users
: number of distinct users from that cohort with at least one event in that month number
retention_rate
:
retained_users / cohort_size
Your result should support month 0, 1, 2, ... retention analysis.