You are given a single table, company, with columns company_id, signup_date, subscription_date, and termination_date.
Compute monthly cohorted retention based on signup_date. Define cohort_month = DATE_TRUNC('month', signup_date).
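The cohort definition above can be sketched outside SQL as follows. This is a minimal Python analogue of DATE_TRUNC('month', signup_date), not part of the required SQL answer; the helper name cohort_month is an illustrative assumption.

```python
from datetime import date


def cohort_month(signup_date: date) -> date:
    """Python analogue of DATE_TRUNC('month', signup_date):
    map a signup date to the first day of its calendar month."""
    return signup_date.replace(day=1)


# Signup dates taken from the sample table.
print(cohort_month(date(2019, 5, 20)))  # 2019-05-01
print(cohort_month(date(2020, 6, 30)))  # 2020-06-01
```

Companies 4 and 5 in the sample (signups on 2020-06-10 and 2020-06-30) therefore fall into the same cohort.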
Assume data_cutoff_date is a parameter representing the latest reliable observation date. If it is not provided, default it to the maximum observed date across signup_date, termination_date, and subscription_date.
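One way to read the default rule is sketched below in Python. The function name default_cutoff and the tuple layout of each row are assumptions made for illustration. None stands in for SQL NULL and is skipped, mirroring how the SQL MAX aggregate ignores NULLs.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Assumed row layout: (company_id, signup_date, subscription_date, termination_date)
CompanyRow = Tuple[int, date, Optional[date], Optional[date]]


def default_cutoff(rows: Iterable[CompanyRow]) -> Optional[date]:
    """Latest date seen in any of the three date columns, ignoring None.
    Returns None when the table has no dates at all."""
    candidates = [
        d
        for (_, signup, subscription, termination) in rows
        for d in (signup, subscription, termination)
        if d is not None
    ]
    return max(candidates, default=None)


# Rows copied from the sample table.
sample = [
    (1, date(2019, 5, 20), date(2019, 6, 2), date(2020, 1, 10)),
    (2, date(2019, 6, 15), date(2019, 6, 20), None),
    (3, date(2019, 7, 1), date(2019, 7, 5), date(2019, 12, 31)),
    (4, date(2020, 6, 10), date(2020, 6, 11), None),
    (5, date(2020, 6, 30), date(2020, 7, 2), None),
    (6, date(2020, 7, 15), None, None),
]
print(default_cutoff(sample))  # 2020-07-15
```

On the sample table the default resolves to the signup date of company 6, since no subscription or termination date is later.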
Right-censoring rule: include only cohorts for which the retention window is fully observable at the cohort level. Specifically, exclude any cohort where cohort_month + INTERVAL '30 days' > data_cutoff_date.
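The exclusion test can be checked by hand with a short Python sketch. The function name cohort_is_observable and the window_days parameter are illustrative assumptions; the comparison itself is the one stated in the rule, inverted so that True means the cohort is kept.

```python
from datetime import date, timedelta


def cohort_is_observable(cohort_month: date,
                         data_cutoff_date: date,
                         window_days: int = 30) -> bool:
    """Keep a cohort only if cohort_month + window_days is not after the cutoff.
    The rule excludes cohorts where cohort_month + 30 days > data_cutoff_date."""
    return cohort_month + timedelta(days=window_days) <= data_cutoff_date


# Assumes data_cutoff_date = 2020-07-15, the latest date in the sample table.
cutoff = date(2020, 7, 15)
print(cohort_is_observable(date(2020, 6, 1), cutoff))  # True:  2020-07-01 <= cutoff
print(cohort_is_observable(date(2020, 7, 1), cutoff))  # False: 2020-07-31 >  cutoff
```

Under that cutoff the July 2020 cohort (company 6) is dropped, while the June 2020 cohort is kept.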
A company is considered retained at day D if:
Edge cases to handle:
Return for the 30-day retention:
Then generalize to a monthly retention curve by adding 60 and 90 days using a generate_series of thresholds (30, 60, 90) and pivoting/widening the results into:
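The widening step alone can be sketched in Python as below. It assumes the per-threshold rates have already been computed in long form as (cohort_month, threshold_days, rate) rows. The function name widen, the retention_30d-style column names, and the rate values in the usage example are illustrative placeholders, not results derived from the sample table. range(30, 91, 30) plays the role of generate_series(30, 90, 30).

```python
from collections import defaultdict
from datetime import date

# Python counterpart of generate_series(30, 90, 30): 30, 60, 90.
THRESHOLDS = list(range(30, 91, 30))


def widen(long_rows):
    """Pivot (cohort_month, threshold_days, rate) rows into one dict per cohort,
    with one column per threshold. Missing combinations become None."""
    by_cohort = defaultdict(dict)
    for cohort, days, rate in long_rows:
        by_cohort[cohort][days] = rate

    wide = []
    for cohort in sorted(by_cohort):
        row = {"cohort_month": cohort}
        for days in THRESHOLDS:
            # Hypothetical column naming; the original output names are not given.
            row[f"retention_{days}d"] = by_cohort[cohort].get(days)
        wide.append(row)
    return wide


# Placeholder rates for illustration only.
long_rows = [
    (date(2019, 5, 1), 30, 0.5),
    (date(2019, 5, 1), 60, 0.25),
    (date(2019, 6, 1), 30, 1.0),
]
for row in widen(long_rows):
    print(row)
```

Leaving a missing threshold as None rather than 0 keeps a not-yet-observable window distinguishable from a cohort that truly retained nobody.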
Table (sample):
company
+------------+-------------+-------------------+------------------+
| company_id | signup_date | subscription_date | termination_date |
+------------+-------------+-------------------+------------------+
| 1          | 2019-05-20  | 2019-06-02        | 2020-01-10       |
| 2          | 2019-06-15  | 2019-06-20        | NULL             |
| 3          | 2019-07-01  | 2019-07-05        | 2019-12-31       |
| 4          | 2020-06-10  | 2020-06-11        | NULL             |
| 5          | 2020-06-30  | 2020-07-02        | NULL             |
| 6          | 2020-07-15  | NULL              | NULL             |
+------------+-------------+-------------------+------------------+