Using PostgreSQL, compute monthly signup cohort D30 retention and 30-day ARPU. Cohort month = date_trunc('month', signup_date). D30 retention = users with any event on the calendar day signup_date + 30 days (ignore events before signup_date as data errors). 30D ARPU = sum of payments with pay_ts < signup_date + interval '30 days' divided by cohort size. Deduplicate exact duplicate events by (user_id, event_ts, event_name). Schema:
users(user_id INT PRIMARY KEY, signup_date DATE, country TEXT)
events(user_id INT, event_ts DATE, event_name TEXT)
payments(user_id INT, pay_ts DATE, amount NUMERIC(10,2))
Sample data:
users
+---------+-------------+---------+
| user_id | signup_date | country |
+---------+-------------+---------+
| 1 | 2025-06-01 | US |
| 2 | 2025-06-15 | US |
| 3 | 2025-07-03 | CA |
| 4 | 2025-07-20 | US |
| 5 | 2025-07-31 | GB |
+---------+-------------+---------+
events
+---------+------------+------------+
| user_id | event_ts | event_name |
+---------+------------+------------+
| 1 | 2025-07-01 | login |
| 1 | 2025-07-01 | login | (duplicate)
| 2 | 2025-07-15 | purchase |
| 3 | 2025-08-02 | login |
| 4 | 2025-08-19 | browse |
| 5 | 2025-08-30 | login |
+---------+------------+------------+
payments
+---------+------------+--------+
| user_id | pay_ts | amount |
+---------+------------+--------+
| 1 | 2025-06-20 | 10.00 |
| 1 | 2025-07-10 | 20.00 |
| 2 | 2025-06-16 | 15.00 |
| 3 | 2025-07-20 | 5.00 |
| 5 | 2025-08-15 | 12.00 |
+---------+------------+--------+
Write a single SQL query returning: cohort_month, d30_retention_rate, arpu_30d. Explain how your query avoids duplicates and late events.