Compute churn with re-subscriptions
Company: OpenAI
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: hard
Interview Round: Technical Screen
You are analyzing a free-month subscription experiment. Assume experiment eligibility has already been computed; every user in the experiment table is already qualified.
Use the following tables:
**Table: `experiment_users`**
- `user_id` STRING — unique user identifier
- `assignment_ts` TIMESTAMP — time the user was assigned to treatment or control
- `treatment_flag` STRING — either `treatment` or `control`
**Table: `subscription_events`**
- `user_id` STRING — user identifier
- `event_ts` TIMESTAMP — event timestamp in UTC
- `event_type` STRING — one of `start_free_month`, `first_paid_month`, `renewal`, `cancel`, `reactivate`
- `monthly_price` NUMERIC — monthly subscription price recorded on paid events
Assumptions:
- Each user appears once in `experiment_users`.
- `event_ts` is unique within each user.
- Do **not** re-implement qualification logic; assume `experiment_users` already contains only qualified users.
- A user is **paid_converted_30d** if they have a `first_paid_month` event within 30 days after `assignment_ts`.
- A user is **retained_paid_60d** if, at exactly 60 days after `assignment_ts`, their latest subscription state at or before that timestamp is a paid state. Treat `first_paid_month`, `renewal`, and `reactivate` as paid states; treat `cancel` as not active; `start_free_month` alone does not count as paid.
- Users may cancel and reactivate multiple times. The latest state on or before day 60 determines whether they are retained.
- All timestamps should be interpreted in UTC.
Write a SQL query that returns the following output columns for each `treatment_flag`:
- `treatment_flag`
- `assigned_users`
- `paid_converted_users_30d`
- `retained_paid_users_60d`
Quick Answer: This question evaluates proficiency in event-time data manipulation, temporal aggregation, and subscription-state logic for computing conversion and retention metrics using SQL or Python; it falls under the Data Manipulation (SQL/Python) domain and targets practical application rather than pure conceptual theory.