You are analyzing an A/B test for a marketing campaign offering a free 1-month trial.
You are given raw “upstream” tables that resemble production event logs rather than pre-aggregated metrics.
Assume UTC timestamps and that each user has at most one assignment.
The tables are:

experiment_assignment
  user_id       (STRING, PK)
  experiment_id (STRING)
  variant       (STRING; 'control' or 'treatment')
  assigned_at   (TIMESTAMP)

offer_exposure_events
  user_id     (STRING)
  exposed_at  (TIMESTAMP)
  campaign_id (STRING)

subscription_events
  user_id    (STRING)
  event_time (TIMESTAMP)
  event_type (STRING; one of 'trial_start', 'paid_start', 'cancel')

app_sessions
  user_id       (STRING)
  session_start (TIMESTAMP)
Definitions:

A user counts toward signup_users_7d if they have a trial_start event whose event_time is in [assigned_at, assigned_at + 7 days).

A user counts toward retained_users_d30 if they have a session whose session_start is in [assigned_at + 30 days, assigned_at + 37 days).
Write SQL to output one row per variant with:
variant
assigned_users
signup_users_7d
signup_rate_7d
retained_users_d30
retention_rate_d30
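One possible shape for the answer, sketched as a self-contained demo: the SQL below runs against an in-memory SQLite database seeded with a few invented rows (user IDs c1/c2/t1–t3 and their timestamps are illustrative assumptions, not part of the exercise). SQLite's `datetime(..., '+7 days')` stands in for whatever interval syntax the target warehouse uses.

```python
import sqlite3

# In-memory stand-in for the four upstream tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment_assignment (user_id TEXT PRIMARY KEY, experiment_id TEXT,
                                    variant TEXT, assigned_at TEXT);
CREATE TABLE subscription_events   (user_id TEXT, event_time TEXT, event_type TEXT);
CREATE TABLE app_sessions          (user_id TEXT, session_start TEXT);
""")

# Hypothetical seed data. Note t1's trial starts on day 8, outside the 7-day window.
conn.executemany("INSERT INTO experiment_assignment VALUES (?,?,?,?)", [
    ("c1", "exp1", "control",   "2024-01-01 00:00:00"),
    ("c2", "exp1", "control",   "2024-01-01 00:00:00"),
    ("t1", "exp1", "treatment", "2024-01-02 00:00:00"),
    ("t2", "exp1", "treatment", "2024-01-02 00:00:00"),
    ("t3", "exp1", "treatment", "2024-01-02 00:00:00"),
])
conn.executemany("INSERT INTO subscription_events VALUES (?,?,?)", [
    ("c1", "2024-01-03 00:00:00", "trial_start"),
    ("t1", "2024-01-10 00:00:00", "trial_start"),  # too late: outside [assigned, +7d)
    ("t2", "2024-01-05 00:00:00", "trial_start"),
    ("t3", "2024-01-04 00:00:00", "trial_start"),
])
conn.executemany("INSERT INTO app_sessions VALUES (?,?)", [
    ("c1", "2024-02-01 00:00:00"),  # day 31: inside [assigned + 30d, + 37d)
    ("t2", "2024-02-03 00:00:00"),
])

QUERY = """
WITH signups AS (               -- users with a trial_start inside the 7-day window
  SELECT DISTINCT a.user_id
  FROM experiment_assignment a
  JOIN subscription_events e
    ON e.user_id = a.user_id
   AND e.event_type = 'trial_start'
   AND e.event_time >= a.assigned_at
   AND e.event_time <  datetime(a.assigned_at, '+7 days')
),
retained AS (                   -- users with a session in the D30 retention window
  SELECT DISTINCT a.user_id
  FROM experiment_assignment a
  JOIN app_sessions s
    ON s.user_id = a.user_id
   AND s.session_start >= datetime(a.assigned_at, '+30 days')
   AND s.session_start <  datetime(a.assigned_at, '+37 days')
)
SELECT a.variant,
       COUNT(*)                            AS assigned_users,
       COUNT(sg.user_id)                   AS signup_users_7d,
       1.0 * COUNT(sg.user_id) / COUNT(*)  AS signup_rate_7d,
       COUNT(r.user_id)                    AS retained_users_d30,
       1.0 * COUNT(r.user_id) / COUNT(*)   AS retention_rate_d30
FROM experiment_assignment a
LEFT JOIN signups  sg ON sg.user_id = a.user_id
LEFT JOIN retained r  ON r.user_id  = a.user_id
GROUP BY a.variant
ORDER BY a.variant;
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The `DISTINCT` in each CTE matters: a user who fires several qualifying events must still count once, so the left joins stay one-row-per-user and the `COUNT(*)` denominator is not inflated.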
Also briefly describe what an “upstream” dataset/logging pipeline must contain to compute these metrics reliably (e.g., assignment logs, exposure logs, subscription lifecycle events, identity consistency).