You are analyzing an A/B test for a marketing campaign offering a free 1-month trial.
You are given raw “upstream” tables that resemble production event logs rather than pre-aggregated metrics.
Assume UTC timestamps and that each user has at most one assignment.
The tables are:

experiment_assignment
  user_id       (STRING, PK)
  experiment_id (STRING)
  variant       (STRING; 'control' or 'treatment')
  assigned_at   (TIMESTAMP)

offer_exposure_events
  user_id     (STRING)
  exposed_at  (TIMESTAMP)
  campaign_id (STRING)

subscription_events
  user_id    (STRING)
  event_time (TIMESTAMP)
  event_type (STRING; one of 'trial_start', 'paid_start', 'cancel')

app_sessions
  user_id       (STRING)
  session_start (TIMESTAMP)
Definitions:

A user counts toward signup_users_7d if they have a trial_start event whose event_time is in [assigned_at, assigned_at + 7 days).

A user counts toward retained_users_d30 if they have a session whose session_start is in [assigned_at + 30 days, assigned_at + 37 days).
Write SQL to output one row per variant with:
variant
assigned_users
signup_users_7d
signup_rate_7d
retained_users_d30
retention_rate_d30
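One possible shape for the answer, sketched as a self-contained demo: the SQL below runs against an in-memory SQLite database seeded with a few invented rows (user IDs c1/c2/t1–t3 and their timestamps are illustrative assumptions, not part of the exercise). SQLite's `datetime(..., '+7 days')` stands in for whatever interval syntax the target warehouse uses.

```python
import sqlite3

# In-memory stand-in for the four upstream tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment_assignment (user_id TEXT PRIMARY KEY, experiment_id TEXT,
                                    variant TEXT, assigned_at TEXT);
CREATE TABLE subscription_events   (user_id TEXT, event_time TEXT, event_type TEXT);
CREATE TABLE app_sessions          (user_id TEXT, session_start TEXT);
""")

# Hypothetical seed data. Note t1's trial starts on day 8, outside the 7-day window.
conn.executemany("INSERT INTO experiment_assignment VALUES (?,?,?,?)", [
    ("c1", "exp1", "control",   "2024-01-01 00:00:00"),
    ("c2", "exp1", "control",   "2024-01-01 00:00:00"),
    ("t1", "exp1", "treatment", "2024-01-02 00:00:00"),
    ("t2", "exp1", "treatment", "2024-01-02 00:00:00"),
    ("t3", "exp1", "treatment", "2024-01-02 00:00:00"),
])
conn.executemany("INSERT INTO subscription_events VALUES (?,?,?)", [
    ("c1", "2024-01-03 00:00:00", "trial_start"),
    ("t1", "2024-01-10 00:00:00", "trial_start"),  # too late: outside [assigned, +7d)
    ("t2", "2024-01-05 00:00:00", "trial_start"),
    ("t3", "2024-01-04 00:00:00", "trial_start"),
])
conn.executemany("INSERT INTO app_sessions VALUES (?,?)", [
    ("c1", "2024-02-01 00:00:00"),  # day 31: inside [assigned + 30d, + 37d)
    ("t2", "2024-02-03 00:00:00"),
])

QUERY = """
WITH signups AS (               -- users with a trial_start inside the 7-day window
  SELECT DISTINCT a.user_id
  FROM experiment_assignment a
  JOIN subscription_events e
    ON e.user_id = a.user_id
   AND e.event_type = 'trial_start'
   AND e.event_time >= a.assigned_at
   AND e.event_time <  datetime(a.assigned_at, '+7 days')
),
retained AS (                   -- users with a session in the D30 retention window
  SELECT DISTINCT a.user_id
  FROM experiment_assignment a
  JOIN app_sessions s
    ON s.user_id = a.user_id
   AND s.session_start >= datetime(a.assigned_at, '+30 days')
   AND s.session_start <  datetime(a.assigned_at, '+37 days')
)
SELECT a.variant,
       COUNT(*)                            AS assigned_users,
       COUNT(sg.user_id)                   AS signup_users_7d,
       1.0 * COUNT(sg.user_id) / COUNT(*)  AS signup_rate_7d,
       COUNT(r.user_id)                    AS retained_users_d30,
       1.0 * COUNT(r.user_id) / COUNT(*)   AS retention_rate_d30
FROM experiment_assignment a
LEFT JOIN signups  sg ON sg.user_id = a.user_id
LEFT JOIN retained r  ON r.user_id  = a.user_id
GROUP BY a.variant
ORDER BY a.variant;
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The `DISTINCT` in each CTE matters: a user who fires several qualifying events must still count once, so the left joins stay one-row-per-user and the `COUNT(*)` denominator is not inflated.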
Also briefly describe what an “upstream” dataset/logging pipeline must contain to compute these metrics reliably (e.g., assignment logs, exposure logs, subscription lifecycle events, identity consistency).