A/B Test: Clean, Analyze, Visualize, and Interpret Raw Log-Level Data
Scenario
You receive raw, log-level event data for an A/B test on a consumer booking funnel. Your goal is to clean the data, compute the primary experiment metrics, visualize the outcome, and provide a clear business interpretation.
Assumed Data Context
- Each row is an event generated by a user during the experiment window.
- Columns (typical):
  - user_id: unique user identifier
  - variant: 'control' or 'treatment' (also called 'test')
  - event: event name, e.g., 'exposure', 'view', 'click', 'convert'
  - ts: event timestamp
  - revenue: purchase revenue if a conversion occurs (0/NaN otherwise)
  - bot: boolean or 0/1 flag for suspected bot traffic
  - Optional covariates: device, country, etc.
- If there is no explicit 'exposure' event, assume the user's first event marks exposure.
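The data context above can be turned into a user-level table along the following lines. This is a minimal sketch, not a prescribed solution: the function name build_user_table, the sample rows, and the cleaning rules (dropping any user with a bot-flagged event, dropping users seen in both variants, ignoring events before exposure) are illustrative assumptions that should be adapted to the real logs.

```python
import pandas as pd

def build_user_table(events: pd.DataFrame) -> pd.DataFrame:
    """Collapse raw event logs into one row per user (hypothetical helper)."""
    df = events.copy()
    # 'test' is a synonym for 'treatment' in the data description.
    df["variant"] = df["variant"].str.strip().str.lower().replace({"test": "treatment"})
    df["ts"] = pd.to_datetime(df["ts"], errors="coerce")
    df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce").fillna(0.0)
    df = df.dropna(subset=["user_id", "variant", "ts"]).drop_duplicates()

    # Assumption: a single bot-flagged event disqualifies the whole user.
    bot_users = df.loc[df["bot"].fillna(0).astype(bool), "user_id"].unique()
    df = df[~df["user_id"].isin(bot_users)]

    # Assumption: users who appear in more than one variant are excluded.
    n_variants = df.groupby("user_id")["variant"].transform("nunique")
    df = df[n_variants == 1]

    # Exposure time: first 'exposure' event, else the user's first event.
    first_exposure = df[df["event"] == "exposure"].groupby("user_id")["ts"].min()
    first_event = df.groupby("user_id")["ts"].min()
    exposure_ts = first_exposure.reindex(first_event.index).fillna(first_event)

    # Keep only events at or after exposure, then aggregate per user.
    df = df[df["ts"] >= df["user_id"].map(exposure_ts)]
    df = df.assign(is_convert=df["event"].eq("convert"))
    users = df.groupby("user_id").agg(
        variant=("variant", "first"),
        converted=("is_convert", "max"),
        revenue=("revenue", "sum"),
    )
    users["converted"] = users["converted"].astype(int)
    users["exposure_ts"] = exposure_ts
    return users.reset_index()

# Made-up sample rows, for illustration only.
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3", "u4", "u4", "u5"],
    "variant": ["control", "control", "test", "test", "treatment",
                "control", "treatment", "control"],
    "event": ["exposure", "convert", "view", "convert", "exposure",
              "exposure", "view", "exposure"],
    "ts": ["2024-01-01 10:00", "2024-01-01 10:05", "2024-01-01 11:00",
           "2024-01-01 11:10", "2024-01-01 12:00", "2024-01-01 13:00",
           "2024-01-01 13:05", "2024-01-01 14:00"],
    "revenue": [None, 50.0, None, 30.0, None, None, None, None],
    "bot": [0, 0, 0, 0, 0, 0, 0, 1],
})
users = build_user_table(events)
print(users)
```

In the sample, u4 is dropped for appearing in both variants and u5 for the bot flag; u2 has no 'exposure' event, so its first event is taken as exposure.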
Task
- Clean and transform the raw logs into a user-level analysis table.
- Compute primary metrics:
  - Conversion rate (CR) per variant
  - Absolute difference and relative lift
  - Statistical significance (two-proportion z-test) and 95% confidence interval
  - (Optional) Revenue per user and Welch’s t-test
- Produce at least one visualization that supports the conclusion (e.g., CR bar chart with 95% CI or cumulative CR over time).
- Clearly interpret the result for a business audience.
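The conversion metrics in the task list can be computed from four counts. The sketch below uses only the Python standard library; the function name two_proportion_ztest and the example counts are hypothetical. It assumes a pooled standard error for the test statistic and an unpooled (Wald) standard error for the confidence interval, which is one common choice rather than the only valid one, and it assumes both groups are non-empty with at least one conversion overall.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_c, n_c, conv_t, n_t, alpha=0.05):
    """Compare treatment vs. control conversion rates (hypothetical helper)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c                      # absolute difference
    lift = diff / p_c                     # relative lift vs. control

    # Test statistic: pooled standard error under H0 (equal rates).
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

    # Confidence interval for the difference: unpooled standard error.
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "cr_control": p_c,
        "cr_treatment": p_t,
        "abs_diff": diff,
        "rel_lift": lift,
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),
    }

# Made-up counts, for illustration only.
result = two_proportion_ztest(conv_c=1000, n_c=10000, conv_t=1100, n_t=10000)
for name, value in result.items():
    print(name, value)
```

For a business audience, the interval on abs_diff is usually the most useful output: if it excludes zero, the data are inconsistent with "no effect" at the chosen alpha, and its width shows how precisely the effect is estimated.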
Hints
- Use pandas for ETL, seaborn/matplotlib for plots.
- Use a two-sample proportion z-test for CR and Welch’s t-test for revenue.
- Include a sanity check for sample ratio mismatch (SRM).
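The SRM sanity check can be sketched as a chi-square goodness-of-fit test on the per-variant user counts. The function name srm_check, the 50/50 expected split, and the 0.001 alert threshold are assumptions made for illustration; the intended allocation should come from the experiment configuration. With two groups the test has one degree of freedom, so the p-value can be obtained from the standard library's erfc.

```python
from math import erfc, sqrt

def srm_check(n_control, n_treatment, expected_control_share=0.5, threshold=0.001):
    """Chi-square goodness-of-fit test for sample ratio mismatch (hypothetical helper)."""
    total = n_control + n_treatment
    exp_c = total * expected_control_share
    exp_t = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, p_value < threshold

# Made-up counts, for illustration only.
balanced = srm_check(10000, 10000)
skewed = srm_check(10000, 10600)
print("balanced:", balanced)
print("skewed:", skewed)
```

If the check flags a mismatch, the assignment or logging pipeline is worth investigating before any metric comparison is trusted.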