How would you evaluate a free-trial A/B test?
Company: OpenAI
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: medium
Interview Round: Technical Screen
You run an online marketing experiment to evaluate whether offering **a free 1‑month trial** increases growth.
## Experiment context
- Eligible visitors are randomly assigned at first exposure to one of two variants:
- **Control**: no free-trial offer
- **Treatment**: shown a free 1‑month trial offer
- The business cares about:
1) **Signup rate** (did the user start a trial?)
2) **Retention** (did the user come back after signing up?)
- Concern: Treatment could increase signups but attract lower-intent users, potentially hurting downstream retention and/or revenue.
## Your tasks
1) **Define metrics precisely**
- Propose a primary metric and key secondary metrics.
- Include at least one **guardrail** (e.g., revenue/cost/abuse).
- Give concrete definitions for “signup rate” and “retention” (e.g., D7/D30), including the denominator (a metric-definition sketch follows the task list).
2) **Choose the analysis approach**
- Specify the analysis population(s): **intention-to-treat (ITT) vs per-protocol**, and how you would handle users who never saw the offer after assignment.
- Explain how you would estimate the treatment effect for:
- a binary conversion metric (signup)
- a retention metric that is only defined for users who signed up (post-treatment selection); an estimation sketch follows the task list
3) **Identify common pitfalls / logic errors in an experiment analysis codebase**
Without writing code, list the most likely bugs or setup problems you would look for when reviewing Python analysis code for this experiment (e.g., bad time windows, wrong joins, leakage, incorrect denominators, repeated-measures issues, post-treatment filtering, peeking).
4) **Make a business recommendation**
- Describe how you would decide whether to ship the free-trial offer, iterate, or stop.
- Discuss what additional analyses you would do to connect the experiment to business value (e.g., LTV, payback period, heterogeneity, novelty effects).
Assume standard frequentist inference unless you justify alternatives.
Quick Answer: This question evaluates A/B test design and causal-inference skills within the Analytics & Experimentation domain. It emphasizes precise metric definition (including guardrails), the choice of analysis population (ITT vs per-protocol), handling of post-treatment selection, and interpretation of binary conversion and retention metrics, at a technical level aimed at intermediate-to-senior data scientists who must combine statistical rigor with product-impact thinking. It is commonly asked because it tests whether a candidate can balance acquisition against downstream retention and revenue, spot typical analysis pitfalls (time-window errors, incorrect joins or denominators, leakage, peeking, and selection bias), and translate experiment results into a business recommendation tied to value metrics such as LTV and payback period.