##### Scenario An e-commerce firm plans to send personalized marketing emails to increase purchase conversions and wants to rigorously evaluate the impact. ##### Question Design an A/B test to measure whether personalized emails lift conversion rate. Specify the primary and guardrail metrics, statistical test, minimum detectable effect, required sample size and duration. After launching on the full user base, a new director reruns the test and sees only a 2 % lift versus the original 20 %. List possible causes and the analyses you would run to diagnose the discrepancy. ##### Hints Think through experiment design, power analysis, instrumentation issues, seasonality, user overlap, novelty effects and segmentation cuts.

Evaluates A/B test design and discrepancy diagnosis for personalized marketing emails. Strong answers define user-level randomization, metrics, power, and guardrails, then explain why a later rollout might show smaller lift through population, seasonality, implementation, instrumentation, or statistical issues.

How do I approach Analytics & Experimentation interview questions?

Analytics & Experimentation questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master analytics & experimentation interviews.

What difficulty level is this interview question?

This is a medium difficulty Analytics & Experimentation question, commonly asked during Onsite rounds at Coinbase.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Coinbase during technical interviews.

Diagnose Discrepancy in A/B Test Conversion Rate Results

An e-commerce company plans to send personalized marketing emails to increase purchase conversions. An initial experiment showed a large lift, but after broader rollout a later test found a much smaller lift.

Constraints & Assumptions

Focus on rigorous experiment design and post-hoc diagnosis, not on building the personalization model itself.
Assume user-level email eligibility, send, open, click, purchase, unsubscribe, and revenue data are available.
Conversion lift should be measured causally against a control group.
Consider both statistical explanations and real product or implementation explanations.

Clarifying Questions to Ask

What was the original population, and how did it differ from the later rollout population?
Were send time, subject line, cadence, offer, and creative held constant?
Was assignment persistent at the user level?
Was the 20% lift relative or absolute, and over what conversion window?

Part 1 - Design the A/B Test

Design an A/B test to measure whether personalized emails increase conversion rate.

What This Part Should Cover

Unit of randomization, exposure rules, control and treatment definitions, and eligibility criteria.
Primary metric such as purchase conversion, plus secondary and guardrail metrics like revenue, unsubscribes, spam complaints, margin, and long-term retention.
Statistical test, analysis window, variance reduction, minimum detectable effect, sample size, and expected duration.
Instrumentation checks, sample-ratio mismatch checks, and pre-registration of success criteria.

Part 2 - Diagnose the Lift Discrepancy

After full rollout, a new director reruns the test and observes only a 2% lift instead of the original 20%. List plausible causes and the analyses you would run.

What This Part Should Cover

Differences in population, seasonality, campaign creative, offer, email deliverability, model version, product changes, and competitor or market conditions.
Statistical issues such as underpowered tests, novelty effects, peeking, multiple testing, regression to the mean, or sample-ratio mismatch.
Implementation problems such as treatment contamination, incorrect logging, duplicate sends, personalization not actually applied, or inconsistent attribution windows.
Segment analysis and reanalysis using the original experiment definition where possible.

What a Strong Answer Covers

A strong answer designs a clean user-level experiment, quantifies power and launch criteria, and diagnoses the later discrepancy with specific checks rather than vague speculation.

Follow-up Questions

How would you design a long-term holdout after rollout?
What if open rate increases but purchase conversion does not?
How would you explain relative versus absolute lift to a non-technical stakeholder?