
Design an A/B test for ML model launch

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competence in experimental design, statistical power analysis, sequential monitoring, covariate adjustment (e.g., CUPED), and practical operational concerns like guardrails, interference, and ramping for ML model launches.


Design an A/B test for ML model launch

Company: Snowflake

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: hard

Interview Round: Technical Screen



Feed Ranker A/B Test Design and Powering

You are replacing the current ranker with a new model in a feed. Baseline CTR is 2.0%. You expect a +5% relative lift on CTR and want 90% power at α = 0.05. Daily eligible traffic is 1,000,000 users; assignment is user-level 50/50 and stable over time.

Assume CTR is measured at the user level over the test window (Bernoulli per user: clicked ≥1 vs. not), users are independently and identically distributed within arms, and there is no interference (SUTVA) unless otherwise addressed.

A) Sample Size and Minimum Duration

  • Use a two-proportion power analysis for a two-sided test to detect an absolute lift of 0.1 percentage points (from 2.0% to 2.1%).
  • State and use your variance assumptions and show formulas and steps.
  • Adjust the resulting sample size for (i) 5% expected bot/invalid traffic and (ii) up to 1% sample ratio mismatch (SRM).
  • Compute the minimum test duration given daily traffic and 50/50 assignment.
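The arithmetic for part A can be sketched in a few lines. This is a minimal calculation using the standard two-proportion sample-size formula (pooled variance under H0 in the α term, unpooled under H1 in the β term); all inputs come from the prompt, and the bot/SRM adjustments are applied as simple enrollment inflation:

```python
from math import ceil, sqrt
from statistics import NormalDist

# Parameters from the prompt
p1 = 0.02                                   # baseline CTR
p2 = p1 * 1.05                              # +5% relative lift -> 0.021
alpha, power = 0.05, 0.90
z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96 (two-sided)
z_b = NormalDist().inv_cdf(power)           # ≈ 1.28

# Two-proportion sample size per arm
p_bar = (p1 + p2) / 2
n_per_arm = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
              + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
             / (p2 - p1) ** 2)

# Inflate enrollment so enough *valid* users survive bot filtering,
# with a small buffer for up to 1% sample ratio mismatch
bot_rate, srm_buffer = 0.05, 0.01
n_enrolled = ceil(2 * n_per_arm / (1 - bot_rate) / (1 - srm_buffer))

daily_traffic = 1_000_000
min_days = ceil(n_enrolled / daily_traffic)
print(f"per arm ≈ {ceil(n_per_arm):,}, enrolled ≈ {n_enrolled:,}, "
      f"min duration = {min_days} day(s)")
```

With these inputs the formula gives roughly 422,000 valid users per arm (≈897,000 enrolled after the bot and SRM adjustments), so power is reachable within a single day of traffic; the binding constraint on duration is day-of-week and novelty coverage, not power, so the test should still run at least two full weeks.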

B) Guardrails and Sequential Monitoring

  • Propose guardrail metrics (bounce rate, crashes, p95 latency, revenue per user) with decision thresholds and how you will test them (e.g., non-inferiority).
  • Describe a sequential monitoring plan using α-spending (e.g., Pocock or O’Brien–Fleming via Lan–DeMets) that allows early stopping without inflating Type I error.
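The Lan–DeMets O'Brien–Fleming-type α-spending function can be evaluated directly to show how little α the plan spends at early looks. A sketch with a hypothetical schedule of five equally spaced interim analyses (the exact stopping boundaries at each look are then solved numerically from the incremental spend, e.g. with a group-sequential package):

```python
from math import sqrt
from statistics import NormalDist

def obf_spend(t: float, alpha: float = 0.05) -> float:
    """Lan–DeMets O'Brien–Fleming-type alpha-spending function.

    Returns the cumulative two-sided alpha spent by information fraction t.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / sqrt(t)))

# Hypothetical schedule: five equally spaced looks
looks = [0.2, 0.4, 0.6, 0.8, 1.0]
spent = [obf_spend(t) for t in looks]
for t, a in zip(looks, spent):
    print(f"info fraction {t:.1f}: cumulative alpha spent = {a:.6f}")
```

The schedule is extremely conservative early (on the order of 1e-5 spent by 20% information) and only reaches the full 0.05 at the final look, which is what lets you peek without inflating overall Type I error.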

C) Novelty, Day-of-Week Effects, and CUPED

  • Address novelty and day-of-week effects (e.g., run for whole weeks and check whether the lift decays over time).
  • Propose CUPED (covariate adjustment) using pre-experiment user CTR: specify the covariate, the adjustment, and how you would validate variance reduction without bias.
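The CUPED adjustment itself is two lines: θ = cov(X, Y) / var(X) and Y' = Y − θ(X − X̄), where X is pre-experiment user CTR. Because X is measured before assignment, it is independent of treatment, so the adjustment reduces variance without biasing the lift. A sketch on synthetic data (the effect sizes and noise levels are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic users: pre-period CTR (covariate X) correlated with
# in-experiment CTR (outcome Y); treatment adds a small true lift of 0.001
n = 200_000
x = rng.beta(2, 98, size=n)                     # pre-experiment CTR per user
treat = rng.integers(0, 2, size=n).astype(bool)
y = 0.5 * x + rng.normal(0, 0.01, size=n) + 0.001 * treat

# CUPED: theta from pooled data; X predates assignment, so no bias
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

lift_raw = y[treat].mean() - y[~treat].mean()
lift_cuped = y_cuped[treat].mean() - y_cuped[~treat].mean()
var_reduction = 1 - np.var(y_cuped) / np.var(y)
print(f"raw lift {lift_raw:.5f}, CUPED lift {lift_cuped:.5f}, "
      f"variance reduced by {var_reduction:.1%}")
```

Validation is exactly what the print shows: the CUPED lift estimate stays centered on the true effect (no bias) while the outcome variance, and hence the confidence interval width, shrinks. An A/A test with the same adjustment is the standard production check.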

D) Interference and Contamination

  • Mitigate interference: ensure user-level bucketing, prevent cross-arm content spillover, and plan a geo or holdout if network effects are suspected.
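One concrete piece of part D is deterministic user-level bucketing. A sketch (the experiment name `ranker_v2` and the 1000-bucket granularity are arbitrary illustrative choices): salting the hash with the experiment name keeps each user's assignment stable for the life of the test while decorrelating it from assignments in other experiments.

```python
import hashlib

def assign_arm(user_id: str, experiment: str = "ranker_v2",
               arms=("control", "treatment")) -> str:
    """Deterministic, stable user-level bucketing via a salted hash."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 1000       # 1000 buckets -> 0.1% granularity
    return arms[0] if bucket < 500 else arms[1]

# Repeated calls for the same user always return the same arm
print(assign_arm("user_42"), assign_arm("user_42"))
```

Fine-grained buckets also make later ramping easy: shifting the 500 cutoff moves traffic in 0.1% increments without reshuffling users already assigned.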

E) Post-Test Checks and Ramp

  • After the test, outline checks for achieved power, heterogeneity of treatment effects across cohorts, and how you’d decide to ramp to 100%.
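Two of the part E checks are short computations: an SRM test against the planned 50/50 split, and the power actually achieved for the planned effect at the realized valid sample size. A sketch (the counts passed in below are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def srm_check(n_control: int, n_treatment: int, expected: float = 0.5):
    """Two-sided z-test for sample ratio mismatch vs. the planned split."""
    n = n_control + n_treatment
    se = sqrt(expected * (1 - expected) / n)
    z = (n_treatment / n - expected) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def achieved_power(n_per_arm: int, p1: float, p2: float,
                   alpha: float = 0.05) -> float:
    """Power actually achieved for the planned effect at the realized n."""
    nd = NormalDist()
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)
    return nd.cdf(abs(p2 - p1) / se - nd.inv_cdf(1 - alpha / 2))

z, p = srm_check(497_000, 503_000)        # hypothetical realized counts
print(f"SRM z = {z:.2f}, p = {p:.2e}")    # a tiny p flags broken assignment
print(f"achieved power ≈ {achieved_power(422_000, 0.02, 0.021):.3f}")
```

A significant SRM p-value invalidates the readout regardless of the lift; if both checks pass and guardrails are flat, ramp in stages (e.g., 50% → 100%) while monitoring the same metrics rather than flipping at once.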
