A/B testing is one of the most important responsibilities for Data Scientists working on product, growth, or marketplace teams. Interviewers look for candidates who can articulate not only the statistical components of an experiment, but also the product reasoning, bias mitigation, operational challenges, and decision-making framework. This guide provides a highly structured, interview-ready framework that senior DS candidates use to answer any A/B test question, from ranking changes to pricing to onboarding flows.

---

## 1. Define the Goal: What Problem Is the Feature Solving?

Before diving into metrics and statistics, clearly explain the underlying motivation. This demonstrates product sense and thinking that is aligned with business objectives.

Good goal statements explain:

* The user problem
* Why it matters
* The expected behavioral change
* How this supports company objectives

### Examples

**Search relevance improvement:**
Goal: Help users find relevant results faster, improving engagement and long-term retention.

**Checkout redesign:**
Goal: Reduce friction at checkout to improve conversion without increasing error rate or latency.

**New onboarding tutorial:**
Goal: Reduce confusion for first-time users and increase Day-1 activation.

A crisp goal sets the stage for everything that follows.

---

## 2. Define Success Metrics, Input Metrics, and Guardrails

A strong experiment design is built on a clear measurement framework.

### 2.1 Success Metrics

Primary metrics that directly reflect whether the goal is achieved.

Examples:

* Conversion rate
* Search result click-through rate
* Watch time per active user
* Onboarding completion rate

Explain why each metric indicates success.

### 2.2 Input / Diagnostic Metrics

These help interpret why the primary metric moved.

Examples:

* Queries per user
* Add-to-cart rate before conversion
* Time spent on each onboarding step
* Bounce rate on redesigned pages

Input metrics help you debug ambiguous outcomes.

### 2.3 Guardrail Metrics

Guardrails ensure no critical system or experience is harmed.

Common guardrails:

* Latency
* Crash rate / error rate
* Revenue per user
* Supply-side metrics (for marketplaces)
* Content diversity
* Abuse or report rate

Mentioning guardrails shows mature product thinking and real-world experience.

---

## 3. Experiment Design, Power, Dilution, and Exposure Points

This section demonstrates statistical rigor and real experimentation experience.

### 3.1 Exposure Point: What It Is and Why It Matters

The exposure point is the precise moment when a user first experiences the treatment.

Examples:

* First time a user performs a search (for search ranking experiments)
* First page load during a session (for UI layout changes)
* First checkout attempt (for pricing changes)

**Why the exposure point matters**

If the randomization unit is "user" but only some users ever reach the exposure point, then:

* Many users in treatment never see the feature
* Their outcomes are identical to control
* The treatment effect is diluted
* Power decreases
* Required sample size increases
* Test duration becomes longer

**Example of dilution**

Imagine only 30% of users actually visit the search page. Even if your feature improves search CTR by 10% among exposed users, the overall effect looks like:

Overall lift ≈ 0.3 × 10% = 3%

Your experiment must now detect a 3% lift, not 10%, which drastically increases the required sample size. This is why clearly defining exposure points is essential for estimating power and test duration.
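To make the dilution math concrete, here is a minimal sketch using `statsmodels` (the sample-size machinery is described in the next subsection). All numbers are hypothetical, and for simplicity it assumes the same baseline CTR in both analysis populations:

```python
# Minimal sketch: how dilution inflates the required sample size.
# All rates below are illustrative assumptions, not real benchmarks.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline_ctr = 0.05      # control click-through rate (hypothetical)
relative_lift = 0.10     # +10% among users who actually see the feature
exposure_rate = 0.30     # only 30% of users ever reach the search page

analysis = NormalIndPower()

def per_group_n(lift):
    """Per-group sample size for a two-sided z-test on proportions."""
    effect = proportion_effectsize(baseline_ctr * (1 + lift), baseline_ctr)
    return analysis.solve_power(effect_size=effect, alpha=0.05, power=0.8,
                                alternative="two-sided")

# (a) Trigger at the exposure point: analyze only exposed users, full 10% lift.
n_exposed = per_group_n(relative_lift)

# (b) Randomize all users: the effect is diluted to 0.3 * 10% = 3%,
#     and unexposed users still count toward the sample.
n_all_users = per_group_n(relative_lift * exposure_rate)

print(f"Per-group n, exposed-only analysis: {n_exposed:,.0f}")
print(f"Per-group n, diluted (all users):   {n_all_users:,.0f}")
```

Even after accounting for the larger eligible population in the diluted design, detecting the smaller average effect typically requires a substantially longer test.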
---

### 3.2 Sample Size and Power Calculation

Explain that you calculate the required sample size using:

* Minimum Detectable Effect (MDE)
* Standard deviation of the metric
* Significance level (alpha)
* Power (1 − beta)

Then:

Test duration = (required sample size per group × 2) / daily traffic

---

### 3.3 How to Reduce Test Duration and Increase Power

Interviewers love when candidates proactively mention ways to speed up experiments. Here are the most important strategies:

**1. Avoid dilution**

* Trigger assignment only at the exposure point.
* Randomize only users who actually experience the feature.
* Filter out users who never hit the exposure point.

This alone often cuts test duration by 30–60%.

**2. Apply CUPED to reduce variance**

CUPED leverages pre-experiment metrics to reduce noise (see the sketch at the end of this section). Examples:

* Pre-period engagement
* Past purchase behavior
* Historical search activity

Variance reduction often yields:

* 20–50% reduction in required sample size
* Much shorter experiments

This is a sign of high-level experimentation expertise.

**3. Sequential testing**

Sequential testing allows stopping early when results are conclusive while controlling the Type I error rate. Common techniques:

* Group sequential tests
* Alpha spending functions
* Bayesian sequential testing

Sequential testing is especially useful when traffic is limited.

**4. Increase the MDE (detect a larger effect)**

If the business only cares about big wins, raise the MDE. A higher MDE means a lower required sample size and a shorter test.

**5. Use a higher significance level (higher alpha)**

Relaxing alpha from 0.05 to 0.1 reduces the required sample size. Mention that this should be done consciously based on:

* Risk
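As referenced in strategy 2 above, here is a minimal sketch of the CUPED adjustment. It assumes you have, for each user, the in-experiment metric `y` and the same metric from a pre-experiment period `x`; the arrays below are simulated and all numbers are illustrative:

```python
# Minimal CUPED sketch: remove the part of the in-experiment metric that is
# explained by a pre-experiment covariate, shrinking variance.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.gamma(shape=2.0, scale=5.0, size=n)   # pre-period engagement (simulated)
y = 0.8 * x + rng.normal(0, 5, size=n)        # in-experiment metric (simulated)

# theta is the regression coefficient of y on the pre-experiment covariate
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

print(f"Variance before CUPED: {y.var():.1f}")
print(f"Variance after CUPED:  {y_cuped.var():.1f}")
```

The mean of `y_cuped` is unchanged in expectation, so treatment-versus-control comparisons remain valid, while the reduced variance translates directly into a smaller required sample size and a shorter test.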