Design an A/B test for search ranking
Company: Google
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: HR Screen
## Scenario
You work on a search product and have built a new search ranking/retrieval algorithm (Variant B). The current algorithm is Variant A. You need to design an online experiment to decide whether to launch B.
## Task
Design an A/B test plan that covers:
1. **Goal & hypotheses**
- What is the primary product goal (e.g., improved relevance, engagement, or long-term retention)?
- State clear hypotheses (e.g., B improves relevance without harming latency).
2. **Experiment design**
- Choose the **experimental unit** (user, device, session, query) and justify it.
- Specify the randomization approach (simple vs. stratified) and the key stratification variables (e.g., locale, platform, query category).
- Explain how you will handle **interference/contamination** (e.g., cross-device users, cached results, shared accounts).
- Specify the duration and ramp plan (e.g., 1% → 10% → 50%), plus stopping rules.
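
One way to make the unit choice and the ramp plan concrete is deterministic, hash-based assignment. The sketch below is illustrative only: the function name `assign_variant`, the salt `"ranking-v2"`, and the use of SHA-256 are assumptions, not part of the question. It assumes the experimental unit is a stable user ID.

```python
import hashlib


def assign_variant(unit_id: str, experiment_salt: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a unit (e.g., a user ID) to variant 'A' or 'B'.

    Hypothetical helper: the salt keeps assignments independent across
    experiments, and the same unit always lands in the same bucket.
    """
    digest = hashlib.sha256(f"{experiment_salt}:{unit_id}".encode("utf-8")).hexdigest()
    # First 8 hex characters give an integer in [0, 2**32); scale it to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "B" if bucket < treatment_share else "A"


# Ramp example: the bucket value is fixed per user, so anyone exposed to B
# at 1% stays in B when the share is raised to 10% or 50%.
for share in (0.01, 0.10, 0.50):
    print(share, assign_variant("user-42", "ranking-v2", share))
```

Because the bucket depends only on the salt and the unit ID, raising `treatment_share` during the ramp only adds users to B; it never moves an exposed user back to A, which limits contamination from users switching arms.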
3. **Metrics**
- Propose a **primary metric** (one) and justify it.
- Propose **diagnostic metrics** to understand *why* results change.
- Propose **guardrail metrics** to prevent regressions.
Consider tradeoffs such as:
- Short-term engagement vs. long-term user value
- Relevance improvements vs. **latency / cost**
- Click metrics vs. **good clicks** (dwell time, reformulation)
4. **Power / sample size**
- What inputs do you need to compute sample size (baseline rate, variance, MDE, alpha, power)?
- How would you handle multiple comparisons if testing many metrics or segments?
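
The inputs listed above map directly onto the standard normal-approximation formula for comparing two proportions. The following is a minimal sketch under stated assumptions: the primary metric is a per-user binary outcome, the test is two-sided, and the numbers in the usage lines (30% baseline, 0.3 percentage point MDE, 10 comparisons) are made up for illustration. Bonferroni is shown as one simple, conservative correction; it is not the only option.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per arm to detect an absolute lift of mde_abs.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / mde_abs^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance_sum / mde_abs ** 2)


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


# Illustrative inputs only (not taken from the question).
print(sample_size_per_arm(baseline_rate=0.30, mde_abs=0.003))
print(sample_size_per_arm(baseline_rate=0.30, mde_abs=0.003,
                          alpha=bonferroni_alpha(0.05, 10)))
```

Two things the sketch makes visible: halving the MDE roughly quadruples the required sample, and tightening alpha for multiple comparisons raises it further, which is one reason to commit to a single primary metric.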
5. **Analysis plan**
- How will you compute treatment effects (difference in means/proportions; user-level aggregation)?
- How will you check for **sample ratio mismatch (SRM)** and data quality issues?
- What key segments would you examine (new vs. returning, head vs. tail queries), and how do you avoid p-hacking?
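
The SRM check mentioned above can be sketched as a chi-square goodness-of-fit test on the per-arm unit counts. This is an illustrative sketch, not a prescribed method: the function name `srm_p_value` and the counts in the usage lines are hypothetical, and the 0.001 alert threshold in the comment is a commonly used convention rather than something the question specifies.

```python
from math import erfc, sqrt


def srm_p_value(n_control: int, n_treatment: int,
                expected_treatment_share: float = 0.5) -> float:
    """P-value of a chi-square test (1 degree of freedom) that the observed
    split between arms matches the planned allocation."""
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


# Hypothetical counts: a small imbalance is consistent with chance,
# a larger one is not.
for control, treatment in ((50_000, 50_100), (50_000, 51_500)):
    p = srm_p_value(control, treatment)
    # Assumed convention: treat p < 0.001 as an SRM alert.
    print(control, treatment, p, "SRM alert" if p < 0.001 else "ok")
```

A failed SRM check usually points to a logging, bucketing, or filtering bug, so the treatment-effect estimate should not be trusted until the cause is found.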
6. **Risks & pitfalls**
- How do you address novelty effects, learning-to-rank feedback loops, or delayed outcomes?
- What would make you decide to *not* trust the experiment result?
## Output
Provide a structured experiment proposal (bulleted plan) including the final metric set and launch decision criteria.
Quick Answer: This question evaluates a data scientist's competency in online experimentation, causal inference, and product analytics, including A/B test design, metric selection, power and sample-size reasoning, interference mitigation, and analysis planning.