Plan and analyze a ranking A/B test
Company: Netflix
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Onsite
A search team proposes a new ranking feature. Design, execute, and analyze the experiment:

1. Unit of randomization: decide between user-level, session-level, or query-level randomization, and justify the choice given cross-session carryover and possible network interference.
2. Metrics: define the primary success metric (e.g., query-level success rate or paid conversion within 24h) and guardrails (latency, crash rate, ads revenue, bounce rate).
3. Power and sample size: the baseline click-through rate is 10%, and you must detect a relative +2% uplift (i.e., 10% to 10.2%) at two-sided alpha = 0.05 with power = 0.8. Show the formula and compute the required per-variant sample size for a standard two-proportion z-test; then discuss how clustering or CUPED would change it.
4. Execution: outline SRM checks, triggered vs. intent-to-treat analyses, bucketing consistency across services, a burn-in period for novelty effects, and sequential monitoring that does not inflate the Type I error rate.
5. Heterogeneity: propose pre-registered segments (e.g., head vs. tail queries, country, device) and explain how you would test for treatment-by-segment interaction while controlling the false discovery rate.
6. Interference and long-term effects: if ranking changes affect supply/demand dynamics, propose cluster randomization or switchback testing and explain how to interpret the results.
7. Rollout: define stop/go criteria, a ramp plan, and a way to update the ML training data that avoids entangling model training with experiment exposure.
Quick Answer: This question evaluates experimental-design and causal-inference competencies for online A/B testing: metric definition, choice of randomization unit under cross-session carryover and interference, power and sample-size calculation, sequential monitoring, heterogeneity analysis, and safe rollout with ML-retraining considerations. Illustrative sketches for the quantitative parts (sample size, CUPED, SRM, alpha spending, FDR control) follow below.
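For item (3), the standard normal-approximation formula for a two-sided two-proportion z-test is

n per variant = ( z_{1-alpha/2} * sqrt(2*p_bar*(1-p_bar)) + z_{1-beta} * sqrt(p1*(1-p1) + p2*(1-p2)) )^2 / (p2 - p1)^2,

with p1 = 0.10, p2 = 0.102, and p_bar their average. A minimal Python sketch (the function name is illustrative):

```python
import math

from scipy.stats import norm


def two_proportion_sample_size(p1: float, p2: float,
                               alpha: float = 0.05,
                               power: float = 0.80) -> int:
    """Per-variant n for a two-sided two-proportion z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)           # 0.8416 for power = 0.80
    p_bar = (p1 + p2) / 2              # pooled proportion under H0
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)


# Baseline CTR 10%, relative +2% uplift -> 10.2% (absolute delta = 0.002).
print(two_proportion_sample_size(0.10, 0.102))  # ~356,000 per variant
```

The absolute delta of 0.002 enters the denominator squared, so the answer lands near 356,000 users per variant. Clustering (randomizing users but analyzing queries) inflates this by the design effect 1 + (m - 1) * ICC, where m is queries per user, while CUPED deflates it, as sketched next.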
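On the CUPED part of item (3): with a pre-experiment covariate X correlated with the metric (e.g., each user's pre-period CTR, an assumed choice here), CUPED replaces Y with Y - theta * (X - mean(X)), where theta = cov(Y, X) / var(X). Variance, and hence required sample size, shrinks by a factor of roughly (1 - rho^2), rho = corr(Y, X). A sketch on simulated data:

```python
import numpy as np


def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Remove the part of metric y explained by pre-experiment
    covariate x; theta is the OLS slope of y on x, so the treatment
    effect estimate stays unbiased while its variance shrinks."""
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())


rng = np.random.default_rng(0)
x = rng.normal(size=100_000)            # pre-period covariate
y = 0.6 * x + rng.normal(size=100_000)  # in-experiment metric
# Adjusted variance is ~(1 - rho^2) of the raw variance (~1.0 vs ~1.36).
print(np.var(y), np.var(cuped_adjust(y, x)))
```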
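For the SRM check in item (4), a chi-square goodness-of-fit test compares observed assignment counts against the configured split before any metric is read; the counts below are hypothetical:

```python
from scipy.stats import chisquare

# Hypothetical observed counts under an intended 50/50 split.
n_control, n_treatment = 500_812, 498_400
total = n_control + n_treatment
stat, p = chisquare([n_control, n_treatment], f_exp=[total / 2, total / 2])
# A very small p-value (a common threshold is p < 0.001) signals a
# sample-ratio mismatch, i.e., broken bucketing or logging; results
# should not be trusted until the cause is found and fixed.
print(f"chi2={stat:.2f}, p={p:.4g}")
```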
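Also for item (4), one standard way to monitor sequentially without inflating Type I error is an alpha-spending function. The sketch below computes the Lan-DeMets O'Brien-Fleming-type schedule, which spends almost no alpha at early looks; turning the schedule into exact per-look boundaries additionally needs the joint distribution of the interim statistics, which group-sequential software handles:

```python
from scipy.stats import norm

ALPHA = 0.05
Z = norm.ppf(1 - ALPHA / 2)  # 1.96


def obf_cumulative_spend(t: float) -> float:
    """Cumulative Type I error allowed by information fraction t under
    the O'Brien-Fleming-type spending function; equals ALPHA at t = 1."""
    return 2.0 * (1.0 - norm.cdf(Z / t ** 0.5))


for t in (0.25, 0.50, 0.75, 1.00):
    print(f"t={t:.2f}  cumulative alpha spent = {obf_cumulative_spend(t):.5f}")
```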
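For item (5), one pre-registered analysis tests each segment's lift with a z-test and controls the false discovery rate across the segment family with Benjamini-Hochberg; a fuller interaction test would instead compare each segment's effect against the pooled effect. The lift estimates and standard errors below are hypothetical:

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests

segments = ["head_queries", "tail_queries", "US", "intl", "mobile", "desktop"]
lift = np.array([0.0030, 0.0004, 0.0021, 0.0018, 0.0025, 0.0010])  # abs. CTR lift
se = np.array([0.0009, 0.0008, 0.0010, 0.0011, 0.0009, 0.0010])

pvals = 2 * norm.sf(np.abs(lift / se))  # two-sided z-test per segment
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for seg, p, r in zip(segments, p_adj, reject):
    print(f"{seg:>12}: adjusted p = {p:.3f}  significant = {r}")
```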