This question evaluates a data scientist's competency in experimental design, causal inference, and statistical power analysis for A/B tests, covering randomization unit choice, contamination prevention, metric and guardrail selection, clustering adjustments, and bias-control techniques.

You must run an A/B test to evaluate the new hashtag recommender starting on 2025‑09‑01.

1) Define the randomization unit (user/session/impression) and justify it given potential interference (users can see and follow the same hashtag across sessions).

2) Specify treatment and control assignment, the rollout/ramp plan, and how you will prevent cross‑over and contamination (e.g., sticky bucketing, or a holdout of creators/hashtags if needed).

3) Define a primary metric (e.g., 24‑hour follow‑through rate per exposed user) and at least three guardrails (e.g., session length, violating‑hashtag follow rate, crash rate).

4) Power analysis: with a baseline follow rate of 4.0%, two‑sided alpha = 0.05, power = 0.80, and a minimum detectable effect of +5% relative, compute the required users per arm assuming user‑level randomization and an intraclass correlation ρ = 0.02 from repeated sessions. Show the formulas, the design‑effect adjustment for clustering, and how you would detect sample‑ratio mismatch.

5) Outline bias controls: CUPED or pre‑period stratification, novelty and winner's‑curse handling, sequential monitoring rules, a minimum test duration, and diagnostics for interference or network effects.
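For part 4, a minimal sketch of the calculation a strong answer would show, using the standard two‑proportion formula n = (z_{α/2} + z_β)² · (p₁(1−p₁) + p₂(1−p₂)) / (p₂ − p₁)² and the design effect DEFF = 1 + (m − 1)ρ. The average number of sessions per user m is not given in the prompt, so the value of 3 below is purely an illustrative assumption, as is the function name:

```python
import math
from statistics import NormalDist

def users_per_arm(p1, rel_mde, alpha=0.05, power=0.80, icc=0.0, sessions_per_user=1):
    """Two-proportion sample size per arm, inflated by a cluster design effect."""
    p2 = p1 * (1 + rel_mde)                        # 4.0% -> 4.2% for +5% relative
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for two-sided alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * var_sum / (p2 - p1) ** 2
    deff = 1 + (sessions_per_user - 1) * icc       # design effect for repeated sessions
    return math.ceil(n * deff)

# Unadjusted: user-level randomization, one effective observation per user
base = users_per_arm(0.04, 0.05)
# Clustered: rho = 0.02, assuming an average of 3 sessions per user (not in the prompt)
clustered = users_per_arm(0.04, 0.05, icc=0.02, sessions_per_user=3)
print(base, clustered)
```

With these inputs the unadjusted requirement is roughly 154k users per arm, and the clustering adjustment inflates it by the design effect (4% at m = 3, ρ = 0.02).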