You are launching a new personalized ranking on the product listing page. Define: (a) the primary success metric and its exact formula (include numerator/denominator and data filters); (b) at least three guardrail metrics with thresholds (e.g., latency p95, refund rate, bounce rate) and why each protects against specific failure modes. Then design the experiment: unit of randomization (and why), exposure rules, pre-exposure filtering, blocking/stratification, and handling repeat visitors across devices. Compute the required sample size to detect a 1.5% relative lift in the primary metric with 90% power and two-sided alpha=0.05, given a baseline mean of 3.2 and SD of 2.1 per user-day; show the formula you use. Specify your stopping rule (fixed horizon vs alpha-spending), how you'll apply CUPED or re-randomization to reduce variance, and the exact SRM test you’ll run each day. Describe how you will visualize results and diagnose issues (e.g., quantile treatment effects, funnel breakouts, time-since-exposure plots), and how you will interpret heterogeneous effects without p-hacking.