Evaluation Plan for a New Recommendation Module in a Commerce App
Background
You are asked to evaluate a new recommendation module for a commerce app. The module may exhibit cross-user interference (users influence each other via shared popularity signals, inventory pressure, or model feedback loops), and outcomes may be affected by traffic seasonality and non-stationarity.
Tasks
- Define an Overall Evaluation Criterion (OEC) and three guardrail metrics with precise formulas, units, and measurement windows. Example guardrails include churn, latency p95, and complaint rate. (Sketch 1 after this list gives illustrative definitions.)
- Choose one test design (user-level RCT, geo-clustered RCT, or time-based switchback). Justify your choice with respect to interference, non-stationarity/seasonality, and operational constraints. State any design-specific controls you will use (e.g., model isolation, warmups). (Sketch 2 below shows one possible assignment scheme.)
- Describe the ramp strategy and pre-registration plan: stopping rules, power target and minimum detectable effect (MDE), variance reduction (e.g., CUPED/covariate adjustment), and small-area risk controls. (Sketch 3 below covers the power and CUPED calculations.)
- If randomization is infeasible, propose a quasi-experimental fallback (synthetic control or difference-in-differences). List the necessary identifying assumptions and the falsification/placebo tests you will run. (Sketch 4 below shows a minimal difference-in-differences estimate and placebo check.)
- Mid-test, suppose the OEC flatlines while add-to-cart rises and conversion falls. Provide a metric-debugging checklist and the exact diagnostic cuts you will request (e.g., by device, geography, new vs. returning, latency buckets). Include relevant equations to localize the issue. (Sketch 5 below gives a funnel decomposition that separates stage and segment effects.)
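Illustrative sketches (assumptions, not requirements)

Sketch 1 (metric definitions). A minimal way to specify the OEC and the three example guardrails named in the task, assuming a hypothetical choice of revenue per exposed visitor as the OEC; the window lengths, units, and aggregation cadence below are illustrative choices, not prescribed by the task.

```latex
% Illustrative definitions; windows, units, and cadences are assumptions
\text{OEC (revenue per exposed visitor, 14-day window)} =
  \frac{\sum_{u \in U_{\text{exposed}}} \text{revenue}_u^{[0,\,14\,\text{d}]}}{|U_{\text{exposed}}|}
  \quad [\text{currency units / visitor}]

\text{Churn}_{28\text{d}} =
  \frac{|\{u :\ \text{active in } [-28,0)\,\text{d} \ \wedge\ \text{inactive in } [0,28)\,\text{d}\}|}
       {|\{u :\ \text{active in } [-28,0)\,\text{d}\}|}
  \quad [\text{fraction per 28 days}]

\text{Latency p95} = Q_{0.95}\big(\text{recommendation endpoint response time}\big)
  \quad [\text{ms, computed daily}]

\text{Complaint rate} =
  \frac{\#\,\text{complaint tickets}}{\#\,\text{sessions}} \times 1000
  \quad [\text{complaints per 1{,}000 sessions, weekly}]
```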
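Sketch 2 (assignment for a geo-clustered RCT or time-based switchback). Whichever design is chosen, arm assignment should be deterministic and reproducible. This is a minimal sketch assuming a salted-hash scheme; the salt, the 6-hour switchback interval, the 30-minute warm-up window, and the function names are all illustrative, and timestamps are assumed to be timezone-aware UTC.

```python
import hashlib
from datetime import datetime

SALT = "rec-module-eval"   # experiment-specific salt (illustrative)
SWITCH_HOURS = 6           # switchback interval length (assumption)
WARMUP_MINUTES = 30        # discard this prefix of each interval from analysis

def _bucket_hash(key: str) -> float:
    """Map a key to a uniform value in [0, 1) via a salted SHA-256 hash."""
    digest = hashlib.sha256(f"{SALT}:{key}".encode()).hexdigest()
    return int(digest[:15], 16) / 16 ** 15

def geo_cluster_arm(geo_cluster_id: str) -> str:
    """Geo-clustered RCT: the whole cluster stays in one arm for the full test."""
    return "treatment" if _bucket_hash(f"geo:{geo_cluster_id}") < 0.5 else "control"

def switchback_arm(ts: datetime) -> str:
    """Switchback: alternate arms every SWITCH_HOURS, with a randomized order per interval."""
    epoch_minutes = int(ts.timestamp() // 60)
    interval = epoch_minutes // (SWITCH_HOURS * 60)
    return "treatment" if _bucket_hash(f"interval:{interval}") < 0.5 else "control"

def in_warmup(ts: datetime) -> bool:
    """Flag observations in the first WARMUP_MINUTES of an interval (caches/models still adapting)."""
    epoch_minutes = int(ts.timestamp() // 60)
    return epoch_minutes % (SWITCH_HOURS * 60) < WARMUP_MINUTES
```

In practice one would also stratify geo clusters (by size or region) before the 50/50 split and pre-register the warm-up exclusion; the plain salted hash above is just the simplest reproducible version.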
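Sketch 3 (power/MDE and CUPED). The pre-registration item can lean on the standard normal-approximation sample size, n per arm = 2 (z_{1-α/2} + z_{1-β})² σ² / Δ² for an absolute MDE Δ, and on the CUPED adjustment Y_adj = Y − θ (X − X̄) with θ = Cov(Y, X) / Var(X), where X is the pre-experiment value of the OEC for the same user. The code below is an illustrative encoding of those two formulas; the example numbers in the comment are placeholders.

```python
import numpy as np
from statistics import NormalDist

def n_per_arm(sigma: float, mde_abs: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per arm for an absolute MDE on the OEC."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return int(np.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / mde_abs ** 2))

def cuped_adjust(y: np.ndarray, x_pre: np.ndarray) -> np.ndarray:
    """CUPED: remove the part of Y explained by the pre-period covariate X.

    theta = Cov(Y, X) / Var(X); Y_adj = Y - theta * (X - mean(X)).
    Variance shrinks by a factor of (1 - corr(Y, X)^2).
    """
    theta = np.cov(y, x_pre, ddof=1)[0, 1] / np.var(x_pre, ddof=1)
    return y - theta * (x_pre - x_pre.mean())

# Placeholder usage: sigma = std of 14-day revenue per visitor, MDE in currency units
# print(n_per_arm(sigma=12.0, mde_abs=0.5))
```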
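Sketch 4 (difference-in-differences fallback). The canonical 2x2 estimator is τ̂ = (Ȳ_treated,post − Ȳ_treated,pre) − (Ȳ_control,post − Ȳ_control,pre), which identifies the effect under parallel trends and no anticipation. The sketch below computes it from an outcome panel and reruns it entirely inside the pre-period with a fake cutoff as a placebo test; the pandas schema ('treated', 'post', 'date', 'y') is an assumption for illustration.

```python
import pandas as pd

def did_estimate(df: pd.DataFrame, post_col: str = "post") -> float:
    """2x2 DiD: (treated post - treated pre) - (control post - control pre).

    Expects columns: 'treated' (0/1), post_col (0/1), 'y' (outcome). Illustrative schema.
    """
    means = df.groupby(["treated", post_col])["y"].mean()
    return (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])

def placebo_did(df_pre_only: pd.DataFrame, fake_cutoff) -> float:
    """Falsification check: rerun DiD on pre-period data with a fake 'post' date.

    A placebo estimate far from zero is evidence against parallel trends.
    """
    df = df_pre_only.copy()
    df["fake_post"] = (df["date"] >= fake_cutoff).astype(int)
    return did_estimate(df, post_col="fake_post")
```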
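Sketch 5 (localizing the OEC flatline). If the OEC is revenue per visitor, the exact identity revenue/visitor = (ATC sessions/visitor) x (orders/ATC session) x (revenue/order) gives log(OEC_T / OEC_C) = sum of the per-stage log-ratios, so "add-to-cart up, conversion down" appears as offsetting terms. Computing the decomposition overall and within each requested cut (device, geography, new vs. returning, latency bucket) localizes which stage and which segment absorb the gain. The aggregate column names and pandas schema below are assumptions.

```python
import numpy as np
import pandas as pd

FUNNEL = ["visitors", "atc_sessions", "orders", "revenue"]  # assumed aggregate columns

def funnel_rates(agg: pd.Series) -> dict:
    """Stage-wise rates whose product is revenue per visitor."""
    return {
        "atc_per_visitor": agg["atc_sessions"] / agg["visitors"],
        "orders_per_atc": agg["orders"] / agg["atc_sessions"],
        "revenue_per_order": agg["revenue"] / agg["orders"],
    }

def log_delta_decomposition(treat: pd.Series, control: pd.Series) -> pd.Series:
    """Split log(OEC_T / OEC_C) exactly into per-stage log-ratios.

    A positive atc_per_visitor term alongside a negative orders_per_atc term
    reproduces the observed 'ATC up, conversion down' pattern.
    """
    rt, rc = funnel_rates(treat), funnel_rates(control)
    return pd.Series({k: np.log(rt[k] / rc[k]) for k in rt})

def decomposition_by_segment(df: pd.DataFrame, segment_col: str) -> pd.DataFrame:
    """Run the decomposition within each cut; expects an 'arm' column plus FUNNEL columns."""
    out = {}
    for seg, g in df.groupby(segment_col):
        sums = g.groupby("arm")[FUNNEL].sum()
        out[seg] = log_delta_decomposition(sums.loc["treatment"], sums.loc["control"])
    return pd.DataFrame(out).T
```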