This question evaluates skills in diagnosing production recommender systems, causal inference and online experimentation, multi-objective ranking with safe exploration, and instrumentation.
Eats recommendations were changed to rank items primarily by distance to the user. After launch, add-to-cart rate rose but revenue per session fell. Diagnose and fix:

Define online and offline evaluation metrics, and design both an A/B test and an offline counterfactual evaluation to separate causal from compositional effects.

Hypothesize mechanisms (e.g., cheaper nearby items cannibalizing high-AOV items, position bias, distance–price/fee correlation, capacity throttling, promise-time effects, acceptance-rate shifts) and specify the checks you would run.

Propose a new ranking objective as a multi-objective optimization (expected revenue, ETA reliability, acceptance probability, fairness) with constraints and guardrails, and describe how you would add safe exploration (e.g., Thompson sampling or epsilon-greedy with caps).

Detail diagnostic slicing by zone/time/cohort, selection-bias controls, instrumentation to disentangle delivery-time and cancellation effects, and rollback criteria.
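
One way to start on the "causal versus compositional" part is a mix/rate decomposition of the change in revenue per session. The sketch below is illustrative only: the function name `decompose`, the segment labels, and all numbers are hypothetical and are not taken from any real dataset. It splits the overall change into a mix effect (segment shares moved) and a rate effect (revenue per session moved within segments), using the identity Δ = Σ (w1 − w0)·r0 + Σ w1·(r1 − r0).

```python
from typing import Dict, Tuple

# segment name -> (sessions, total revenue); a hypothetical input layout
Segments = Dict[str, Tuple[int, float]]


def decompose(control: Segments, treatment: Segments) -> Dict[str, float]:
    """Split the change in revenue per session into mix and rate effects."""
    n0 = sum(sessions for sessions, _ in control.values())
    n1 = sum(sessions for sessions, _ in treatment.values())
    if n0 == 0 or n1 == 0:
        raise ValueError("both arms need at least one session")

    mix_effect = 0.0   # change explained by segment shares shifting
    rate_effect = 0.0  # change explained by within-segment revenue/session
    for seg in set(control) | set(treatment):
        s0, rev0 = control.get(seg, (0, 0.0))
        s1, rev1 = treatment.get(seg, (0, 0.0))
        w0, w1 = s0 / n0, s1 / n1            # segment share of sessions
        r0 = rev0 / s0 if s0 else 0.0        # revenue per session, control
        r1 = rev1 / s1 if s1 else 0.0        # revenue per session, treatment
        mix_effect += (w1 - w0) * r0
        rate_effect += w1 * (r1 - r0)

    total = (sum(rev for _, rev in treatment.values()) / n1
             - sum(rev for _, rev in control.values()) / n0)
    return {"mix_effect": mix_effect,
            "rate_effect": rate_effect,
            "total_change": total}


if __name__ == "__main__":
    # Made-up numbers purely to exercise the function.
    control = {"dense_urban": (1000, 12000.0), "suburban": (1000, 8000.0)}
    treatment = {"dense_urban": (1200, 13200.0), "suburban": (800, 6000.0)}
    print(decompose(control, treatment))
```

In a properly randomized A/B test, shares of segments defined on pre-treatment attributes should match across arms. A large mix effect therefore suggests either a pre/post comparison or segments defined on post-treatment behavior, both of which call for the selection-bias controls the question asks about.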