
Session‑level recommendations have stateful effects and create feedback loops that affect long‑term retention.
a) Formulate the problem as an MDP (state, action, reward, horizon) and contrast it with a contextual bandit formulation.
b) Outline offline policy evaluation with a doubly robust estimator (inverse propensity scoring combined with a learned reward model) and describe diagnostics for support violations.
c) Propose a safe exploration strategy under business constraints (e.g., conservative policy improvement).
d) Address network effects and interference during evaluation and rollout.
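For part (b), a minimal sketch of the one-step doubly robust estimator with basic support diagnostics may help make the quantities concrete. It covers only the contextual bandit case, not the per-decision sequential variant an MDP would need. All function names, argument names, and the `min_propensity` threshold are hypothetical choices for illustration, and the inputs are assumed to be NumPy arrays with one entry per logged interaction.

```python
import numpy as np

def doubly_robust_value(rewards, logged_propensities, target_probs_logged_action,
                        q_hat_logged_action, q_hat_target_expectation, clip=None):
    """One-step doubly robust estimate of a target policy's value.

    rewards:                    observed reward for each logged action
    logged_propensities:        logging policy's probability of the logged action
    target_probs_logged_action: target policy's probability of the logged action
    q_hat_logged_action:        reward-model prediction for the logged action
    q_hat_target_expectation:   reward-model prediction averaged over the
                                target policy's action distribution
    clip:                       optional cap on importance weights (assumption:
                                clipping trades variance for bias)
    """
    weights = target_probs_logged_action / logged_propensities
    if clip is not None:
        weights = np.minimum(weights, clip)
    # Model-based baseline plus an importance-weighted correction of its residual.
    terms = q_hat_target_expectation + weights * (rewards - q_hat_logged_action)
    return terms.mean(), weights

def support_diagnostics(weights, logged_propensities, min_propensity=0.01):
    """Simple indicators that the target policy strays outside logged support."""
    ess = weights.sum() ** 2 / (weights ** 2).sum()  # Kish effective sample size
    return {
        "effective_sample_size": float(ess),
        "max_weight": float(weights.max()),
        # Should be close to 1 when propensities are correct and support holds.
        "mean_weight": float(weights.mean()),
        "frac_low_propensity": float((logged_propensities < min_propensity).mean()),
    }
```

A small effective sample size relative to the number of logged interactions, a mean weight far from 1, or a large share of very low propensities would each suggest that the estimate leans on poorly supported actions and should be treated with caution.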