This question, from the Machine Learning and Data Science domain, evaluates proficiency in policy evaluation, off-policy estimation, probability calibration, leakage detection, delayed-label handling, and production monitoring for budget-constrained coupon targeting.
Context
Tasks
(a) Define a success metric aligned to profit and explain why AUC/accuracy can be misleading for targeting under a budget.
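The gap between AUC-style ranking and profit under a budget can be seen in a toy example: with a fixed number of sends, ranking by predicted purchase probability can waste budget on users who would buy anyway, while ranking by expected incremental profit does not. All numbers below (margin, coupon cost, probabilities) are illustrative assumptions, not from the question.

```python
import numpy as np

# Assumed economics and a daily budget of K sends (illustrative values).
margin, coupon_cost, K = 12.0, 2.0, 2

# Hypothetical per-user redemption probabilities with / without a coupon.
p_treat = np.array([0.30, 0.28, 0.25, 0.10])
p_ctrl  = np.array([0.29, 0.05, 0.24, 0.02])

# Expected incremental profit of sending to user i.
profit_i = (p_treat - p_ctrl) * margin - p_treat * coupon_cost

# AUC rewards ranking by p_treat; the budgeted policy should rank by profit_i.
by_prob   = np.argsort(-p_treat)[:K]   # probability-ranked top-K
by_profit = np.argsort(-profit_i)[:K]  # profit-ranked top-K
```

Here the probability-ranked policy targets the near-certain buyers (users 0 and 2 have tiny uplift), while the profit-ranked policy captures the persuadable users, so the two top-K sets yield very different expected profit despite identical AUC on redemption labels.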
(b) Using the randomized dataset, derive off-policy estimators to compare the two deterministic policies induced by M0 and M1 (each implements a daily top-K rule under budget B): inverse propensity scoring (IPS), self-normalized IPS (SNIPS), and doubly robust (DR). Write formulas, state assumptions for unbiasedness, and discuss variance trade-offs and cross-fitting.
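The three estimators in (b) can be sketched on a synthetic randomized log. Everything here is an illustrative assumption: the uniform logging propensity, the reward form, and the stand-in deterministic policy pi(x) = 1[x > 0] (playing the role of one of the top-K policies); the outcome model uses the oracle form purely for brevity, where in practice it would be cross-fitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic stand-in for the randomized log: a = coupon sent,
# p_log = logging propensity, r = observed profit.
x = rng.normal(size=n)
p_log = np.full(n, 0.5)                 # uniform randomization
a = rng.binomial(1, p_log)
r = a * (1.0 + 0.5 * x) + rng.normal(scale=0.1, size=n)

# Deterministic target policy (stand-in for a top-K rule).
pi = (x > 0).astype(int)

# IPS: w_i = 1{a_i = pi(x_i)} / P(a_i | x_i); unbiased given overlap and
# correct propensities, but high variance when propensities are small.
p_a = np.where(a == 1, p_log, 1.0 - p_log)
w = (a == pi) / p_a
ips = np.mean(w * r)

# SNIPS: normalize by the realized weight sum; slightly biased, lower variance.
snips = np.sum(w * r) / np.sum(w)

# DR: outcome model q_hat(x, a) plus an IPS correction on its residual;
# consistent if either the propensities or q_hat are correct.
q_hat = lambda act: act * (1.0 + 0.5 * x)   # oracle form, for illustration only
dr = np.mean(q_hat(pi) + w * (r - q_hat(a)))
```

With a good outcome model the DR correction term shrinks toward zero, which is the variance-reduction argument the task asks candidates to articulate.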
(c) Describe how to calibrate probabilities (e.g., isotonic/Platt), set a daily threshold to respect budget B under drift, and directly optimize expected profit subject to guardrails (e.g., opt-out rate, complaint rate).
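A minimal sketch of (c) using scikit-learn's `IsotonicRegression`, on synthetic data (the score-generating process and the budget B are assumptions). The key budgeting idea: rather than a fixed probability cutoff, recompute the cutoff each day as the B-th largest calibrated score in that day's eligible population, so the spend tracks drift automatically.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(1)
n = 5_000

# Synthetic miscalibrated scores: systematically overconfident by 2x.
p_true = rng.uniform(0.0, 0.5, size=n)
raw = np.clip(p_true * 2.0, 0.0, 1.0)
y = rng.binomial(1, p_true)

# Isotonic calibration: monotone, nonparametric map from raw score to probability.
iso = IsotonicRegression(out_of_bounds="clip")
cal = iso.fit(raw, y).predict(raw)

# Budget-respecting daily selection: take the top-B by calibrated score,
# recomputed on each day's population (here, one synthetic "day").
B = 500
send_idx = np.argsort(-cal)[:B]   # exactly B sends; ties broken by sort order
```

Expected profit optimization then plugs calibrated probabilities into the per-user profit expression from (a), with guardrail metrics (opt-out rate, complaint rate) enforced as hard constraints on eligibility rather than folded into the score.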
(d) List three concrete leakage risks (e.g., features reflecting prior coupon exposure, post-treatment variables, future-engagement proxies) and how to detect/prevent them.
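One cheap detection technique for the risks in (d) is a univariate screen: a feature whose single-feature AUC is implausibly high is often a post-treatment or future-engagement proxy. The feature names below are hypothetical, and the 0.95 cutoff is a heuristic assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
y = rng.binomial(1, 0.3, size=n)

# Hypothetical features: one legitimate, one leaky post-treatment proxy
# that essentially encodes the label.
feats = {
    "days_since_last_purchase": rng.normal(size=n) + 0.3 * y,
    "visited_redemption_page": y + rng.normal(scale=0.01, size=n),
}

def single_feature_auc(x, y):
    # Rank-based AUC (Mann-Whitney U), no external dependencies.
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    n1 = y.sum()
    n0 = len(x) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

# Heuristic screen: flag features with implausibly strong univariate signal.
flags = {k: single_feature_auc(v, y) > 0.95 for k, v in feats.items()}
```

This complements, rather than replaces, structural prevention: a feature timestamp audit ensuring every input is computable strictly before the send decision.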
(e) Explain handling delayed redemption labels and per-user redemption caps in both training and evaluation to avoid bias.
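The bias in (e) is easy to demonstrate synthetically: labels read "as of today" undercount redemptions for recent sends. One standard fix, sketched below, is a maturity window: train and evaluate only on cohorts sent at least H days ago, where H covers the redemption-delay distribution (all horizons and rates here are illustrative assumptions; inverse-censoring reweighting is the alternative when waiting H days is too slow).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
today = 100

# Synthetic send log: redemptions arrive with a random delay of 1-29 days.
send_day = rng.integers(0, today, size=n)
will_redeem = rng.binomial(1, 0.2, size=n)
delay = rng.integers(1, 30, size=n)
observed = will_redeem & (send_day + delay <= today)  # what we see today

# Naive label = redemption observed so far: biased low for recent sends.
naive_rate = observed.mean()

# Maturity-window fix: keep only cohorts old enough for labels to settle.
H = 30
matured = send_day <= today - H
matured_rate = observed[matured].mean()
```

Per-user redemption caps need the symmetric treatment: if a user has already hit their cap, their counterfactual redemption probability is zero, so capped user-days must be excluded (or zeroed) consistently in both training labels and evaluation, not just at serving time.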
(f) Outline a monitoring plan for non-stationarity and cold-start users, including shadow deployment and canarying.
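For the drift half of (f), a common monitoring statistic is the Population Stability Index (PSI) between the training-time score distribution and each day's scores, alerting before shadow or canary stages promote a model. A minimal sketch (bin count, clipping constant, and the 0.1/0.25 thresholds are common heuristics, not requirements from the question):

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a current sample."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover out-of-range values
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(4)
ref = rng.normal(size=5_000)            # training-time score distribution
same = rng.normal(size=5_000)           # a stable day
shift = rng.normal(0.5, 1.0, 5_000)     # a drifted day (mean shift)

# Heuristic bands: PSI < 0.1 stable, 0.1-0.25 watch, > 0.25 alert.
```

Cold-start users need a separate slice of every dashboard, since a global PSI can look stable while the new-user segment drifts badly.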