Coupon Targeting Under a Daily Budget: Policy, OPE, Calibration, and Monitoring
Context
- You have two user-scoring models for a $5 coupon: M0 (current) and M1 (new). Each outputs a score p_i that should be interpreted as P(redeem | send, user i).
- You may send at most K promos per day and must ensure expected coupon spend ≤ B dollars/day. The $5 coupon cost is incurred only when a coupon is redeemed.
- Historical data comes from a randomized experiment (the logging policy) with columns {user_id, features X, assigned_treatment W ∈ {1=coupon, 0=control}, outcome redeem Y ∈ {0,1}, gmv G, timestamp t}.
Tasks
(a) Define a success metric aligned to profit and explain why AUC/accuracy can be misleading for targeting under a budget.
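One way to make the AUC point concrete: under a budget, only the top-K ordering matters, and the right ordering is by expected profit, not by raw score. A minimal sketch, assuming a hypothetical per-user contribution margin `margins[i]` per redemption (not in the original schema), with the $5 cost paid only on redemption:

```python
def expected_profit_per_send(p, margin, coupon_cost=5.0):
    """Expected profit of sending: P(redeem) * (margin - coupon cost),
    since both revenue and the $5 cost occur only on redemption."""
    return p * (margin - coupon_cost)

def top_k_by_expected_profit(scores, margins, k, coupon_cost=5.0):
    """Rank users by expected profit, not raw redemption score.
    A model with higher AUC on redemption can still pick a worse
    top-K when margins vary across users."""
    profit = [expected_profit_per_send(p, m, coupon_cost)
              for p, m in zip(scores, margins)]
    order = sorted(range(len(scores)), key=lambda i: profit[i], reverse=True)
    return order[:k]
```

With `scores=[0.9, 0.5]` and `margins=[6, 30]`, the raw-score ranking would send to user 0, but the expected-profit ranking prefers user 1.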
(b) Using the randomized dataset, derive off-policy estimators to compare the two deterministic policies induced by M0 and M1 (each implements a daily top-K rule under budget B): inverse propensity scoring (IPS), self-normalized IPS (SNIPS), and doubly robust (DR). Write formulas, state assumptions for unbiasedness, and discuss variance trade-offs and cross-fitting.
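The three estimators can be sketched as follows for a deterministic target policy π(x) ∈ {0,1}, logged propensity e = P(W=1|X) known by design, and reward r (e.g., realized profit). The `q0`/`q1` arguments to the DR estimator are outcome-model predictions for actions 0 and 1, which should be cross-fitted (out-of-fold) to keep the estimator's guarantees; a sketch under those assumptions, not a full pipeline:

```python
import numpy as np

def ips(pi, w, r, e):
    """IPS: weight each logged row by 1{W = pi(X)} / P(logged action)."""
    pi, w, r = np.asarray(pi), np.asarray(w), np.asarray(r, float)
    p_logged = np.where(w == 1, e, 1 - e)   # propensity of the logged action
    wts = (pi == w) / p_logged              # importance weights
    return float(np.mean(wts * r))

def snips(pi, w, r, e):
    """Self-normalized IPS: divide by the sum of weights instead of n.
    Slightly biased, usually much lower variance."""
    pi, w, r = np.asarray(pi), np.asarray(w), np.asarray(r, float)
    p_logged = np.where(w == 1, e, 1 - e)
    wts = (pi == w) / p_logged
    return float(np.sum(wts * r) / np.sum(wts))

def dr(pi, w, r, e, q0, q1):
    """Doubly robust: outcome-model baseline plus an IPS correction on
    the residual. Unbiased if either the propensity or the outcome model
    is correct."""
    pi, w, r = np.asarray(pi), np.asarray(w), np.asarray(r, float)
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    p_logged = np.where(w == 1, e, 1 - e)
    wts = (pi == w) / p_logged
    q_pi = np.where(pi == 1, q1, q0)        # model value under target action
    q_w = np.where(w == 1, q1, q0)          # model value under logged action
    return float(np.mean(q_pi + wts * (r - q_w)))
```

With a randomized logging policy the propensities are known exactly, so IPS is unbiased; SNIPS and DR trade a little bias for variance reduction, which matters because a deterministic top-K policy only matches the logged action on a fraction of rows.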
(c) Describe how to calibrate probabilities (e.g., isotonic/Platt), set a daily threshold to respect budget B under drift, and directly optimize expected profit subject to guardrails (e.g., opt-out rate, complaint rate).
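Once scores are calibrated (so that p really is P(redeem | send)), the budget constraint becomes enforceable: expected spend of a send set S is $5 · Σ_{i∈S} p_i. A greedy selection sketch, assuming calibrated inputs (this is a heuristic scan, not an optimal knapsack solution):

```python
def select_sends(p_calibrated, k, budget, coupon_cost=5.0):
    """Scan users in descending calibrated-score order; take a user if
    the K-slot cap is not hit and adding their expected cost
    (coupon_cost * p) keeps expected spend within the daily budget."""
    order = sorted(range(len(p_calibrated)),
                   key=lambda i: p_calibrated[i], reverse=True)
    chosen, expected_spend = [], 0.0
    for i in order:
        if len(chosen) >= k:
            break
        cost_i = coupon_cost * p_calibrated[i]
        if expected_spend + cost_i > budget:
            continue  # a lower-score (cheaper) user later may still fit
        chosen.append(i)
        expected_spend += cost_i
    return chosen, expected_spend
```

Under drift, the threshold implied by this selection moves day to day, which is why recalibrating on recent mature data and re-deriving the cutoff daily is safer than freezing a score threshold.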
(d) List three concrete leakage risks (e.g., features reflecting prior coupon exposure, post-treatment variables, future-engagement proxies) and how to detect/prevent them.
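One cheap detection check: in a properly randomized log, no pre-treatment feature should predict the assigned treatment. A per-feature rank AUC against W near 0.5 is expected; values far from 0.5 flag features that encode assignment or post-treatment information. A minimal sketch:

```python
def auc_feature_vs_treatment(x, w):
    """Rank-based AUC of one feature for predicting the treatment flag.
    Under clean randomization this should be ~0.5 for every
    pre-treatment feature; large deviations suggest leakage."""
    pos = [xi for xi, wi in zip(x, w) if wi == 1]
    neg = [xi for xi, wi in zip(x, w) if wi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The same idea scales up by training a classifier to predict W from all features and checking its AUC against 0.5.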
(e) Explain handling delayed redemption labels and per-user redemption caps in both training and evaluation to avoid bias.
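The core mechanics for delayed labels: only rows whose full attribution window has elapsed are eligible for training and evaluation, otherwise "not yet redeemed" gets mislabeled as "never redeemed" and recent cohorts look artificially bad. A sketch assuming each row carries a `timestamp` datetime (hypothetical row schema):

```python
from datetime import datetime, timedelta

def mature_rows(rows, attribution_days, now):
    """Keep only rows old enough that a redemption, if it were going to
    happen, would already have been observed. Prevents censoring bias
    from treating pending redemptions as negatives."""
    cutoff = now - timedelta(days=attribution_days)
    return [r for r in rows if r["timestamp"] <= cutoff]
```

Per-user caps get the analogous treatment: a user at their redemption cap cannot redeem regardless of the send, so such rows should be excluded (or modeled explicitly) in both training and policy-value estimation.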
(f) Outline a monitoring plan for non-stationarity and cold-start users, including shadow deployment and canarying.
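A common drift alarm for the monitoring plan is the Population Stability Index over pre-agreed score bins, comparing today's score distribution to the baseline used at calibration time. A minimal sketch; the 0.1/0.25 thresholds are the usual rules of thumb, not statistical tests:

```python
import math

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index between a baseline bin distribution
    and today's. Heuristic reading: < 0.1 stable, 0.1-0.25 watch,
    > 0.25 investigate (e.g., recalibrate or fall back to M0)."""
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected_fracs, actual_fracs))
```

In a shadow deployment, M1 is scored on live traffic without acting; canarying then routes a small randomized slice to M1, which also keeps propensities logged for future off-policy comparisons.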