Predicting 30‑Day Adoption of Product P for Budgeted Outreach
Context
You are tasked with building a model to prioritize user outreach for Product P. Use historical data to predict which users will adopt Product P in the next 30 days and optimize whom to contact under a daily outreach capacity.
- Data sources:
  - user_profile: static attributes (e.g., geography, device, acquisition channel, tenure).
  - user_events: timestamped events (page_view, search, add_to_cart, purchase, unsubscribe, etc.).
  - marketing_contacts: timestamps and channel(s) of outreach (email, push, SMS, etc.).
  - product_catalog: product metadata (categories, price, margin, text).
- Time windows:
  - Training window: 2025‑03‑01 to 2025‑06‑30.
  - Prediction window: 2025‑07‑01 to 2025‑07‑31.
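To make the window boundaries concrete, here is a minimal sketch of how a snapshot date could separate feature events from label events. It assumes a hypothetical event tuple of (user_id, timestamp, event_type, product_id) and treats "adoption" as a purchase event for product "P"; neither the schema nor that definition is given in the brief, and `build_snapshot` is an illustrative name.

```python
from datetime import datetime, timedelta

# Boundaries from the brief. TRAIN_END is an exclusive bound,
# because the training window's last day is 2025-06-30.
TRAIN_START = datetime(2025, 3, 1)
TRAIN_END = datetime(2025, 7, 1)
HORIZON = timedelta(days=30)  # "adopt in the next 30 days"


def build_snapshot(events, cutoff, horizon=HORIZON, target_product="P"):
    """Split events around a cutoff into feature events and adopters.

    events: iterable of (user_id, timestamp, event_type, product_id)
            tuples. This schema is an assumption for illustration.
    Features use only events strictly before the cutoff; labels use only
    events in [cutoff, cutoff + horizon).
    """
    label_end = cutoff + horizon
    feature_events = [e for e in events if TRAIN_START <= e[1] < cutoff]
    adopters = {
        e[0]
        for e in events
        if cutoff <= e[1] < label_end
        and e[2] == "purchase"        # assumed definition of adoption
        and e[3] == target_product
    }
    return feature_events, adopters


events = [
    ("u1", datetime(2025, 5, 20, 9, 0), "page_view", "P"),
    ("u1", datetime(2025, 6, 10, 14, 30), "purchase", "P"),
    ("u2", datetime(2025, 5, 25, 8, 0), "search", None),
    ("u2", datetime(2025, 7, 3, 11, 0), "purchase", "P"),
]

# Training snapshot: cutoff 2025-06-01, so the label window
# (2025-06-01 to 2025-06-30) stays inside the training window.
features, adopters = build_snapshot(events, cutoff=datetime(2025, 6, 1))
```

With a cutoff of 2025‑06‑01, the July purchase by u2 falls outside the label window and is not visible to either features or labels. The same function with a cutoff of 2025‑07‑01 would produce the scoring-time feature set for the prediction window.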
Tasks
- Precisely define the prediction target and labeling rule while preventing target leakage (including handling of contacts and post‑label features).
- Propose features (behavioral recency/frequency, content affinity, embeddings) with an explicit time cutoff, and explain how you’d handle cold‑start users.
- Choose a model (ranking vs. classification) and justify with pros/cons given class imbalance and outreach budget constraints.
- Specify offline metrics (PR‑AUC, top‑k recall, calibration/Brier) and map them to online business outcomes.
- Given a daily outreach capacity of at most 50,000 users, formulate threshold selection to maximize expected incremental profit. Write the objective using p(adopt|contact), incremental lift, contact cost, and the capacity constraint. Explain how you’d estimate incremental lift from observational data.
- Show a time‑series cross‑validation scheme that prevents both user‑level and temporal leakage.
- Detail calibration and post‑processing (e.g., isotonic, Platt), fairness constraints across markets, and drift detection/retraining triggers (e.g., PSI thresholds).
- Outline ablation and slice‑robustness checks to include in the presentation to pre‑empt Q&A.
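For the budgeted-selection task, one possible formulation is to maximize the sum over contacted users of (p(adopt|contact) − p(adopt|no contact)) × margin − contact cost, subject to contacting at most 50,000 users per day. The sketch below assumes a uniform contact cost and a single per-adoption margin, in which case ranking by expected incremental profit and truncating at capacity solves the problem. The inputs `p_contact` and `p_no_contact` are hypothetical model outputs (for example from an uplift model), and `select_for_outreach` is an illustrative name.

```python
def select_for_outreach(users, margin, contact_cost, capacity=50_000):
    """Pick users to contact under a daily capacity.

    users: list of (user_id, p_contact, p_no_contact), where
           p_contact    ~ p(adopt | contact)
           p_no_contact ~ p(adopt | no contact)
    Both probabilities are assumed inputs; estimating them without bias
    from observational data is a separate problem.
    """
    scored = []
    for user_id, p_contact, p_no_contact in users:
        lift = p_contact - p_no_contact          # incremental lift
        value = lift * margin - contact_cost     # expected incremental profit
        if value > 0:                            # never contact at a loss
            scored.append((value, user_id))
    # Highest expected incremental profit first, then apply the capacity cap.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [user_id for _, user_id in scored[:capacity]]


users = [
    ("a", 0.30, 0.10),
    ("b", 0.50, 0.48),  # likely adopter, but contact adds almost nothing
    ("c", 0.20, 0.05),
    ("d", 0.12, 0.02),
]
chosen = select_for_outreach(users, margin=20.0, contact_cost=0.5, capacity=2)
```

In this toy input, user "b" has the highest p(adopt|contact) but is excluded because the lift does not cover the contact cost, which illustrates why ranking on adoption probability alone can waste budget. The implied threshold is the expected incremental profit of the last user selected, or zero when capacity is not binding.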
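For the drift-detection task, the Population Stability Index (PSI) compares the binned distribution of a feature or score between a baseline sample and a recent sample: PSI = Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the baseline and recent shares in bin i. The sketch below uses equal-width bins derived from the baseline and a small floor to avoid division by zero; both are illustrative choices, and `psi` is a hypothetical helper name.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a recent sample.

    Bins are equal-width over the baseline range (an assumption; quantile
    bins are another common choice). Empty bins are floored at eps.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back if the baseline is constant

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        total = len(values)
        return [max(c / total, eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]
shifted = [min(v + 0.3, 0.99) for v in baseline]

no_drift = psi(baseline, baseline)
drift = psi(baseline, shifted)
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. These cutoffs are conventions rather than requirements of the brief, so any retraining trigger should be tuned to the specific feature and market.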