Predicting 60-Day Adoption of Subscription by Non-Subscription Merchants
Context
You need to predict which merchants not currently using the Subscription product will adopt it within the next 60 days. For the live run, only data available up to 2025-07-03 (the cutoff) may be used to predict adoption by 2025-09-01, which is 60 days after the cutoff.
Assume you have transaction/event logs (charges, refunds, disputes, payouts), merchant metadata (signup date, vertical, country), and identifiers such as customer_id and card_fingerprint. Also assume that an event uniquely indicating Subscription adoption (e.g., the first Subscription API event or the first Subscription invoice) is available with a timestamp.
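To make the cutoff and outcome window concrete, here is a minimal Python sketch of how a label could be derived from the adoption timestamp. The function name label_merchant, the choice to treat an adoption event on the cutoff date itself as "already using Subscription", and the half-open window (cutoff, cutoff + 60 days] are illustrative assumptions, not part of the problem statement. For the live cutoff the outcomes are not yet observable, so in practice the same logic would be applied to earlier historical cutoffs to build training data.

```python
from datetime import date, timedelta

CUTOFF = date(2025, 7, 3)        # last day of usable data (from the context above)
HORIZON = timedelta(days=60)     # outcome window length (from the context above)
OUTCOME_END = CUTOFF + HORIZON   # evaluates to 2025-09-01


def label_merchant(first_subscription_date, cutoff=CUTOFF, horizon=HORIZON):
    """Return 1, 0, or None for one merchant at a given cutoff.

    first_subscription_date: date of the merchant's first Subscription
    event, or None if no such event exists (hypothetical input shape).

    Assumed convention: an adoption event on or before the cutoff means
    the merchant already uses Subscription and is excluded (None).
    """
    if first_subscription_date is None:
        return 0      # never adopted: negative
    if first_subscription_date <= cutoff:
        return None   # already a Subscription user: out of scope
    if first_subscription_date <= cutoff + horizon:
        return 1      # adopted inside the outcome window: positive
    return 0          # adopted later than the window: negative at this cutoff


if __name__ == "__main__":
    print(OUTCOME_END)
    print(label_merchant(date(2025, 8, 15)))
```

Reusing the function with an earlier cutoff, for example label_merchant(d, cutoff=date(2025, 3, 1)), gives labels for a historical snapshot whose outcome window has already closed.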
Task
Design a production-ready classification approach and answer concisely:
(a) Labeling: Precisely define positives/negatives and the observation and outcome windows; handle merchants already using Subscription and cold-start merchants.
(b) Leakage: List at least five concrete leakage risks specific to this data and how to prevent them via time-based feature windows and proper splits.
(c) Features: Propose 15–25 high-signal, computable features from transactions (recency/frequency/monetary, 28–35 day repeat patterns, customer concentration, card_fingerprint diversity, weekend share, chargeback/refund rates, growth rates) and from merchant metadata (age, vertical, geo).
(d) Modeling: Choose two models (e.g., regularized logistic vs. gradient-boosted trees); discuss class imbalance handling (weights vs. downsampling), calibration, and interpretability for a sales handoff.
(e) Evaluation: Specify time-based cross-validation, primary metrics (PR-AUC, precision@K, recall@K), and how you would select a threshold to deliver a list of 1,000 merchants with expected precision ≥ 0.60.
(f) Monitoring: Define post-deployment drift and performance checks (data drift on feature distributions, label drift, calibration drift) and how to retrain without contaminating future labels.
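As a reference point for part (e), the sketch below shows one possible way to build a top-K list and check it against a precision target. It assumes the model outputs calibrated probabilities, in which case the mean score of the selected merchants is an estimate of the list's expected precision. The function names, the dict-based inputs, and the toy merchant IDs are hypothetical; this is an illustration under those assumptions, not a prescribed solution.

```python
def top_k_with_expected_precision(scores, k):
    """Pick the k highest-scoring merchants.

    scores: dict of merchant_id -> calibrated probability of adoption
    (assumed input shape). Returns (selected_ids, score_threshold,
    expected_precision), where expected_precision is the mean
    probability of the selected merchants.
    """
    if k <= 0 or not scores:
        return [], None, None
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    selected = [merchant_id for merchant_id, _ in ranked]
    threshold = ranked[-1][1]  # lowest score that still made the list
    expected_precision = sum(p for _, p in ranked) / len(ranked)
    return selected, threshold, expected_precision


def precision_recall_at_k(scores, labels, k):
    """Realized precision@K and recall@K on a labeled holdout period.

    labels: dict of merchant_id -> 0/1 outcome observed after the
    holdout cutoff (assumed input shape).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0, 0.0
    hits = sum(labels[m] for m in ranked)
    total_positives = sum(labels.values())
    precision = hits / len(ranked)
    recall = hits / total_positives if total_positives else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data only; real use would pass K=1000 and a 0.60 target.
    toy_scores = {"m1": 0.9, "m2": 0.7, "m3": 0.2, "m4": 0.1}
    toy_labels = {"m1": 1, "m2": 0, "m3": 1, "m4": 0}
    ids, thr, exp_prec = top_k_with_expected_precision(toy_scores, k=2)
    print(ids, thr, exp_prec)
    print(precision_recall_at_k(toy_scores, toy_labels, k=2))
```

With K = 1,000, the list would meet the stated target only if the expected precision is at least 0.60, and comparing it with realized precision@K on past time-based folds is one way to check whether the calibration assumption holds.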