This machine learning and ad-tech question evaluates a data scientist's ability to define and operationalize a conversion-rate (CVR) model for real-time bidding (RTB). It tests label and attribution-window definition, realistic bid-time feature engineering and leakage assessment, model selection and calibration under class imbalance, and awareness of inference latency. It is commonly asked because RTB requires accurate, well-calibrated, low-latency probability estimates that directly affect bidding outcomes. It covers both conceptual understanding and practical application, including loss-function rationale, rare-event evaluation choices (e.g., PR-AUC vs. ROC-AUC), validation under non-stationarity, and production constraints.
You are a data scientist at a Demand-Side Platform (DSP) participating in Real-Time Bidding (RTB). For each ad opportunity (impression), your system must quickly decide whether to bid and, if so, how much.
You are asked to design an ML approach to predict conversion probability (CVR) for a campaign (e.g., “Nike shoes”).
Data you may have (historical):
Impression logs: impression_id, timestamp, user/device/context, publisher/app/site, geo, auction metadata, bid price, win/loss, etc.
Click logs: impression_id, click timestamp (optional, sparse)
Conversion logs: impression_id (or user-level attribution key), conversion timestamp/value (very sparse)
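To make the label definition concrete, here is a minimal sketch of how the logs listed above could be joined into binary training labels. Everything in it is an assumption for illustration: the 7-day window, the field names `timestamp`, `won`, and `conversion_ts`, and the function name `build_labels` are hypothetical, and attribution is simplified to a direct match on impression_id. It also drops impressions whose window has not closed yet, since their labels are still undetermined.

```python
from datetime import datetime, timedelta

# Hypothetical 7-day window; the real value is a business/attribution decision.
ATTRIBUTION_WINDOW = timedelta(days=7)


def build_labels(impressions, conversions, as_of, window=ATTRIBUTION_WINDOW):
    """Return {impression_id: 0 or 1} for won impressions with a closed window.

    impressions: iterable of dicts with keys "impression_id", "timestamp", "won"
    conversions: iterable of dicts with keys "impression_id", "conversion_ts"
    as_of: datetime at which the training snapshot is taken
    """
    # Keep only the earliest conversion per impression.
    first_conv = {}
    for conv in conversions:
        iid, ts = conv["impression_id"], conv["conversion_ts"]
        if iid not in first_conv or ts < first_conv[iid]:
            first_conv[iid] = ts

    labels = {}
    for imp in impressions:
        if not imp["won"]:
            continue  # lost auctions were never shown, so no outcome is observed
        if imp["timestamp"] + window > as_of:
            continue  # window still open: a conversion could yet arrive
        conv_ts = first_conv.get(imp["impression_id"])
        in_window = (
            conv_ts is not None
            and timedelta(0) <= conv_ts - imp["timestamp"] <= window
        )
        labels[imp["impression_id"]] = int(in_window)
    return labels
```

A conversion that arrives after the window counts as a negative here. Whether that is the right convention, and whether to use last-touch or some other attribution rule, is part of what the question asks the candidate to decide.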
Tasks:
Assume strict latency constraints at inference (tens of milliseconds) and that predicted probabilities will be used downstream for bidding decisions.
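Because the probabilities feed bidding decisions directly, their absolute level matters, not just their ranking. The sketch below illustrates one common situation, assumed here and not stated in the prompt: the model is trained on data where negatives were randomly downsampled, so raw scores are inflated and need to be mapped back to the original scale. The function names and the `keep_rate` parameter are hypothetical. Both operations are constant-time arithmetic, so they add negligible inference latency.

```python
def correct_for_negative_downsampling(p_sampled, keep_rate):
    """Map a probability learned on negative-downsampled data to the original scale.

    keep_rate: fraction of negatives kept during training (assumed uniform random).
    Downsampling multiplies the odds by 1 / keep_rate, so we undo that factor.
    """
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    return p_sampled / (p_sampled + (1.0 - p_sampled) / keep_rate)


def calibration_ratio(predictions, labels):
    """Sum of predicted probabilities divided by the number of observed conversions.

    A value near 1.0 suggests the predictions are calibrated on average;
    it says nothing about calibration within individual segments.
    """
    observed = sum(labels)
    if observed == 0:
        raise ValueError("no positive labels; ratio is undefined")
    return sum(predictions) / observed
```

For example, with `keep_rate=0.1` a raw score of 0.5 maps back to 1/11, roughly 0.09. Checking `calibration_ratio` on a recent held-out time slice, rather than on a random split, is one way to notice drift under non-stationarity.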