Design a CVR model for RTB bidding
Company: The Trade Desk
Role: Data Scientist
Category: Machine Learning
Difficulty: Hard
Interview Round: Technical Screen
You are a data scientist at a Demand-Side Platform (DSP) participating in Real-Time Bidding (RTB). For each ad opportunity (impression), your system must quickly decide:
- whether to bid
- how much to bid
- which creative/ad to show
You are asked to design an ML approach to predict **conversion probability (CVR)** for a campaign (e.g., “Nike shoes”).
Data you may have (historical):
- Impressions table: `impression_id`, timestamp, user/device/context, publisher/app/site, geo, auction metadata, bid price, win/loss, etc.
- Clicks table: `impression_id`, click timestamp (optional, sparse)
- Conversions table: `impression_id` (or user-level attribution key), conversion timestamp/value (very sparse)
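As a rough illustration of how these tables could be turned into training labels, the sketch below marks an impression as positive when its first attributed conversion falls inside a fixed window after the impression. The 7-day window, the `as_of` cutoff, and the function name `label_impressions` are assumptions made for this example; the question leaves the window and attribution rule to the candidate.

```python
from datetime import datetime, timedelta

# Assumed attribution window; the question does not prescribe one.
ATTRIBUTION_WINDOW = timedelta(days=7)


def label_impressions(impressions, conversions, as_of, window=ATTRIBUTION_WINDOW):
    """Return {impression_id: 0 or 1} for impressions whose window has closed.

    impressions: iterable of (impression_id, impression_timestamp)
    conversions: iterable of (impression_id, conversion_timestamp)
    as_of:       time at which the training data is extracted
    """
    # Keep only the earliest conversion attributed to each impression.
    first_conversion = {}
    for imp_id, conv_ts in conversions:
        if imp_id not in first_conversion or conv_ts < first_conversion[imp_id]:
            first_conversion[imp_id] = conv_ts

    labels = {}
    for imp_id, imp_ts in impressions:
        # Skip impressions whose window is still open: a late conversion
        # could still arrive, so labeling them 0 would add false negatives.
        if imp_ts + window > as_of:
            continue
        conv_ts = first_conversion.get(imp_id)
        converted = conv_ts is not None and imp_ts <= conv_ts <= imp_ts + window
        labels[imp_id] = int(converted)
    return labels
```

Dropping impressions with an open window is one simple way to deal with conversion delay; it trades label freshness for label correctness.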
Tasks:
1) Clearly define the learning target: what does “predicting a conversion for an impression” mean in this context? Choose an attribution window and a label definition.
2) Propose a feature set that is realistic for RTB (available at bid time) and discuss leakage risks.
3) Choose a baseline model and a production candidate (e.g., Logistic Regression vs Gradient-Boosted Trees such as LightGBM). Explain tradeoffs.
4) Choose an appropriate loss function and justify it. Explain why you would use log loss / binary cross-entropy and why MSE or AUC are not appropriate as training losses.
5) Conversions are rare. Describe at least two ways to handle class imbalance (e.g., class weighting, negative downsampling) and how these choices affect probability calibration.
6) Define offline evaluation (metrics + validation scheme) for CVR prediction in a non-stationary ad-tech environment. Include why PR-AUC may be more informative than ROC-AUC.
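For task 5, the effect of negative downsampling on calibration can be made concrete. The sketch below assumes negatives are kept uniformly at random with probability `keep_rate` and all positives are kept. Under that assumption, a model trained on the sampled data overestimates the conversion probability, and the standard correction is p = p_s / (p_s + (1 - p_s) / keep_rate). The function names are hypothetical.

```python
def sampled_rate(p_true: float, keep_rate: float) -> float:
    """Conversion rate seen in the training data after negative downsampling.

    Assumes every positive is kept and each negative is kept independently
    with probability keep_rate.
    """
    return p_true / (p_true + (1.0 - p_true) * keep_rate)


def recalibrate(p_sampled: float, keep_rate: float) -> float:
    """Map a probability learned on downsampled data back to the original scale."""
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    return p_sampled / (p_sampled + (1.0 - p_sampled) / keep_rate)


# Round trip: a 1% true rate looks much higher after keeping 10% of the
# negatives, and the correction maps it back to the original scale.
inflated = sampled_rate(0.01, keep_rate=0.1)
restored = recalibrate(inflated, keep_rate=0.1)
```

Class weighting distorts predicted probabilities in a similar way, so a comparable correction or a post-hoc calibration step is usually needed before the scores feed a bidder.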
Assume strict latency constraints at inference (tens of milliseconds) and that predicted probabilities will be used downstream for bidding decisions.
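Because the probabilities themselves drive bids, a metric that only measures ranking can hide a miscalibrated model. The toy comparison below uses hand-written, dependency-free metric functions and synthetic data, both assumptions made for illustration. It scores two sets of predictions with identical ordering: one matches the empirical conversion rates, the other is inflated tenfold.

```python
import math


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy, with predictions clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def roc_auc(y_true, p_pred):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data: two segments with true conversion rates of 1% and 5%.
y = [1] * 1 + [0] * 99 + [1] * 5 + [0] * 95
calibrated = [0.01] * 100 + [0.05] * 100
inflated = [10 * p for p in calibrated]  # same ranking, 10x too high

auc_calibrated, auc_inflated = roc_auc(y, calibrated), roc_auc(y, inflated)
loss_calibrated, loss_inflated = log_loss(y, calibrated), log_loss(y, inflated)
```

Both prediction sets produce the same ROC-AUC, since AUC depends only on ordering, while log loss is lower for the calibrated set. A bidder scaling bids by the inflated scores would overbid by roughly the same factor.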
Quick Answer: This machine learning and ad-tech question tests whether a data scientist can define and operationalize a conversion-rate (CVR) model for real-time bidding. It covers label and attribution-window definition, realistic bid-time feature engineering and leakage assessment, model selection and calibration under class imbalance, and inference latency. It is commonly asked because RTB requires accurate, well-calibrated, low-latency probability estimates that directly affect bidding outcomes. It probes both conceptual understanding and practical application: loss-function rationale, rare-event evaluation choices (e.g., PR-AUC vs ROC-AUC), validation under non-stationarity, and production constraints.