Design a CVR model for RTB bidding

Q: Design a CVR model for RTB bidding

This machine learning and ad-tech question evaluates a data scientist's ability to define and operationalize a conversion-rate (CVR) model for real-time bidding (RTB). It tests label and attribution-window definition, realistic bid-time feature engineering and leakage assessment, model selection and calibration under class imbalance, and awareness of inference latency. It is commonly asked because RTB requires accurate, well-calibrated, low-latency probability estimates that affect bidding outcomes. It covers both conceptual understanding and practical application, including loss-function rationale, rare-event evaluation choices (e.g., PR-AUC vs ROC-AUC), validation under non-stationarity, and production constraints.

Q: What difficulty level is this interview question?

This is a hard Machine Learning question, commonly asked during Technical Screen rounds at Tradedesk.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Tradedesk during technical interviews.

Question

You are a data scientist at a Demand-Side Platform (DSP) participating in Real-Time Bidding (RTB). For each ad opportunity (impression), your system must quickly decide:

whether to bid
how much to bid
which creative/ad to show

You are asked to design an ML approach to predict conversion probability (CVR) for a campaign (e.g., “Nike shoes”).

Data you may have (historical):

Impressions table: impression_id, timestamp, user/device/context, publisher/app/site, geo, auction metadata, bid price, win/loss, etc.
Clicks table: impression_id, click timestamp (optional, sparse)
Conversions table: impression_id (or user-level attribution key), conversion timestamp/value (very sparse)
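
To make the relationship between these tables concrete, here is a minimal sketch of how an impression-level label might be built. It assumes a 7-day post-impression attribution window and that conversions are keyed directly by impression_id; neither is specified by the question, and the function name, tuple layout, and sample rows are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: a 7-day post-impression attribution window (the question leaves this open).
ATTRIBUTION_WINDOW = timedelta(days=7)

def label_impressions(impressions, conversions, window=ATTRIBUTION_WINDOW):
    """Return {impression_id: 0 or 1}.

    impressions: list of (impression_id, impression_time) for won impressions
    conversions: list of (impression_id, conversion_time)
    """
    # Group conversion timestamps by the impression they are attributed to.
    conv_times = {}
    for imp_id, conv_time in conversions:
        conv_times.setdefault(imp_id, []).append(conv_time)

    labels = {}
    for imp_id, imp_time in impressions:
        # Positive only if some conversion falls inside the window after the impression.
        labels[imp_id] = int(any(
            imp_time <= t <= imp_time + window
            for t in conv_times.get(imp_id, [])
        ))
    return labels

# Hypothetical sample rows.
impressions = [
    ("imp_1", datetime(2024, 1, 1, 12, 0)),
    ("imp_2", datetime(2024, 1, 1, 12, 5)),
    ("imp_3", datetime(2024, 1, 2, 9, 0)),
]
conversions = [
    ("imp_1", datetime(2024, 1, 3, 8, 0)),   # inside the 7-day window
    ("imp_3", datetime(2024, 1, 20, 9, 0)),  # outside the window
]
labels = label_impressions(impressions, conversions)
print(labels)  # {'imp_1': 1, 'imp_2': 0, 'imp_3': 0}
```

Under this definition, impressions younger than the window do not yet have a final label, so a training set would likely need to exclude them or treat their labels as provisional.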

Tasks:

Clearly define the learning target: what does “predict conversion over impression” mean in this context? Choose an attribution window and label definition.
Propose a feature set that is realistic for RTB (available at bid time) and discuss leakage risks.
Choose a baseline model and a production candidate (e.g., Logistic Regression vs Gradient-Boosted Trees such as LightGBM). Explain tradeoffs.
Choose an appropriate loss function and justify it. Explain why you would use log loss / binary cross-entropy and why MSE or AUC are not appropriate as training losses.
Conversions are rare. Describe at least two ways to handle class imbalance (e.g., class weighting, negative downsampling) and how these choices affect probability calibration.
Define offline evaluation (metrics + validation scheme) for CVR prediction in a non-stationary ad-tech environment. Include why PR-AUC may be more informative than ROC-AUC.
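
As an illustration of how negative downsampling interacts with calibration, the sketch below applies the standard prior-shift correction: if only a fraction w of negatives is kept for training, the model's odds are inflated by 1/w, so a prediction p' maps back to p = p' / (p' + (1 - p') / w). The function name and the numbers (0.1% true CVR, 1% of negatives kept) are hypothetical.

```python
def recalibrate(p_sampled, neg_keep_rate):
    """Map a probability from a model trained on negative-downsampled data
    back to the original (un-sampled) scale.

    neg_keep_rate: fraction of negatives kept during training, in (0, 1].
    """
    return p_sampled / (p_sampled + (1.0 - p_sampled) / neg_keep_rate)

# Hypothetical numbers: true CVR of 0.1%, keeping 1% of negatives.
true_cvr = 0.001
neg_keep_rate = 0.01

# Positive rate the model sees in the downsampled training data.
sampled_rate = true_cvr / (true_cvr + (1.0 - true_cvr) * neg_keep_rate)

# Applying the correction to that rate should recover the true CVR.
recovered = recalibrate(sampled_rate, neg_keep_rate)
print(sampled_rate, recovered)
```

In this example the downsampled data has a positive rate of roughly 9%, far above the true 0.1%, which is why uncorrected scores from such a model would overstate conversion probability. Class weighting distorts the predicted odds in a similar way and would need an analogous correction or a separate calibration step.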

Assume strict latency constraints at inference (tens of milliseconds) and that predicted probabilities will be used downstream for bidding decisions.
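
Because the probabilities themselves feed the bid, a metric that only checks ranking can hide miscalibration. The sketch below, using hypothetical labels and scores, computes log loss and a pairwise ROC-AUC in plain Python, then inflates every score with a monotonic transform (square root) to show that ROC-AUC is unchanged while log loss moves.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy, with clipping to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities.
y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
p_base = [0.02, 0.05, 0.03, 0.40, 0.04, 0.01, 0.06, 0.60, 0.08, 0.30]

# Monotonic inflation: the ordering is preserved, the probabilities are not.
p_inflated = [math.sqrt(p) for p in p_base]

print("ROC-AUC :", roc_auc(y, p_base), roc_auc(y, p_inflated))
print("log loss:", log_loss(y, p_base), log_loss(y, p_inflated))
```

On this toy data both score sets have the same ROC-AUC, while the inflated scores have a higher log loss. A bidder that multiplies predicted CVR by conversion value would overbid with the inflated scores even though the ranking metric looks identical.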
