Design a CVR model for RTB bidding

This question evaluates competency in real-time bidding (RTB) systems and conversion-rate (CVR) modeling, covering product-level system understanding, feature engineering, model choice, loss and calibration, class-imbalance handling, evaluation metrics, latency and scalability, and production monitoring in an ad-tech context.

Question

You are interviewing at a demand-side platform (DSP), e.g., The Trade Desk. Answer the following end-to-end product + ML case about real-time bidding (RTB).

Part A — RTB system understanding

Explain what RTB is and the roles of:
- Advertiser
- Ad Exchange
- DSP
When an ad opportunity (impression) arrives, walk through what happens in milliseconds.
How does the DSP decide:
- whether to bid?
- how much to bid?
- which ad/creative to show?
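As a starting point for the "how much to bid" and "which creative" questions, one common textbook heuristic is to bid the expected value of the impression: predicted conversion probability times the value the advertiser places on a conversion. The sketch below assumes that heuristic only; the function names, the `floor_price` and `max_bid` parameters, and the candidate tuple layout are hypothetical, and a real DSP would also account for budget pacing, auction type, and fees. Values here are per impression; if an exchange quotes prices per thousand impressions (CPM), multiply by 1000.

```python
def expected_value_bid(p_conversion, value_per_conversion,
                       floor_price=0.0, max_bid=None):
    """Return a per-impression bid, or None to skip the auction.

    Assumption: bid = P(conversion | impression) * value of one conversion.
    """
    if not 0.0 <= p_conversion <= 1.0:
        raise ValueError("p_conversion must be a probability")
    bid = p_conversion * value_per_conversion
    if max_bid is not None:
        bid = min(bid, max_bid)  # hypothetical safety cap
    if bid < floor_price:
        return None  # not worth entering the auction
    return bid


def choose_creative(candidates):
    """Pick the creative with the highest expected value.

    candidates: list of (creative_id, p_conversion, value_per_conversion).
    """
    return max(candidates, key=lambda c: c[1] * c[2])[0]
```

For example, a predicted CVR of 0.001 and a conversion value of 50 gives a per-impression bid of about 0.05. This is why the model's probabilities need to be calibrated and not merely well ranked: the bid scales linearly with the predicted probability.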

Part B — Build a conversion-rate model (CVR)

You need a model to predict the probability of conversion for “Nike shoes” given an impression.

What training data would you use? (e.g., impressions, clicks, conversions). Define:
- What is a “conversion” event?
- What is the prediction target and time window (e.g., conversion within 7 days of impression)?
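To make the target definition concrete, here is a minimal sketch of labeling impressions with a fixed attribution window, using the 7-day example above. The record layouts and the function name are assumptions for illustration. The sketch credits every impression that precedes a conversion within the window; a real pipeline would usually apply an attribution rule (for example last-touch) and would exclude impressions too recent for their window to have closed.

```python
from datetime import datetime, timedelta


def label_impressions(impressions, conversions, window=timedelta(days=7)):
    """Label each impression 1 if the same user converted within `window`
    after the impression, else 0.

    impressions: list of (impression_id, user_id, timestamp)
    conversions: list of (user_id, timestamp)
    """
    conv_by_user = {}
    for user_id, ts in conversions:
        conv_by_user.setdefault(user_id, []).append(ts)

    labels = {}
    for imp_id, user_id, imp_ts in impressions:
        user_convs = conv_by_user.get(user_id, [])
        # Conversion must come strictly after the impression and
        # no later than the end of the attribution window.
        labels[imp_id] = int(
            any(imp_ts < c <= imp_ts + window for c in user_convs)
        )
    return labels
```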
What features would you engineer from historical ad data? Include examples across:
- user/context, publisher/placement, device/geo/time, ad/creative, advertiser/campaign, frequency/recency, historical aggregates
What model would you choose and why?
- Compare logistic regression vs tree-based models (e.g., LightGBM) in this setting.
Loss function & optimization:
- What loss would you train on and why (e.g., log loss / binary cross-entropy)?
- Why not MSE?
- Why isn’t AUC typically used as a training loss?
- What does “predicting conversion over impression” mean for supervision and labeling?
- How do loss functions relate to bidding decisions?
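One way to illustrate the "why not MSE" question: with a sigmoid output, the gradient of log loss with respect to the logit is simply p - y, while the gradient of squared error carries an extra p(1 - p) factor that shrinks toward zero when the model is confidently wrong. The sketch below compares the two for a missed conversion; the function names are illustrative, not from any library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_log_loss(z, y):
    """d/dz of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z)."""
    return sigmoid(z) - y


def grad_squared_error(z, y):
    """d/dz of (p - y)^2 with p = sigmoid(z)."""
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)


# A converter (y = 1) that the model scores very low (logit = -7, p below 0.001).
z, y = -7.0, 1
g_ll = grad_log_loss(z, y)        # magnitude close to 1: strong correction
g_se = grad_squared_error(z, y)   # magnitude far smaller: weak correction
```

With rare positives, most predictions sit near zero, so squared error gives little learning signal for exactly the examples that matter most. Log loss is also a proper scoring rule for probabilities, which suits a setting where the predicted probability feeds directly into the bid price.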

Part C — Practical ML concerns

Class imbalance: Conversions are rare.
- When would you use class weighting vs negative downsampling?
- What preprocessing should you avoid?
- How can imbalance handling affect probability calibration?
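On the calibration point: negative downsampling inflates the positive rate seen in training, so raw predictions come out too high. Assuming all positives are kept and each negative is kept independently with probability `w`, the sampled odds equal the true odds divided by `w`, which gives a closed-form correction. The sketch below applies it; the function name is hypothetical.

```python
def recalibrate_downsampled(p_sampled, w):
    """Map a probability from a model trained on negative-downsampled data
    back to the original data distribution.

    p_sampled: prediction from the model trained on sampled data
    w: fraction of negatives kept (0 < w <= 1); all positives kept
    """
    if not 0.0 < w <= 1.0:
        raise ValueError("w must be in (0, 1]")
    # True odds = w * sampled odds, rearranged into probability form.
    return p_sampled / (p_sampled + (1.0 - p_sampled) / w)
```

For instance, if 10% of negatives are kept (`w = 0.1`) and the model outputs roughly 0.0917, the corrected probability is about 0.01. Class weighting distorts probabilities in a similar way, so predictions need a comparable correction or a post-hoc calibration step before they are used for bid pricing.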
Evaluation:
- Offline: choose metrics (PR-AUC, ROC-AUC, log loss) and justify.
- Explain why PR-AUC can be more informative than ROC-AUC.
- Why does calibration matter?
- Online: how would you evaluate the model in production? What business metrics matter (e.g., CPA, ROAS, spend efficiency)?
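For the online business metrics, the standard definitions are CPA = spend / conversions and ROAS = attributed revenue / spend. A minimal sketch for comparing arms of an A/B test follows; the function name and the example numbers are made up for illustration.

```python
def campaign_metrics(spend, conversions, revenue):
    """Compute cost per acquisition (CPA) and return on ad spend (ROAS).

    Returns None for a metric whose denominator is zero.
    """
    cpa = spend / conversions if conversions else None
    roas = revenue / spend if spend else None
    return {"cpa": cpa, "roas": roas}


# Hypothetical control vs treatment arms with equal spend.
control = campaign_metrics(spend=1000.0, conversions=40, revenue=3000.0)
treatment = campaign_metrics(spend=1000.0, conversions=50, revenue=3600.0)
```

Lower CPA and higher ROAS in the treatment arm at comparable spend would suggest the new model is allocating budget more efficiently, subject to the usual significance checks.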
Precision/recall tradeoff in RTB:
- How do false positives vs false negatives differ in cost?
- What is F1 score, and why might it be a poor objective for ad-tech bidding?
- How would you use a PR curve to select an operating point?
Scalability & production:
- Discuss training vs inference scalability for LightGBM.
- RTB latency constraints: what parts of the feature/model pipeline are bottlenecks?
- How would you deploy safely (shadow mode, ramp-up, rollback)?
Overfitting & robustness:
- Why is overfitting common in CVR prediction?
- How do you prevent it (regularization, early stopping, time-based validation, feature aggregation)?
- What monitoring and guardrails would you add for a bidding system?
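The time-based validation mentioned above can be sketched as a split on impression timestamp instead of a random split, so the model is always evaluated on data later than anything it trained on. The optional `gap` is an assumption added here: it drops training rows whose timestamps fall just before the cutoff, which is one way to avoid training on labels whose attribution window overlaps the validation period. The row layout (a dict with a `"ts"` key) is hypothetical.

```python
from datetime import datetime, timedelta


def time_based_split(rows, cutoff, gap=timedelta(0)):
    """Split rows into (train, valid) by timestamp.

    train: rows with ts earlier than cutoff - gap
    valid: rows with ts at or after cutoff
    Rows inside the gap are dropped from both sets.
    """
    train = [r for r in rows if r["ts"] < cutoff - gap]
    valid = [r for r in rows if r["ts"] >= cutoff]
    return train, valid
```

A random split would let the model see future campaigns, creatives, and user behavior during training, which tends to overstate offline performance relative to what the bidder experiences in production.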

Provide a structured, end-to-end answer with assumptions and tradeoffs.
