Build and assess CTR prediction

Q: Build and assess CTR prediction

This question tests predictive modeling and applied data science skills for click-through rate (CTR) prediction. It covers extreme class imbalance, delayed feedback, encoding of sparse, high-cardinality features, time-aware validation, evaluation and calibration of probabilistic scores, and online A/B validation. It belongs to the Machine Learning domain and tests both conceptual understanding and practical application. It is commonly asked because it probes reasoning about real-world production challenges (metric selection between ROC and PR, thresholding under business costs, calibration methods, drift detection, and avoiding feedback loops) without requiring specific implementation details.

Question

CTR Prediction with Delayed Feedback and Extreme Class Imbalance

You are building a model to predict the probability that an ad impression results in a click within 24 hours. The base positive rate is approximately 0.7%.

Available features:

user_age, device_type, locale, time_of_day
ad_id (high-cardinality), campaign_id (high-cardinality)
past_7d_impressions, past_7d_clicks
referrer

Labels are delayed: some clicks arrive up to 24 hours after the impression.
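
One way to make the delay concrete is to treat an impression's label as final only once its 24-hour click window has closed. The sketch below is illustrative rather than a reference solution: the names `is_label_mature` and `time_split`, and the choice to pass impression timestamps as a plain list, are assumptions not given in the question.

```python
from datetime import datetime, timedelta

LABEL_DELAY = timedelta(hours=24)  # click window stated in the question


def is_label_mature(impression_time, as_of):
    """True when the full 24-hour click window has elapsed by `as_of`."""
    return impression_time + LABEL_DELAY <= as_of


def time_split(impression_times, train_end, test_end):
    """Split impression indices into train/test by time, leaving a 24-hour gap.

    Train: impressions whose click window closed by `train_end`.
    Test:  impressions served in [train_end, test_end).
    Impressions served in the 24 hours before `train_end` are dropped,
    because their labels were still open at the training cutoff.
    """
    train, test = [], []
    for i, t in enumerate(impression_times):
        if is_label_mature(t, train_end):
            train.append(i)
        elif train_end <= t < test_end:
            test.append(i)
    return train, test
```

Sliding `train_end` and `test_end` forward together gives a sequence of forward-chaining folds. When scoring a fold, the test labels themselves should also be read only after `test_end` plus 24 hours, so that late clicks are not counted as negatives.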

Tasks

Modeling

- Propose two model families suitable for extreme class imbalance and sparse/high-cardinality features.
- Explain how you will encode ad_id/campaign_id without leakage.
- Describe a time-based cross-validation scheme that respects the 24-hour label delay.

Imbalance Handling

- Compare class weighting, focal loss, undersampling, and calibrated thresholding.
- When would you avoid synthetic oversampling? Justify based on expected effects on ranking vs calibration.

Evaluation

- Model A: ROC-AUC = 0.91, PR-AUC = 0.14. Model B: ROC-AUC = 0.88, PR-AUC = 0.22.
- Explain why these can disagree at 0.7% prevalence, which metric you trust for email/ad CTR, and how to choose operating thresholds using a cost matrix (missed-click vs wasted impression).

Calibration and Thresholds

- Describe how to assess and improve calibration (e.g., isotonic vs Platt) and select thresholds for: a) maximizing F1, and b) maximizing expected profit.
- How would you compute precision@top1% and compare models on that metric?

Online Validation

- Outline a bucket test (A/B) to validate lift using the model’s scores (e.g., top-k targeting).
- What logs do you need to detect covariate drift and label delay in production, and how do you guard against feedback loops?
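
Two of the quantities named in the tasks, precision@top1% and a cost-matrix threshold, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed answer: the function names are hypothetical, correct decisions are assumed to cost nothing, and the threshold formula assumes the scores are calibrated probabilities.

```python
def precision_at_top_fraction(scores, labels, fraction=0.01):
    """Precision among the highest-scored `fraction` of impressions.

    `labels` are 0/1 click outcomes aligned with `scores`.
    Ties keep their input order (Python's sort is stable).
    """
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def cost_matrix_threshold(cost_missed_click, cost_wasted_impression):
    """Probability above which serving has lower expected cost than not serving.

    Serving a non-clicker costs `cost_wasted_impression`; skipping a clicker
    costs `cost_missed_click`. Serve when
        (1 - p) * cost_wasted_impression < p * cost_missed_click,
    which rearranges to p > wasted / (wasted + missed).
    """
    return cost_wasted_impression / (cost_wasted_impression + cost_missed_click)
```

To compare Model A and Model B, both would be scored on the same time-based holdout and `precision_at_top_fraction` computed for each. Because the top 1% of a holdout contains few positives at 0.7% prevalence, a bootstrap over impressions is one reasonable way to check whether the difference is stable.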
