Build and assess CTR prediction
Company: Uber
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
You are asked to predict the probability that an ad impression leads to a click within 24 hours. The positive rate is ~0.7%. Features include user_age, device_type, locale, time_of_day, ad_id (high-cardinality), campaign_id, past_7d_impressions, past_7d_clicks, and referrer. Labels arrive with delay (some clicks arrive up to 24h later).
1) Modeling: Propose two model families suitable for extreme class imbalance and sparse/high-cardinality features. How will you encode ad_id/campaign_id without leakage? Describe your time-based CV scheme to respect label delay.
2) Imbalance: Compare class weighting, focal loss, undersampling, and calibrated thresholding. When would you avoid synthetic oversampling? Justify with expected effects on ranking vs calibration.
3) Evaluation: Model A has ROC-AUC=0.91 and PR-AUC=0.14; Model B has ROC-AUC=0.88 and PR-AUC=0.22. Explain why these two metrics can disagree at 0.7% prevalence, which metric you would trust for email/ad CTR, and how you would choose operating thresholds for business objectives using a cost matrix (missed click vs. wasted impression).
4) Calibration and thresholds: Describe how you would assess and improve calibration (e.g., isotonic vs Platt) and select thresholds for (a) maximizing F1, and (b) maximizing expected profit. How would you compute precision@top1% and compare models on that metric?
5) Online validation: Outline a bucket test to validate lift using the model’s scores (e.g., top-k targeting). What logs do you need to detect covariate drift and label delay in production, and how do you guard against feedback loops?
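For part 1, the time-based CV scheme could be sketched as forward-chaining folds with a 24-hour embargo, so that no training row is used before its label window has closed. This is only an illustration under stated assumptions: `time_splits`, `LABEL_DELAY`, and the hourly toy timestamps are hypothetical names and data, not part of the prompt. The same cutoff could plausibly be applied when computing per-ad_id or per-campaign_id click statistics, so that encodings only see matured labels from the training window.

```python
from datetime import datetime, timedelta

LABEL_DELAY = timedelta(hours=24)  # from the prompt: clicks can arrive up to 24h late


def time_splits(timestamps, n_folds, label_delay=LABEL_DELAY):
    """Yield (train_idx, valid_idx) for forward-chaining, time-ordered folds.

    A row may be used for training only if its label window
    (impression time + label_delay) closed before validation starts.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    t_min, t_max = timestamps[order[0]], timestamps[order[-1]]
    step = (t_max - t_min) / (n_folds + 1)
    for k in range(1, n_folds + 1):
        valid_start = t_min + k * step
        valid_end = t_min + (k + 1) * step
        last = k == n_folds  # the last fold also takes the final timestamp
        train = [i for i in order if timestamps[i] + label_delay <= valid_start]
        valid = [i for i in order
                 if timestamps[i] >= valid_start and (last or timestamps[i] < valid_end)]
        yield train, valid


# Hypothetical usage: hourly impressions over 8 days.
start = datetime(2024, 1, 1)
ts = [start + timedelta(hours=h) for h in range(192)]
for train_idx, valid_idx in time_splits(ts, n_folds=3):
    print(len(train_idx), len(valid_idx))
```

Each later fold trains on a longer history, and the gap between the newest training row and the oldest validation row is never shorter than the label delay.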
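For part 2, the effect of undersampling on calibration can be made concrete with the standard prior-shift correction. If negatives are kept with probability `keep_rate` during training, the model's odds are inflated by a factor of `1 / keep_rate`; rescaling the odds restores the original probability scale without changing the ranking. The function name and the numbers below are illustrative assumptions.

```python
def correct_for_negative_sampling(p_sampled, keep_rate):
    """Map a score from a model trained on undersampled negatives
    back to the probability scale of the full population.

    keep_rate is the fraction of negatives retained (0 < keep_rate <= 1).
    """
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    return p_sampled / (p_sampled + (1.0 - p_sampled) / keep_rate)


# Illustration: a true CTR of 0.7% as seen after keeping 10% of negatives.
keep_rate = 0.1
true_p = 0.007
sampled_odds = (true_p / (1.0 - true_p)) / keep_rate
p_sampled = sampled_odds / (1.0 + sampled_odds)

print(p_sampled)                                            # inflated score
print(correct_for_negative_sampling(p_sampled, keep_rate))  # back near 0.007
```

Because the mapping is monotone, ranking metrics are unaffected by the correction; only calibration changes.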
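For part 3, a small calculation shows why ROC-AUC and PR-AUC can disagree at 0.7% prevalence: precision depends on the false positive rate multiplied by the very large negative class, so a difference that looks negligible in ROC space can be large in PR space. The helper name and the operating points are illustrative assumptions, not values from the prompt.

```python
def precision_at(tpr, fpr, prevalence):
    """Precision implied by a ROC operating point at a given prevalence."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    total = true_pos + false_pos
    return true_pos / total if total > 0 else 0.0


prevalence = 0.007  # positive rate from the prompt
for fpr in (0.01, 0.02):
    # Same recall (50%), slightly different false positive rate.
    print(fpr, round(precision_at(0.5, fpr, prevalence), 3))

# The same ROC point on a balanced dataset, for contrast.
print(round(precision_at(0.5, 0.01, 0.5), 3))
```

Moving the false positive rate from 1% to 2% at fixed recall takes precision from roughly 26% to roughly 15% here, which is the kind of gap PR-AUC registers and ROC-AUC largely hides.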
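For part 4, the three quantities asked about can be sketched in plain Python. The function names are hypothetical, and the cost-based threshold assumes calibrated probabilities and zero cost for correct decisions: acting is worthwhile when p * cost_missed_click > (1 - p) * cost_wasted_impression.

```python
def precision_at_top(scores, labels, fraction=0.01):
    """Share of positives among the top `fraction` of rows ranked by score."""
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def best_f1_threshold(scores, labels):
    """Sweep thresholds at each distinct score; predict positive if score >= t."""
    total_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_t, best_f1, true_pos = None, 0.0, 0
    for n_pred, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        if n_pred < len(ranked) and ranked[n_pred][0] == score:
            continue  # evaluate only after the whole tie group is included
        f1 = 2 * true_pos / (n_pred + total_pos)  # equals 2TP / (2TP + FP + FN)
        if f1 > best_f1:
            best_t, best_f1 = score, f1
    return best_t, best_f1


def cost_threshold(cost_wasted_impression, cost_missed_click):
    """Probability above which acting has lower expected cost than not acting."""
    return cost_wasted_impression / (cost_wasted_impression + cost_missed_click)


# Toy data: 1000 rows with distinct scores and four positives.
scores = [i / 1000 for i in range(1000)]
labels = [1 if i in (999, 998, 997, 500) else 0 for i in range(1000)]
print(precision_at_top(scores, labels, 0.01))
print(best_f1_threshold(scores, labels))
print(cost_threshold(1.0, 9.0))
```

To compare two models on precision@top1%, the same function can be applied to each model's scores on the same held-out rows, ideally with a bootstrap over rows to gauge the uncertainty of the difference.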
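For part 5, one common way to flag covariate drift from logged features or scores is the Population Stability Index (PSI) between a reference window and a recent window. The sketch below is a minimal version with equal-width bins taken from the reference sample; the function name, bin count, and the toy data are assumptions for illustration.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` reference."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant features

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy example: a reference week vs. a shifted recent day of past_7d_impressions.
reference = [i / 100 for i in range(100)]
recent_same = list(reference)
recent_shifted = [0.5 + i / 200 for i in range(100)]
print(psi(reference, recent_same))
print(psi(reference, recent_shifted))
```

A frequently quoted rule of thumb treats PSI above about 0.25 as a substantial shift, though the cutoff is a convention rather than a guarantee. The same comparison can be run on the model's score distribution, and tracking the share of impressions whose 24-hour label window is still open helps separate label delay from genuine drift.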
Quick Answer: This question evaluates predictive modeling and applied data science skills for CTR prediction: handling extreme class imbalance and delayed feedback, encoding sparse, high-cardinality features, time-aware validation, evaluation and calibration of probabilistic scores, and online A/B validation. It falls squarely in the Machine Learning domain and tests both conceptual understanding and practical application. It is commonly asked because it probes reasoning about real-world production challenges, such as metric selection (ROC vs PR), thresholding under business costs, calibration methods, drift detection, and avoiding feedback loops, without requiring specific implementation details.