PracHub

Build and assess CTR prediction

Last updated: Mar 29, 2026

Quick Overview

This question tests predictive modeling and applied data science skills for click-through-rate (CTR) prediction: handling extreme class imbalance and delayed feedback, encoding sparse, high-cardinality features, time-aware validation, evaluating and calibrating probabilistic scores, and online A/B validation. It falls in the Machine Learning domain and tests both conceptual understanding and practical application. It is commonly asked because it probes reasoning about real-world production challenges (metric selection between ROC and PR, thresholding under business costs, calibration methods, drift detection, and avoiding feedback loops) without requiring specific implementation details.

  • hard
  • Uber
  • Machine Learning
  • Data Scientist

Company: Uber

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen



Oct 13, 2025, 9:49 PM

CTR Prediction with Delayed Feedback and Extreme Class Imbalance

You are building a model to predict the probability that an ad impression results in a click within 24 hours. The base positive rate is approximately 0.7%.

Available features:

  • user_age, device_type, locale, time_of_day
  • ad_id (high-cardinality), campaign_id (high-cardinality)
  • past_7d_impressions, past_7d_clicks
  • referrer

Labels are delayed: some clicks arrive up to 24 hours after the impression.
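
One way to respect the label delay in validation is to leave a gap of at least 24 hours between the last training impression and the first validation impression, and to use only windows whose labels have fully matured. The sketch below is an illustration under assumptions, not a prescribed answer: the function names, the sliding-window layout, and the `snapshot_time` parameter (the moment the labels were pulled) are all hypothetical.

```python
from datetime import datetime, timedelta

LABEL_DELAY = timedelta(hours=24)  # click window stated in the problem


def is_label_mature(impression_time, as_of, label_delay=LABEL_DELAY):
    """True if the click window for this impression has closed by `as_of`."""
    return impression_time + label_delay <= as_of


def time_folds(start, snapshot_time, train_span, valid_span,
               label_delay=LABEL_DELAY):
    """Build sliding (train_start, train_end, valid_start, valid_end) windows.

    Windows are half-open intervals over impression time. Two constraints
    are enforced (both are assumptions of this sketch):
      * validation starts `label_delay` after training ends, so every
        training label is final before the first validation impression;
      * validation ends `label_delay` before `snapshot_time`, so every
        validation label is final when the data is read.
    """
    folds = []
    train_start = start
    while True:
        train_end = train_start + train_span
        valid_start = train_end + label_delay
        valid_end = valid_start + valid_span
        if valid_end + label_delay > snapshot_time:
            break
        folds.append((train_start, train_end, valid_start, valid_end))
        train_start += valid_span  # slide forward by one validation window
    return folds


folds = time_folds(
    start=datetime(2025, 1, 1),
    snapshot_time=datetime(2025, 1, 15),
    train_span=timedelta(days=7),
    valid_span=timedelta(days=1),
)
```

Any feature derived from clicks, such as past_7d_clicks or a target encoding of ad_id, would need the same maturity rule applied at feature time; otherwise clicks that had not yet arrived at serving time leak into training.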

Tasks

  1. Modeling
    • Propose two model families suitable for extreme class imbalance and sparse/high-cardinality features.
    • Explain how you will encode ad_id/campaign_id without leakage.
    • Describe a time-based cross-validation scheme that respects the 24-hour label delay.
  2. Imbalance Handling
    • Compare class weighting, focal loss, undersampling, and calibrated thresholding.
    • When would you avoid synthetic oversampling? Justify based on expected effects on ranking vs calibration.
  3. Evaluation
    • Model A: ROC-AUC = 0.91, PR-AUC = 0.14. Model B: ROC-AUC = 0.88, PR-AUC = 0.22.
    • Explain why ROC-AUC and PR-AUC can disagree at 0.7% prevalence, which metric you would trust for email/ad CTR, and how you would choose operating thresholds using a cost matrix (missed click vs. wasted impression).
  4. Calibration and Thresholds
    • Describe how to assess and improve calibration (e.g., isotonic vs Platt) and select thresholds for: a) maximizing F1, and b) maximizing expected profit.
    • How would you compute precision@top1% and compare models on that metric?
  5. Online Validation
    • Outline a bucket test (A/B) to validate lift using the model’s scores (e.g., top-k targeting).
    • What logs do you need to detect covariate drift and label delay in production, and how do you guard against feedback loops?
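
Several quantities named in tasks 3 and 4 can be made concrete with a few lines of code. The following is a minimal sketch in plain Python, not a reference solution: all function and parameter names are hypothetical, and the cost-matrix threshold assumes the scores are calibrated probabilities and that correct decisions cost nothing.

```python
import math


def precision_from_rates(tpr, fpr, prevalence):
    """Precision implied by an ROC operating point at a given base rate.

    precision = TPR*pi / (TPR*pi + FPR*(1 - pi)). At low prevalence the
    FPR term dominates, which is why a high ROC-AUC can coexist with a
    low PR-AUC.
    """
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def precision_at_top_fraction(y_true, scores, fraction=0.01):
    """Share of clicks among the highest-scored `fraction` of impressions.

    Ties are broken by input order (Python's sort is stable).
    """
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def cost_matrix_threshold(cost_wasted_impression, cost_missed_click):
    """Probability above which serving has lower expected cost than skipping.

    Serve when p * cost_missed_click > (1 - p) * cost_wasted_impression,
    i.e. p > C_fp / (C_fp + C_fn). Valid only for calibrated p.
    """
    return cost_wasted_impression / (cost_wasted_impression + cost_missed_click)


# At 0.7% prevalence, TPR = 0.80 with FPR = 0.05 gives precision of about 0.10.
example_precision = precision_from_rates(tpr=0.80, fpr=0.05, prevalence=0.007)

# Hypothetical costs: a missed click is 99 times as costly as a wasted impression.
example_threshold = cost_matrix_threshold(1.0, 99.0)
```

To compare two models on precision@top1%, one reasonable approach is to evaluate both on the same held-out, label-mature window and bootstrap over impressions (or over days) to get an interval on the difference, since the top 1% of a small sample contains very few clicks.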

