Click-Through Rate (CTR) Prediction: Build, Compare, and Justify Models
Context
You are given a tabular dataset for binary click prediction (click = 1, no click = 0). The goal is to produce well-calibrated click probabilities for ranking/decisioning. Assume features include user, content/ad, and context signals (e.g., user/device attributes, ad/category IDs, time features, historical interaction counts). The class distribution is roughly balanced (e.g., 40–60% positives).
Task
- Establish a trivial baseline classifier.
- Train and compare three models: logistic regression, random forest, and gradient-boosted trees.
- Explain why you selected each algorithm and why you did not choose plausible alternatives (e.g., SVM, Naive Bayes, simple neural networks), discussing bias–variance, interpretability, and computational trade-offs.
- Describe your cross-validation strategy and the key hyperparameters you would tune for each model.
- Choose evaluation metrics appropriate for a roughly class-balanced dataset (e.g., ROC AUC, log loss, PR AUC, accuracy), justify them, and explain what would change if classes were imbalanced.
- Outline additional improvements you would pursue with more time (features, modeling, calibration, serving/monitoring).
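The trivial baseline from the first task can be sketched with scikit-learn's `DummyClassifier`; the synthetic features and the choice of the `"prior"` strategy below are illustrative assumptions, not part of the prompt:

```python
# Sketch: a trivial prior-based baseline, assuming scikit-learn is available.
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))       # placeholder features
y = rng.binomial(1, 0.5, size=1000)  # roughly balanced clicks

# "prior" predicts the empirical click rate for every row, giving a
# log-loss floor; "most_frequent" would give an accuracy floor instead.
baseline = DummyClassifier(strategy="prior").fit(X, y)
p = baseline.predict_proba(X)[:, 1]
print(p[0])  # every row gets the same probability: the base click rate
```

Any real model should beat this floor on log loss; if it doesn't, something is wrong with the features or the setup.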
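Fitting and comparing the three required model families might look like the sketch below; the synthetic dataset, the specific estimator classes, and the hyperparameter values are illustrative assumptions:

```python
# Sketch: train logistic regression, random forest, and gradient-boosted
# trees on a synthetic binary-click dataset and compare held-out ROC AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gbt": HistGradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: ROC AUC = {auc:.3f}")
```

On real CTR data with high-cardinality ID features, the same comparison would also need an encoding step (one-hot or target/count encoding) before the linear model in particular.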
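The cross-validation item can be sketched with stratified K-fold and a small grid; the grid values below are illustrative starting points, not tuned recommendations, and if the historical-count features carry temporal leakage a time-based split would be more appropriate than random folds:

```python
# Sketch: stratified 5-fold CV with a small random-forest grid,
# scored on negative log loss to optimize probability quality.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    scoring="neg_log_loss",  # we want calibrated probabilities, not just ranking
    cv=cv,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

The analogous grids for the other two models would cover regularization strength (`C`) for logistic regression and learning rate, tree depth, and iteration count for the gradient-boosted trees.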
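The candidate metrics listed above can all be computed from held-out probabilities as in this sketch; the tiny hand-made arrays and the hard 0.5 threshold for accuracy are assumptions for illustration:

```python
# Sketch: the four candidate metrics on a small held-out sample.
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             log_loss, roc_auc_score)

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])
p_hat = np.array([0.2, 0.8, 0.6, 0.4, 0.9, 0.1, 0.7, 0.3])

print("ROC AUC:", roc_auc_score(y_true, p_hat))           # ranking quality
print("log loss:", log_loss(y_true, p_hat))               # calibration-sensitive
print("PR AUC:", average_precision_score(y_true, p_hat))  # focus on positives
print("accuracy:", accuracy_score(y_true, p_hat >= 0.5))  # threshold-dependent
```

Note which metrics need a threshold (accuracy) versus which consume raw probabilities directly; under class imbalance the threshold-free, positive-focused metrics become the ones worth reporting.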