Design and validate a cost-sensitive classifier

This question evaluates competence in cost-sensitive binary classification, handling delayed and imbalanced labels, calibration and decision-threshold selection, distributional-drift detection, monitoring, and low-latency deployment within the Machine Learning / Data Science domain.

Question

Binary Purchase Prediction with Delayed Labels and Imbalanced Classes

Context

- Goal: Ship a real-time binary classifier that predicts whether a user will purchase within the next 7 days.
- Class imbalance: Positives ≈ 3%.
- Label delay: Up to 10 days after the 7-day window (i.e., labels mature up to 17 days after the prediction time).
- Features: Recent session statistics, counts, and recency by category.
- Business values (per user):
  - True Positive (TP): +$2 expected margin
  - False Positive (FP): −$0.10 (annoyance/discount cost)
  - False Negative (FN): −$0.50 (missed margin)
  - True Negative (TN): $0
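
To make the payoff table concrete, here is a minimal sketch that turns confusion-matrix counts into average profit per user. The function name `profit_per_user` and the example counts are illustrative assumptions, not part of the question; only the four dollar values come from the table above.

```python
# Per-user payoffs in USD, copied from the business values above.
PAYOFF = {"TP": 2.00, "FP": -0.10, "FN": -0.50, "TN": 0.00}


def profit_per_user(tp: int, fp: int, fn: int, tn: int) -> float:
    """Average profit per scored user for the given confusion counts."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("no users scored")
    total = (tp * PAYOFF["TP"] + fp * PAYOFF["FP"]
             + fn * PAYOFF["FN"] + tn * PAYOFF["TN"])
    return total / n


# Hypothetical population: 1,000 users at a 3% base rate (30 buyers).
never_target = profit_per_user(tp=0, fp=0, fn=30, tn=970)   # act on nobody
target_all = profit_per_user(tp=30, fp=970, fn=0, tn=0)     # act on everybody
```

With these made-up counts, both trivial policies lose money (about −$0.015 and −$0.037 per user), so any candidate model should at least be compared against them.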

Tasks

A) Propose an end-to-end training and evaluation design that avoids leakage under delayed labels. Specify an exact time-based cross-validation scheme (fold boundaries, feature and label windows) and explain why it’s unbiased.
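
One possible scheme for Task A, sketched under assumptions: forward-chaining folds in which every training label is final before the validation window opens. The names `Fold` and `make_folds`, the 14-day validation window, the fold count, and the example dates are all hypothetical choices. Only the 7-day label window and the 10-day delay come from the question.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

LABEL_WINDOW = timedelta(days=7)         # purchase window after prediction time
LABEL_DELAY = timedelta(days=10)         # maximum reporting delay after the window
MATURITY = LABEL_WINDOW + LABEL_DELAY    # 17 days until a label is final


@dataclass(frozen=True)
class Fold:
    train_start: date   # first prediction date used for training
    train_end: date     # exclusive upper bound on training prediction dates
    val_start: date     # first prediction date used for validation
    val_end: date       # exclusive upper bound on validation prediction dates


def make_folds(first_day: date, last_day: date,
               n_folds: int = 3, val_days: int = 14) -> List[Fold]:
    """Build expanding-window folds, latest first, then return them in time order.

    last_day is assumed to be the latest prediction date whose label has matured.
    Features for a row must use only data observed at or before its prediction date.
    """
    folds = []
    val_end = last_day + timedelta(days=1)
    for _ in range(n_folds):
        val_start = val_end - timedelta(days=val_days)
        # Conservative gap: a training row dated d < train_end has its label
        # final at d + 17 days, which is strictly before val_start.
        train_end = val_start - MATURITY
        if train_end <= first_day:
            break  # not enough history for another fold
        folds.append(Fold(first_day, train_end, val_start, val_end))
        val_end = val_start
    return folds[::-1]


# Hypothetical data range.
folds = make_folds(date(2024, 1, 1), date(2024, 6, 30))
```

The gap mimics production: a model deployed on `val_start` could only have been trained on labels that had matured by then, so the validation score is not inflated by labels or late-arriving purchases that would have been unknown at training time.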

B) Choose offline metrics and describe how to calibrate the model (e.g., Platt scaling or isotonic regression). Provide the formula for selecting the decision threshold that maximizes expected profit under the given costs, and explain how you would assess threshold stability across cohorts.
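
For the threshold in Task B, a sketch that takes the payoff table at face value and assumes the model outputs calibrated probabilities p. Acting on a user is worth p·V_TP + (1 − p)·V_FP; not acting is worth p·V_FN + (1 − p)·V_TN. Setting the two equal gives the break-even probability. The function name `profit_threshold` is a hypothetical helper.

```python
def profit_threshold(v_tp: float, v_fp: float, v_fn: float, v_tn: float) -> float:
    """Probability above which acting has higher expected value than not acting.

    Act:      p * v_tp + (1 - p) * v_fp
    Not act:  p * v_fn + (1 - p) * v_tn
    Equality: p* = (v_tn - v_fp) / ((v_tn - v_fp) + (v_tp - v_fn))
    """
    false_alarm_cost = v_tn - v_fp   # value lost by acting on a non-buyer
    hit_gain = v_tp - v_fn           # value gained by acting on a buyer
    if false_alarm_cost + hit_gain <= 0:
        raise ValueError("payoffs do not define a break-even probability")
    return false_alarm_cost / (false_alarm_cost + hit_gain)


# Payoffs from the question: TP +2.00, FP -0.10, FN -0.50, TN 0.00.
threshold = profit_threshold(2.00, -0.10, -0.50, 0.00)   # 0.10 / 2.60
```

With the stated values the cutoff is 0.10 / 2.60 ≈ 0.0385, only slightly above the 3% base rate. This is why calibration quality in the low-probability range matters here: a small miscalibration near 0.04 moves many users across the threshold.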

C) Handle distribution shift: outline drift detection on covariates and on calibration (e.g., PSI, ECE). Propose an online monitoring dashboard with guardrails.
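
The two drift statistics named in Task C can be sketched in a few lines of standard-library Python. The assumptions here: `psi` receives bin proportions already computed with the same bin edges for the reference and current periods, zero bins are floored at a small `eps`, and `ece` uses equal-width bins. Function names and defaults are illustrative.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    Inputs are per-bin proportions; PSI = sum((a - e) * ln(a / e)).
    """
    if len(expected) != len(actual):
        raise ValueError("distributions must share the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)   # avoid log(0) and division by zero
        total += (a - e) * math.log(a / e)
    return total


def ece(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """Expected Calibration Error: bin-weighted |mean prediction - positive rate|."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("need equally sized, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    err = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        pos_rate = sum(y for _, y in b) / len(b)
        err += len(b) / len(probs) * abs(mean_p - pos_rate)
    return err
```

With positives near 3%, most scores fall in the lowest equal-width bin, so equal-frequency bins may give a more informative ECE. Because labels take up to 17 days to mature, calibration drift can only be measured with that lag, while PSI on features can be computed immediately.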

D) Latency and interpretability: With a 50 ms p95 budget and 64 MB RAM per request, describe a deployable modeling choice and featurization plan (including any precomputed features) that meets constraints, plus a fallback rule when the model is unavailable.
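
One deployable shape for Task D, as a sketch only: a small logistic model over precomputed features, with a hand-written rule when the model is unavailable. Everything named here is hypothetical, including the feature names, weights, bias, and the fallback rule itself. The cutoff is the break-even probability implied by the payoff table.

```python
import math
from typing import Mapping, Optional

# Hypothetical weights for a tiny logistic model; real values come from training.
WEIGHTS = {"sessions_7d": 0.30, "days_since_last_visit": -0.05}
BIAS = -3.5

# Break-even probability from the payoff table:
# (0 - (-0.10)) / ((0 - (-0.10)) + (2.00 - (-0.50))) = 0.10 / 2.60
THRESHOLD = 0.10 / 2.60


def fallback_rule(features: Mapping[str, float]) -> bool:
    """Hypothetical rule for outages: target recently active users only."""
    return (features.get("sessions_7d", 0) >= 3
            and features.get("days_since_last_visit", 999) <= 2)


def decide(features: Mapping[str, float],
           weights: Optional[Mapping[str, float]] = WEIGHTS) -> bool:
    """Return True to act on the user; weights=None simulates a missing model."""
    if weights is None:
        return fallback_rule(features)
    # Features are assumed to be precomputed and fetched by key before this call.
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in weights.items())
    p = 1.0 / (1.0 + math.exp(-z))
    return p >= THRESHOLD


active_user = {"sessions_7d": 5, "days_since_last_visit": 1}
dormant_user = {"sessions_7d": 0, "days_since_last_visit": 30}
```

Scoring a linear model is a handful of multiply-adds, and each coefficient can be read directly, which helps with interpretability. Whether the whole request meets the 50 ms p95 budget depends mostly on feature lookup time, which this sketch does not model.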

E) Explain the model and threshold decisions to a non-technical stakeholder, and describe how you would resolve the disagreement if they insist on a different threshold. What evidence would you present to align on the target operating point?
