How do I approach Machine Learning interview questions?

Machine Learning questions require both an understanding of core concepts and practice. PracHub provides solutions with explanations to help you master machine learning interviews.

What difficulty level is this interview question?

This is a medium difficulty Machine Learning question, commonly asked during Technical Screen rounds at Capital One.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Capital One during technical interviews.

How would you design delay and watchlist models? | Capital One Interview Question

Q: How would you design delay and watchlist models?

This question evaluates end-to-end machine learning system design. It covers time-series regression and label leakage, feature engineering, skewed targets and rare but costly events, imbalanced and open-set face-recognition classification, evaluation and calibration, thresholding and decision systems, deployment and monitoring, and ethical and privacy trade-offs. It is commonly asked to assess whether a candidate can balance statistical modeling with operational, business, and legal constraints. The domain is Machine Learning for a Data Scientist role, and the required level spans both conceptual understanding and practical application.

You may be asked one or both of the following machine-learning case questions:

Flight-delay prediction case

An airline wants a model that predicts departure delay in minutes for each flight 2 hours before scheduled departure. You have historical flight operations data, airport congestion, aircraft and route information, weather forecasts, and crew or maintenance signals. Propose a regression-based approach and explain:

how you define the target and avoid label leakage;
which features you would engineer;
how you would split training and validation data over time;
which evaluation metrics you would use, such as MAE, RMSE, or quantile loss, and why;
how you would handle missing data, outliers, and highly correlated variables;
whether multicollinearity is harmful for prediction, interpretability, or both;
what threshold would make you call a correlation high, and why;
alternatives to dropping correlated features, such as regularization, feature clustering, PCA, or tree-based models;
if you remove a feature, how you would estimate that feature's business impact;
how you would turn model outputs into concrete operational recommendations for the airline.

Assume delays are right-skewed, severe delays are rare but costly, and airport-specific operational policies differ across hubs.
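As a starting point for the splitting and metric questions above, here is a minimal sketch in Python, not a reference solution. It assumes each flight is a dict with a hypothetical `dep_date` field, and the helper names `time_split`, `pinball_loss`, and `mae` are illustrative. It shows a chronological train/validation split, which keeps later flights out of training, and the quantile (pinball) loss, which penalizes under-prediction more heavily at high quantiles and so suits right-skewed delays with costly tails.

```python
from datetime import date


def time_split(rows, cutoff):
    """Split rows chronologically on the hypothetical `dep_date` field.

    Flights strictly before `cutoff` go to training and the rest to
    validation, so validation never contains flights older than training.
    """
    train = [r for r in rows if r["dep_date"] < cutoff]
    valid = [r for r in rows if r["dep_date"] >= cutoff]
    return train, valid


def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff >= 0) costs q per minute,
        # over-prediction costs (1 - q) per minute.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error in the same units as the target (minutes)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy rows for illustration only; not real flight data.
    flights = [
        {"dep_date": date(2023, 12, 30), "delay_min": 5},
        {"dep_date": date(2023, 12, 31), "delay_min": 0},
        {"dep_date": date(2024, 1, 2), "delay_min": 90},
    ]
    train, valid = time_split(flights, date(2024, 1, 1))
    print(len(train), len(valid))
    # At q = 0.9, missing a 90-minute delay by predicting 10 is costly.
    print(pinball_loss([90], [10], 0.9))
```

At q = 0.5 the pinball loss is half the MAE, so the two metrics rank models identically there. Evaluating at a higher quantile such as 0.9 is one way to reflect that severe delays matter more than small ones.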

Watchlist face-recognition case

A bank wants to use branch camera feeds to flag whether an entering customer matches a watchlist of known robbers. Describe how you would design the model and decision system. Address:

closed-set versus open-set recognition;
data collection and labeling;
low base rates and class imbalance;
false-positive versus false-negative costs;
threshold selection, calibration, and human review;
fairness, privacy, consent, and legal risk;
latency and on-device versus server inference;
monitoring for drift, spoofing, and adversarial attacks.
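The open-set and base-rate points above can be made concrete with a small Python sketch. It assumes face embeddings are already available as plain lists of floats (the embedding model is out of scope here), and the names `match_watchlist` and `alert_precision`, along with every number used, are illustrative assumptions rather than measured values. The first function returns no match when the best similarity falls below a threshold, which is what separates open-set from closed-set recognition. The second shows how a low base rate affects the share of alerts that are true matches.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_watchlist(probe, gallery, threshold):
    """Open-set decision: best watchlist identity, or None if below threshold.

    `gallery` maps a watchlist identity to its enrolled embedding.
    Returns (identity_or_None, best_score).
    """
    best_id, best_score = None, -1.0
    for identity, embedding in gallery.items():
        score = cosine(probe, embedding)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score >= threshold:
        return best_id, best_score
    # Most people entering a branch are not on the list at all.
    return None, best_score


def alert_precision(base_rate, tpr, fpr):
    """Share of alerts that are true matches, by Bayes' rule."""
    true_alerts = base_rate * tpr
    false_alerts = (1.0 - base_rate) * fpr
    return true_alerts / (true_alerts + false_alerts)


if __name__ == "__main__":
    gallery = {"suspect_a": [1.0, 0.0], "suspect_b": [0.0, 1.0]}
    print(match_watchlist([0.6, 0.8], gallery, threshold=0.9))
    # Assumed figures: 1 in 100,000 visitors on the list,
    # 99% true-positive rate, 0.1% false-positive rate.
    print(alert_precision(1e-5, 0.99, 0.001))
```

Under these assumed rates, fewer than 1 in 100 alerts would correspond to a real watchlist match, which is one argument for routing every alert through human review instead of acting on the model output directly.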

For both cases, explain not only the modeling approach but also the business and ethical tradeoffs.
