PracHub

How would you design delay and watchlist models?

Last updated: Mar 29, 2026

Quick Overview

This question evaluates end-to-end machine learning system design. It covers time-series regression with label-leakage control, feature engineering, skewed targets and rare but costly events, imbalanced open-set face recognition, evaluation and calibration, thresholding and decision systems, deployment and monitoring, and ethical and privacy trade-offs. It tests whether a Data Scientist candidate can balance statistical modeling against operational, business, and legal constraints, at both the conceptual and the practical level.

  • medium
  • Capital One
  • Machine Learning
  • Data Scientist


Company: Capital One

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen



Related Interview Questions

  • Deep-dive XGBoost handling and overfitting - Capital One (medium)
  • Build House Price Model Responsibly - Capital One (easy)
  • Design robber detection from surveillance video - Capital One (easy)
  • Explain core ML concepts and lifecycle - Capital One (medium)
  • Build and evaluate donation propensity model - Capital One (medium)
Jan 30, 2026, 12:00 AM

You may be asked one or both of the following machine-learning case questions:

  1. Flight-delay prediction case. An airline wants a model that predicts each flight's departure delay in minutes, with the prediction made 2 hours before scheduled departure. You have historical flight operations data, airport congestion, aircraft and route information, weather forecasts, and crew or maintenance signals. Propose a regression-based approach and explain:
  • how you define the target and avoid label leakage;
  • which features you would engineer;
  • how you would split training and validation data over time;
  • which evaluation metrics you would use, such as MAE, RMSE, or quantile loss, and why;
  • how you would handle missing data, outliers, and highly correlated variables;
  • whether multicollinearity is harmful for prediction, interpretability, or both;
  • what threshold would make you call a correlation high, and why;
  • alternatives to dropping correlated features, such as regularization, feature clustering, PCA, or tree-based models;
  • if you remove a feature, how you would estimate that feature's business impact;
  • how you would turn model outputs into concrete operational recommendations for the airline.
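The leakage and time-split bullets can be made concrete with a small sketch. It assumes a hypothetical record layout (the `Flight` and `Signal` classes and their fields are illustrative, not any real airline schema) in which every signal carries the timestamp at which it became known. Features are taken only as of the 2-hour cutoff, and each validation fold lies strictly later in time than its training data (an expanding-window, rolling-origin split).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumption from the case: the prediction is made 2 hours before scheduled departure.
PREDICTION_HORIZON = timedelta(hours=2)


@dataclass
class Signal:
    """Hypothetical feature observation (weather forecast, congestion, crew status, ...)."""
    name: str
    value: float
    observed_at: datetime  # when the value became known, not the time it describes


@dataclass
class Flight:
    """Hypothetical flight record; field names are illustrative only."""
    flight_id: str
    sched_dep: datetime
    actual_dep: datetime
    signals: list[Signal] = field(default_factory=list)


def target_minutes(f: Flight) -> float:
    """Departure delay in minutes; negative values mean an early departure."""
    return (f.actual_dep - f.sched_dep).total_seconds() / 60.0


def features_as_of(f: Flight) -> dict[str, float]:
    """Latest value of each signal known at the cutoff; anything later would leak."""
    cutoff = f.sched_dep - PREDICTION_HORIZON
    latest: dict[str, float] = {}
    for s in sorted(f.signals, key=lambda sig: sig.observed_at):
        if s.observed_at <= cutoff:
            latest[s.name] = s.value  # later pre-cutoff values overwrite earlier ones
    return latest


def rolling_origin_splits(flights, first_train_end, fold_len, n_folds):
    """Expanding window: train on all flights scheduled before train_end,
    validate on the next fold_len of scheduled departures."""
    ordered = sorted(flights, key=lambda f: f.sched_dep)
    for k in range(n_folds):
        train_end = first_train_end + k * fold_len
        valid_end = train_end + fold_len
        train = [f for f in ordered if f.sched_dep < train_end]
        valid = [f for f in ordered if train_end <= f.sched_dep < valid_end]
        yield train, valid
```

Whether to clip early departures at zero is a business choice. A gap between `train_end` and the validation window may also be worth adding, because in production the labels of the most recent flights are not yet available at retraining time.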

Assume delays are right-skewed, severe delays are rare but costly, and airport-specific operational policies differ across hubs.
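Because delays are right-skewed, the metric decides what the model is really estimating. The constant prediction that minimizes MAE is the median, the one that minimizes RMSE is the mean, and the one that minimizes pinball (quantile) loss at level q is the q-th quantile, which is a natural target when severe delays are the costly case. The sketch below checks this on a small made-up sample and also reports metrics per hub, since hub policies differ. The sample values, hub codes, and helper names are illustrative assumptions.

```python
import math
from collections import defaultdict


def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def pinball(y, p, q):
    """Quantile loss: under-prediction costs q per minute, over-prediction costs 1 - q."""
    total = 0.0
    for a, b in zip(y, p):
        diff = a - b
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y)


def best_constant(y, loss, grid, **kw):
    """Brute-force the constant prediction that minimizes `loss` over `grid`."""
    return min(grid, key=lambda c: loss(y, [c] * len(y), **kw))


def metrics_by_hub(rows, q=0.9):
    """rows: iterable of (hub, actual_delay, predicted_delay); returns metrics per hub."""
    groups = defaultdict(lambda: ([], []))
    for hub, actual, pred in rows:
        groups[hub][0].append(actual)
        groups[hub][1].append(pred)
    return {
        hub: {"mae": mae(y, p), "rmse": rmse(y, p), f"pinball_q{q}": pinball(y, p, q)}
        for hub, (y, p) in groups.items()
    }


# Made-up right-skewed delays in minutes: one tail event dominates the mean.
delays = [0, 0, 5, 10, 120]
grid = range(0, 131)
print(best_constant(delays, mae, grid))             # -> 5 (median)
print(best_constant(delays, rmse, grid))            # -> 27 (mean)
print(best_constant(delays, pinball, grid, q=0.9))  # -> 120 (upper quantile)
```

RMSE is pulled toward the rare long delays, MAE ignores their size, and a high-quantile model makes the tail target explicit; reporting all three per hub shows where a single global model underperforms.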

  2. Watchlist face-recognition case. A bank wants to use branch camera feeds to flag whether an entering customer matches a watchlist of known robbers. Describe how you would design the model and decision system. Address:
  • closed-set versus open-set recognition;
  • data collection and labeling;
  • low base rates and class imbalance;
  • false-positive versus false-negative costs;
  • threshold selection, calibration, and human review;
  • fairness, privacy, consent, and legal risk;
  • latency and on-device versus server inference;
  • monitoring for drift, spoofing, and adversarial attacks.
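A minimal sketch of the open-set decision step, under stated assumptions: some face-embedding model (not shown, and hypothetical here) maps each face to a vector, and a probe is compared with every watchlist embedding by cosine similarity. Unlike closed-set classification, the probe is reported as "not on the watchlist" unless its best score clears a threshold. The threshold is chosen from scores of known non-watchlist faces (impostors) to cap the false-positive rate, and the last function applies Bayes' rule to show why, at a very low base rate, most alerts are still false and need human review. All numbers are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; assumes non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def open_set_match(probe, gallery, threshold):
    """Return (identity or None, best score); None means 'not on the watchlist'."""
    best_id, best_score = None, -1.0
    for identity, emb in gallery.items():
        score = cosine(probe, emb)
        if score > best_score:
            best_id, best_score = identity, score
    return (best_id if best_score >= threshold else None), best_score


def threshold_for_fpr(impostor_scores, target_fpr):
    """Lowest threshold that lets at most target_fpr of impostor scores reach it."""
    scores = sorted(impostor_scores, reverse=True)
    allowed = math.floor(target_fpr * len(scores))  # impostors we may let through
    if allowed >= len(scores):
        return scores[-1]
    # Strictly above the (allowed + 1)-th highest impostor score.
    return math.nextafter(scores[allowed], math.inf)


def alert_precision(base_rate, tpr, fpr):
    """P(true watchlist match | alert) via Bayes' rule."""
    tp = base_rate * tpr
    fp = (1 - base_rate) * fpr
    return tp / (tp + fp)


# Hypothetical branch: 2,000 entries per day, 1 watchlist visitor per 100,000 entries,
# 90% true-positive rate and 0.1% false-positive rate at the chosen threshold.
entries_per_day = 2000
false_alerts_per_day = entries_per_day * (1 - 1e-5) * 0.001  # about 2 per day
precision = alert_precision(1e-5, 0.90, 0.001)               # about 0.009
```

Even with a 0.1% false-positive rate, fewer than 1 in 100 alerts would be genuine under these assumptions, which is why the output should route to a human reviewer rather than trigger action directly.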

For both cases, explain not only the modeling approach but also the business and ethical tradeoffs.
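Both cases end in a decision (re-plan a flight, alert a reviewer), so the business trade-off can be made explicit by picking the operating threshold that minimizes expected cost on validation data. The sketch below assumes one score per case (for example a predicted probability of a severe delay, or a face-match score) and binary outcome labels; the costs `c_fp` and `c_fn` are hypothetical placeholders the business would have to supply.

```python
import math


def expected_cost(scores, labels, threshold, c_fp, c_fn):
    """Total cost of flagging every case whose score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return c_fp * fp + c_fn * fn


def cost_optimal_threshold(scores, labels, c_fp, c_fn):
    """Try each observed score as a cutoff, plus +inf (flag nothing)."""
    candidates = sorted(set(scores)) + [math.inf]
    return min(candidates, key=lambda t: expected_cost(scores, labels, t, c_fp, c_fn))


# Toy validation set (made up): label 1 = costly event or true match.
scores = [0.1, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1]
print(cost_optimal_threshold(scores, labels, c_fp=1, c_fn=10))  # -> 0.6 (misses expensive)
print(cost_optimal_threshold(scores, labels, c_fp=10, c_fn=1))  # -> 0.9 (false alarms expensive)
```

Raising the cost of misses lowers the threshold, and raising the cost of false alarms raises it. In the watchlist case the false-positive cost should reflect harm to a wrongly flagged customer, not only reviewer time.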

© 2026 PracHub. All rights reserved.