
Choose and justify ML algorithms for tabular prediction

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in model selection for tabular regression: weighing trade-offs among linear/regularized regression, decision trees, Random Forest, and XGBoost, together with preprocessing, categorical handling, missingness, latency and memory constraints, instance-level explainability, fairness checks, calibration, and experiment design. It belongs to the Machine Learning domain and is commonly asked to assess both conceptual understanding (model behavior, bias–variance trade-offs) and practical application skills (validation strategy, performance and latency measurement, and production-ready explainability and robustness).


Company: Capital One

Role: Data Scientist

Category: Machine Learning

Difficulty: Medium

Interview Round: Onsite

You must choose an algorithm for tabular prediction of arrival delay under these constraints: 500k rows; 120 features (mixed numeric/categorical with missingness); non-linear interactions; strict latency of <100 ms per prediction; and instance-level explanations needed for operations. Make and defend a choice among linear/regularized regression, a single decision tree, Random Forest, and XGBoost:

  1. Lay out a comparison plan (feature preprocessing, categorical handling, leakage guards, time-aware CV) and selection metrics (RMSE, calibration, latency, memory); a pipeline sketch follows this list.
  2. Argue when linear regression beats trees (bias/variance, extrapolation, monotonic constraints) and when trees/ensembles dominate (non-linearities, interactions).
  3. Compare Random Forest vs. XGBoost in depth: training/inference cost, sensitivity to noisy features, overfitting risk, handling of skewed or imbalanced target distributions in regression, robustness to missing values, the hyperparameters that most affect bias/variance, and when RF may outperform XGBoost in practice (see the latency comparison sketch below).
  4. Describe how you'd produce stable, fast explanations (e.g., TreeSHAP vs. permutation importance), run fairness checks, and calibrate predictions (see the TreeSHAP sketch below).
  5. Specify an experiment design to confidently pick a winner: stratified, time-split CV, paired tests on fold errors, and holdout confirmation (see the paired-test sketch below).
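For part 1, here is a minimal sketch of a leakage-guarded, time-aware comparison pipeline, assuming pandas and scikit-learn; the DataFrame `df` and the column names `flight_date` and `arrival_delay` are hypothetical stand-ins for the real schema:

```python
# Minimal sketch: leakage-guarded preprocessing + time-aware CV for a
# linear baseline. Assumes a pandas DataFrame `df` with a `flight_date`
# column and target `arrival_delay` (both names are hypothetical).
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = df.sort_values("flight_date")  # time-aware splits require sorted data
X = df.drop(columns=["arrival_delay", "flight_date"])
y = df["arrival_delay"]

numeric = X.select_dtypes(include="number").columns
categorical = X.columns.difference(numeric)

# Fitting imputers/encoders inside the pipeline keeps each test fold
# leak-free: statistics come from the training fold only.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), list(numeric)),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("ohe", OneHotEncoder(handle_unknown="ignore",
                                            min_frequency=50))]),  # group rare levels
     list(categorical)),
])

model = Pipeline([("prep", preprocess), ("reg", Ridge(alpha=1.0))])

cv = TimeSeriesSplit(n_splits=5)  # train on the past, validate on the future
rmses = []
for train_idx, test_idx in cv.split(X):
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    pred = model.predict(X.iloc[test_idx])
    rmses.append(mean_squared_error(y.iloc[test_idx], pred) ** 0.5)
print(f"fold RMSEs: {np.round(rmses, 2)}")
```

Wrapping preprocessing and model in one Pipeline matters here: it is what guarantees the imputation and encoding statistics never see future rows, and TimeSeriesSplit keeps every validation fold strictly later in time than its training data.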
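For part 3, one way to put Random Forest and XGBoost on the same footing for both RMSE and the <100 ms budget is to measure single-row inference latency directly; a sketch assuming preprocessed numeric arrays `X_train`/`X_test`, `y_train`/`y_test` from the pipeline above and an installed `xgboost` package:

```python
# Hedged sketch: accuracy vs. per-row latency for RF and XGBoost.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

candidates = {
    "rf": RandomForestRegressor(n_estimators=300, min_samples_leaf=5,
                                n_jobs=-1, random_state=0),
    # XGBoost routes NaNs down learned default branches, so it tolerates
    # missing values natively.
    "xgb": XGBRegressor(n_estimators=500, max_depth=6, learning_rate=0.05,
                        subsample=0.8, colsample_bytree=0.8, random_state=0),
}

for name, est in candidates.items():
    est.fit(X_train, y_train)
    rmse = float(np.sqrt(np.mean((est.predict(X_test) - y_test) ** 2)))
    # Time single-row predictions: the shape of the production workload.
    row = X_test[:1]
    start = time.perf_counter()
    for _ in range(100):
        est.predict(row)
    ms = (time.perf_counter() - start) / 100 * 1000
    print(f"{name}: RMSE={rmse:.2f}, latency={ms:.1f} ms/row")
```

Per-row timing is deliberate: batch throughput numbers can hide the fact that a 300-tree forest with deep trees may blow a 100 ms single-prediction budget that a depth-6 boosted ensemble meets easily.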
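For part 4, TreeSHAP computes exact per-instance attributions for tree ensembles far faster than permutation-based approaches at serving time; a sketch assuming the `shap` package, a fitted tree model `est` from the comparison above, and a hypothetical `feature_names` list:

```python
# Hedged sketch: instance-level explanations via TreeSHAP.
import shap

explainer = shap.TreeExplainer(est)            # exact for tree ensembles
shap_values = explainer.shap_values(X_test[:100])

# Each prediction decomposes into base value + per-feature contributions,
# so operations can see *why* a specific flight is predicted late.
top = sorted(zip(feature_names, shap_values[0]),
             key=lambda t: -abs(t[1]))[:5]
for feat, contrib in top:
    print(f"{feat}: {contrib:+.2f} min")
```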
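For part 5, fold errors from the same splits are naturally paired, so a paired t-test on per-fold RMSEs (with Wilcoxon signed-rank as a non-parametric fallback for few folds) is a reasonable significance check before holdout confirmation; a sketch assuming SciPy and aligned fold-RMSE lists `rmse_rf` and `rmse_xgb`:

```python
# Hedged sketch: paired tests on per-fold errors from the same CV splits.
from scipy import stats

t_stat, p_value = stats.ttest_rel(rmse_rf, rmse_xgb)
print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}")

# With only a handful of folds, a signed-rank test avoids the normality
# assumption, at the cost of coarser p-values.
w_stat, w_p = stats.wilcoxon(rmse_rf, rmse_xgb)
print(f"wilcoxon W = {w_stat:.1f}, p = {w_p:.3f}")
```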


Related Interview Questions

  • Deep-dive XGBoost handling and overfitting - Capital One (medium)
  • Build House Price Model Responsibly - Capital One (easy)
  • Design robber detection from surveillance video - Capital One (easy)
  • How would you design delay and watchlist models? - Capital One (medium)
  • Explain core ML concepts and lifecycle - Capital One (medium)