PracHub

Choose and justify ML algorithms for tabular prediction

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to select a model for tabular regression, weighing trade-offs among linear/regularized regression, decision trees, Random Forest, and XGBoost. It also covers preprocessing, categorical handling, missingness, latency and memory constraints, instance-level explainability, fairness checks, calibration, and experiment design. It belongs to the Machine Learning domain and is commonly asked to assess both conceptual understanding (model behavior, bias–variance trade-offs) and practical skills in validation strategy, performance and latency measurement, and production-ready explainability and robustness.


Company: Capital One

Role: Data Scientist

Category: Machine Learning

Difficulty: Medium

Interview Round: Onsite

You must choose an algorithm for tabular prediction of arrival delay under these constraints:

  • 500k rows
  • 120 features (mixed numeric/categorical, with missingness)
  • non-linear interactions
  • strict latency <100 ms per prediction
  • instance-level explanations needed for operations

Make and defend a choice among linear/regularized regression, a single decision tree, Random Forest, and XGBoost:

1) Lay out a comparison plan (feature preprocessing, categorical handling, leakage guards, time-aware CV) and selection metrics (RMSE, calibration, latency, memory).

2) Argue when linear regression beats trees (bias/variance, extrapolation, monotonic constraints) and when trees/ensembles dominate (non-linearities, interactions).

3) Compare Random Forest vs XGBoost in depth: training/inference cost, sensitivity to noisy features, overfitting risk, class/label imbalance handling for regression, robustness to missing values, hyperparameters that most affect bias/variance, and when RF may outperform XGBoost in practice.

4) Describe how you'd produce stable, fast explanations (e.g., TreeSHAP vs permutation), ensure fairness checks, and calibrate predictions.

5) Specify an experiment design to confidently pick a winner (stratified, time-split CV, paired tests on fold errors, and holdout confirmation).
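As one way to picture the time-split CV and paired fold tests mentioned in the question, here is a minimal standard-library sketch. It is not part of the original question: the function names `expanding_window_splits` and `paired_t_statistic`, the fold count, and the `min_train` cutoff are all hypothetical, and the rows are assumed to be sorted by time already.

```python
import math
from statistics import mean, stdev


def expanding_window_splits(n_rows, n_folds, min_train):
    """Yield (train_range, test_range) pairs for rows sorted by time.

    Every fold trains only on rows that precede its test window,
    so no future information leaks into training.
    """
    test_size = (n_rows - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough rows for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        yield range(0, train_end), range(train_end, train_end + test_size)


def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic on per-fold errors (e.g. RMSE) of two models.

    Both lists must come from the same folds, in the same order.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("fold differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical setup: 500k rows, 5 folds, first 250k rows always in training.
splits = list(expanding_window_splits(n_rows=500_000, n_folds=5, min_train=250_000))
for train_idx, test_idx in splits:
    print(f"train [0, {train_idx.stop})  test [{test_idx.start}, {test_idx.stop})")
```

Expanding-window folds overlap in their training data, so the per-fold errors are not independent and the t statistic is best read as a rough guide. That is one reason the question also asks for a holdout confirmation.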


Oct 13, 2025, 9:49 PM

© 2026 PracHub. All rights reserved.