How do I approach Machine Learning interview questions?

Machine Learning questions require both an understanding of core concepts and hands-on practice. PracHub provides solutions with explanations to help you master machine learning interviews.

What difficulty level is this interview question?

This is a Medium difficulty Machine Learning question, commonly asked during Onsite rounds at Capital One.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Capital One during technical interviews.

Choose and justify ML algorithms for tabular prediction

Q: Choose and justify ML algorithms for tabular prediction

This question evaluates a candidate's competency in model selection for tabular regression. The candidate must weigh trade-offs among linear/regularized regression, decision trees, Random Forest, and XGBoost, and also handle preprocessing, categorical features, missingness, latency and memory constraints, instance-level explainability, fairness checks, calibration, and experiment design. It belongs to the Machine Learning domain. It is commonly asked to assess two things: conceptual understanding of model behavior and bias–variance trade-offs, and practical skill in validation strategy, performance and latency measurement, and production-ready explainability and robustness.

Capital One | Oct 13, 2025, 9:49 PM | Data Scientist | Onsite | Machine Learning


You must choose an algorithm for tabular prediction of arrival delay under these constraints: 500k rows, 120 features (mixed numeric/categorical with missingness), non-linear interactions, strict latency <100 ms per prediction, and a need for instance-level explanations for operations. Make and defend a choice among linear/regularized regression, a single decision tree, Random Forest, and XGBoost:

1) Lay out a comparison plan (feature preprocessing, categorical handling, leakage guards, time-aware CV) and selection metrics (RMSE, calibration, latency, memory).
2) Argue when linear regression beats trees (bias/variance, extrapolation, monotonic constraints) and when trees/ensembles dominate (non-linearities, interactions).
3) Compare Random Forest vs XGBoost in depth: training/inference cost, sensitivity to noisy features, overfitting risk, class/label imbalance handling for regression, robustness to missing values, the hyperparameters that most affect bias/variance, and when RF may outperform XGBoost in practice.
4) Describe how you'd produce stable, fast explanations (e.g., TreeSHAP vs permutation), ensure fairness checks, and calibrate predictions.
5) Specify an experiment design to confidently pick a winner (stratified, time-split CV, paired tests on fold errors, and holdout confirmation).
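Part 1 lists latency as a selection metric. Because the 100 ms budget applies to each prediction, tail latency matters more than the mean. Below is a minimal sketch that uses only the Python standard library. It times single-row calls and reports p50, p99, and max in milliseconds. `latency_profile` and `toy_linear` are hypothetical names. The toy linear scorer only stands in for a real trained model. The harness measures in-process scoring only; feature fetching and network hops are not included.

```python
import statistics
import time


def latency_profile(predict_one, rows, warmup=50):
    """Time single-row predictions; return p50, p99 and max in milliseconds.

    `predict_one` is any callable scoring one feature row (a hypothetical
    stand-in for a trained model). `rows` needs at least two entries.
    """
    for row in rows[:warmup]:  # untimed warm-up calls
        predict_one(row)
    samples_ms = []
    for row in rows:
        start = time.perf_counter()
        predict_one(row)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points: index 49 is the median, index 98 the 99th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98], "max": max(samples_ms)}


# Hypothetical toy model: a dot product over 120 numeric features.
WEIGHTS = [0.01] * 120


def toy_linear(row):
    return sum(w * x for w, x in zip(WEIGHTS, row))


rows = [[float(i % 7)] * 120 for i in range(1000)]
profile = latency_profile(toy_linear, rows)
print(profile, "within budget:", profile["p99"] < 100.0)
```

To compare candidates fairly, run the same harness for each model (for example, a deep Random Forest vs. a shallower XGBoost model) on production-like hardware. Memory footprint needs a separate measurement.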
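Part 4 contrasts TreeSHAP with permutation-based explanations. Permutation importance is global and model-agnostic, not instance-level. It is still cheap to compute and useful for checking that tree-based attributions rank features stably. The sketch below is hand-rolled with the standard library and is not scikit-learn's `permutation_importance`. It assumes that `predict` maps a list of rows to a list of predictions and that rows are plain Python lists. The data in the toy check is hypothetical.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    return math.sqrt(statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when each feature column is shuffled.

    Larger values mean the model relies more on that column; a value of
    zero means shuffling the column did not change predictions at all.
    """
    rng = random.Random(seed)
    base = rmse(y, predict(X))
    scores = []
    for j in range(len(X[0])):
        deltas = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            deltas.append(rmse(y, predict(X_perm)) - base)
        scores.append(statistics.fmean(deltas))
    return scores


# Toy check (hypothetical data): the model uses only feature 0.
X_demo = [[float(i), float(i % 3)] for i in range(50)]
y_demo = [2.0 * row[0] for row in X_demo]
scores = permutation_importance(lambda rows: [2.0 * r[0] for r in rows], X_demo, y_demo)
print(scores)
```

Permutation scores can be misleading when features are strongly correlated, because shuffling one column creates unrealistic rows.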
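For the fairness checks in part 4, one common starting point is to compare error and bias across groups. The standard-library sketch below reports each group's mean residual and RMSE, plus the largest RMSE gap between groups. The grouping attribute (for example, a carrier or origin-airport code) is a hypothetical choice. Deciding which attributes to audit, and what gap is acceptable, is a policy question this code does not answer.

```python
import math
import statistics
from collections import defaultdict


def group_error_report(y_true, y_pred, groups):
    """Per-group mean residual (bias) and RMSE, plus the worst-case RMSE gap.

    `groups` holds one label per row. A positive mean residual means the
    model under-predicts delay for that group on average.
    """
    by_group = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g].append(t - p)
    report = {}
    for g, residuals in by_group.items():
        report[g] = {
            "n": len(residuals),
            "mean_residual": statistics.fmean(residuals),
            "rmse": math.sqrt(statistics.fmean(r * r for r in residuals)),
        }
    rmses = [r["rmse"] for r in report.values()]
    return report, max(rmses) - min(rmses)


# Toy check with hypothetical delays (minutes) and group labels.
report, gap = group_error_report([10, 12, 20, 22], [10, 12, 18, 20], ["A", "A", "B", "B"])
print(report, gap)
```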
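Parts 1 and 5 call for time-aware CV and paired tests on fold errors. The sketch below assumes the rows are already sorted by event time. It builds expanding-window folds with an optional `gap` embargo between the training rows and each test block, which guards against leakage from lagged or rolling features. It then compares two models' per-fold errors using a paired t statistic and an exact sign-flip permutation p-value. Only the standard library is used. The function names, the embargo, and the choice of a sign-flip test are illustrative assumptions, not a prescribed protocol. The per-fold RMSE values in the demo are made-up toy numbers.

```python
import math
import statistics
from itertools import product


def expanding_window_folds(n_rows, n_folds, min_train, gap=0):
    """Yield (train_idx, test_idx) for rows already sorted by event time.

    Each test block lies strictly after its training rows; the last `gap`
    rows before the block are dropped from training as an embargo.
    """
    test_size = (n_rows - min_train) // n_folds
    for k in range(n_folds):
        test_start = min_train + k * test_size
        train_idx = list(range(max(0, test_start - gap)))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


def paired_fold_test(errors_a, errors_b):
    """Compare per-fold errors of two models scored on the same folds.

    Returns (mean difference a - b, paired t statistic with k - 1 degrees
    of freedom, exact two-sided sign-flip p-value over all 2**k patterns).
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    k = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)
    t_stat = mean_d / (sd_d / math.sqrt(k)) if sd_d > 0 else float("nan")
    observed = abs(sum(diffs))
    hits = sum(
        abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12
        for signs in product((1, -1), repeat=k)
    )
    return mean_d, t_stat, hits / 2 ** k


folds = list(expanding_window_folds(n_rows=100, n_folds=4, min_train=20, gap=5))
# Made-up per-fold RMSEs for model A (e.g. RF) and model B (e.g. XGBoost).
mean_d, t_stat, p_value = paired_fold_test([1.2, 1.5, 1.1, 1.4, 1.3],
                                           [1.0, 1.1, 1.0, 1.2, 1.0])
print(len(folds), mean_d, t_stat, p_value)
```

With k = 5 folds, the smallest two-sided sign-flip p-value you can get is 2/32 = 0.0625. Fold-level evidence alone therefore cannot clear a 0.05 threshold at that fold count. This is one reason to confirm the winner on a later holdout period that was kept out of the CV loop entirely.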
