This question evaluates a candidate's competency in feature engineering and preprocessing for mixed-type tabular data, including handling sparse counts and heavy-tailed monetary features, missingness and zero-inflation, correlated continuous measurements, model-specific scaling needs, and the design of leak-safe pipelines and validation strategies.
You are given a tabular dataset for supervised learning with the following features:

- F1: counts, mostly small integers with many zeros
- F2: monetary amounts in dollars, heavy-tailed
- F3: binary flag
- F4, F5: highly correlated continuous measurements
- y: target

Tasks:

1) Decide exactly which features need standardization or normalization and why; specify the scaler and whether it must be fit on the training set only to avoid leakage.
2) Propose a principled approach for F1 given its many zeros and missing values: imputation options, zero-inflated modeling, or transformations; justify how you will validate the choice.
3) With F4 and F5 strongly correlated (|r| > 0.9), describe three alternative strategies to select or transform features (e.g., VIF thresholding, an L1-penalized model, PCA) and explain how to choose among them with cross-validation while preserving interpretability.
4) For three model families (regularized linear/logistic models, tree-based ensembles, and k-NN), specify exactly how your preprocessing differs and why scale and correlation matter differently for each.
5) Provide a leak-safe sklearn-style pipeline and cross-validation plan that evaluates these choices, including metrics, stratification, and how you would compare competing pipelines statistically.
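To make the expectations for task 5 concrete, here is a minimal sketch of the kind of leak-safe scaffolding a strong answer might build on, assuming scikit-learn. The feature names reuse F1-F5 from the question; the synthetic data, the specific transformers, and the choice of logistic regression are illustrative assumptions, not the required answer.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, FunctionTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data mimicking the question's feature types.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "F1": rng.poisson(0.7, n).astype(float),   # sparse counts, many zeros
    "F2": np.exp(rng.normal(3, 1, n)),         # heavy-tailed dollar amounts
    "F3": rng.integers(0, 2, n),               # binary flag
    "F4": rng.normal(0, 1, n),
})
X["F5"] = X["F4"] + rng.normal(0, 0.1, n)      # |r| > 0.9 with F4
y = (X["F4"] + 0.01 * X["F2"] > 1).astype(int)

pre = ColumnTransformer([
    # log1p tames F2's heavy tail before scaling
    ("money", Pipeline([("log", FunctionTransformer(np.log1p)),
                        ("scale", StandardScaler())]), ["F2"]),
    # F1's imputer is fit inside each training fold, never on the full data
    ("counts", Pipeline([("impute", SimpleImputer(strategy="median")),
                         ("scale", StandardScaler())]), ["F1"]),
    ("cont", StandardScaler(), ["F4", "F5"]),
    ("flag", "passthrough", ["F3"]),
])

pipe = Pipeline([("pre", pre),
                 ("clf", LogisticRegression(penalty="l2", max_iter=1000))])

# Because every fitted step lives inside the Pipeline, cross_val_score
# refits imputer, scalers, and model per fold: no validation-fold leakage.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(scores.mean())
```

A complete answer would extend this skeleton with the candidate's chosen F1 strategy and correlation treatment, then compare competing pipelines on the per-fold scores (e.g., with a paired statistical test) rather than on single point estimates.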