Identify Risks and Improve Imputation Class Implementations

This question evaluates proficiency in data preprocessing and engineering for machine learning, focusing on imputation strategies, correct handling of numeric, categorical, boolean, and datetime dtypes, sparse data support, compliance with the sklearn transformer API, and robustness to edge cases.

Question

Scenario

You are reviewing three custom Python imputation classes intended for use in a scikit-learn workflow. Each class fills missing values column-wise using one of the following strategies: mean, median, or mode.

Assume these classes are meant to be sklearn-compatible transformers used within pipelines (fit on train, transform on validation/test) and may be applied to numpy arrays, pandas DataFrames, or sparse matrices.
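The fit-on-train, transform-on-test contract mentioned above can be illustrated with scikit-learn's built-in `SimpleImputer`. It is used here only as a reference point for the behavior the custom classes are expected to mimic; the toy arrays are made up for illustration and are not part of the question.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (hypothetical): one numeric column with a missing value in each split.
X_train = np.array([[1.0], [3.0], [np.nan]])
X_test = np.array([[np.nan], [10.0]])

imputer = SimpleImputer(strategy="mean")
imputer.fit(X_train)                       # statistics come from the training split only
X_test_filled = imputer.transform(X_test)  # test split is filled with the training mean

print(imputer.statistics_)  # learned per-column fill values
print(X_test_filled)
```

The missing test value is filled with 2.0, the mean of the observed training values, so nothing computed from the test split leaks into the fill value.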

Task

Identify potential problems or risks in these mean/median/mode imputer implementations.
Propose concrete improvements or refactors to make them robust, reusable, and compliant with the sklearn interface.

Hints

Consider: inheritance and API compliance, dtype handling (numeric, boolean, categorical, datetime), sparse data, incremental/streaming fit, edge cases (all-missing columns, ties for mode), performance, and testability.
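As one possible starting point for several of these hints, here is a minimal sketch of a single imputer class covering all three strategies. It is an illustration under stated assumptions, not the classes under review: the name `ColumnStatImputer` and the `fill_value` fallback are hypothetical, and the sketch handles only dense numeric input with NaN as the missing marker. Sparse matrices, categorical, boolean, and datetime columns, and incremental fitting would each need additional work.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_is_fitted


class ColumnStatImputer(TransformerMixin, BaseEstimator):
    """Hypothetical sketch: column-wise mean/median/mode imputation (dense numeric only)."""

    def __init__(self, strategy="mean", fill_value=0.0):
        # Store constructor arguments unchanged so get_params/set_params/clone work.
        self.strategy = strategy
        self.fill_value = fill_value

    def _column_stat(self, col):
        observed = col[~np.isnan(col)]
        if observed.size == 0:
            # Edge case: all-missing column falls back to an explicit constant.
            return self.fill_value
        if self.strategy == "mean":
            return observed.mean()
        if self.strategy == "median":
            return np.median(observed)
        if self.strategy == "mode":
            values, counts = np.unique(observed, return_counts=True)
            # np.unique returns sorted values and argmax picks the first maximum,
            # so ties resolve deterministically to the smallest value.
            return values[np.argmax(counts)]
        raise ValueError(f"Unknown strategy: {self.strategy!r}")

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError("Expected a 2D array")
        self.statistics_ = np.array(
            [self._column_stat(X[:, j]) for j in range(X.shape[1])], dtype=float
        )
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self, "statistics_")
        X = np.array(X, dtype=float)  # copy, so the caller's data is not mutated
        if X.ndim != 2 or X.shape[1] != self.n_features_in_:
            raise ValueError("X has a different number of features than seen in fit")
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.statistics_[cols]
        return X


# Toy data (hypothetical): column 1 is entirely missing, columns 0 and 2 have mode ties.
X_train = np.array([[1.0, np.nan, 2.0],
                    [3.0, np.nan, 2.0],
                    [5.0, np.nan, 7.0],
                    [np.nan, np.nan, 7.0]])
X_test = np.array([[np.nan, np.nan, np.nan]])

imputer = ColumnStatImputer(strategy="mode").fit(X_train)
filled = imputer.transform(X_test)
print(filled)
```

Keeping the strategy as a constructor parameter, rather than maintaining three near-identical classes, removes duplicated code and lets the learned values live in a fitted attribute (`statistics_`) so that `transform` never recomputes statistics from validation or test data.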
