This question evaluates proficiency in data preprocessing and engineering for machine learning, focusing on imputation strategies, correct handling of numeric, categorical, boolean and datetime dtypes, sparse data support, sklearn transformer API compliance, and robustness to edge cases.

You are reviewing three custom Python imputation classes intended for use in a scikit-learn workflow. Each class fills missing values column-wise using one of the following strategies: mean, median, or mode.
Assume these classes are meant to be sklearn-compatible transformers used within pipelines (fit on train, transform on validation/test) and may be applied to numpy arrays, pandas DataFrames, or sparse matrices.
Consider: inheritance and API compliance, dtype handling (numeric, boolean, categorical, datetime), sparse data, incremental/streaming fit, edge cases (all-missing columns, ties for mode), performance, and testability.
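As a reference point for the review, here is a minimal sketch of what a compliant column-wise imputer might look like. It is not the code under review. The class name `ColumnImputerSketch`, the `fill_value` fallback for all-missing columns, and the "smallest value wins" rule for mode ties are all assumptions chosen for illustration. The sketch handles dense input only (numpy arrays or pandas DataFrames), leaves sparse matrices and incremental fitting out, and expects numeric columns for the mean and median strategies.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_is_fitted


class ColumnImputerSketch(BaseEstimator, TransformerMixin):
    """Hypothetical column-wise imputer; strategy is 'mean', 'median', or 'mode'."""

    def __init__(self, strategy="mean", fill_value=0):
        # Store parameters unchanged so sklearn's clone() and get_params() work.
        self.strategy = strategy
        self.fill_value = fill_value

    def fit(self, X, y=None):
        if self.strategy not in ("mean", "median", "mode"):
            raise ValueError(f"Unknown strategy: {self.strategy!r}")
        df = pd.DataFrame(X)
        stats = {}
        for col in df.columns:
            observed = df[col].dropna()
            if observed.empty:
                # Assumption: all-missing columns fall back to fill_value.
                stats[col] = self.fill_value
            elif self.strategy == "mode":
                # Series.mode() returns sorted values, so ties resolve
                # deterministically to the smallest candidate.
                stats[col] = observed.mode().iloc[0]
            elif self.strategy == "mean":
                stats[col] = observed.mean()
            else:
                stats[col] = observed.median()
        # Learned state uses a trailing underscore, per sklearn convention.
        self.statistics_ = stats
        return self

    def transform(self, X):
        check_is_fitted(self, "statistics_")
        # Copy so the caller's data is never mutated in place.
        df = pd.DataFrame(X).copy()
        return df.fillna(self.statistics_)


# Fit on train, then reuse the learned statistics on held-out data.
X_train = np.array([[1.0, 10.0], [3.0, np.nan], [np.nan, 30.0]])
X_test = np.array([[np.nan, np.nan]])
imputer = ColumnImputerSketch(strategy="median").fit(X_train)
X_test_filled = imputer.transform(X_test)
```

The points a reviewer would check against the classes in question are visible here: inheriting from `BaseEstimator` and `TransformerMixin`, learning statistics only in `fit`, returning `self` from `fit`, and never recomputing statistics inside `transform`, which would leak validation or test data into the imputation.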