PracHub

Identify Risks and Improve Imputation Class Implementations

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in data preprocessing and engineering for machine learning, focusing on imputation strategies, correct handling of numeric, categorical, boolean and datetime dtypes, sparse data support, sklearn transformer API compliance, and robustness to edge cases.



Company: Capital One

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite

##### Scenario

Tech round code review: three imputation classes implementing mean, median, and mode substitution for missing values.

##### Question

Identify any problems or risks you notice in these imputation class implementations. Suggest concrete improvements or refactors to make them more robust and reusable.

##### Hints

Consider inheritance, dtype handling, sparse data, incremental fit, edge cases, and compliance with the sklearn interface.


Aug 4, 2025, 10:55 AM

Scenario

You are reviewing three custom Python imputation classes intended for use in a scikit-learn workflow. Each class fills missing values column-wise using one of the following strategies: mean, median, or mode.
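The classes under review are not reproduced on this page, so the following is only an illustrative sketch of the three strategies applied to a single column. The function name `impute_column`, the use of `None` as the missing marker, and the choice to break mode ties by taking the smallest value are all assumptions made for this example.

```python
import statistics


def impute_column(values, strategy):
    """Fill None entries in one column (hypothetical helper for illustration)."""
    observed = [v for v in values if v is not None]
    if not observed:
        # An all-missing column has no statistic; fail loudly instead of guessing.
        raise ValueError("column has no observed values")
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        # multimode returns every tied value; taking the minimum is one
        # deterministic (assumed) tie-breaking rule.
        fill = min(statistics.multimode(observed))
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


column = [2.0, None, 4.0, 4.0, 10.0]
mean_filled = impute_column(column, "mean")
median_filled = impute_column(column, "median")
mode_filled = impute_column(column, "mode")
```

Even this small version surfaces two of the review points: what to do when a column is entirely missing, and how to make the mode reproducible when several values tie.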

Assume these classes are meant to be sklearn-compatible transformers used within pipelines (fit on train, transform on validation/test) and may be applied to numpy arrays, pandas DataFrames, or sparse matrices.
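As a reference point for the expected behaviour, the fit-on-train, transform-on-test contract can be sketched with scikit-learn's own `SimpleImputer`. The toy arrays are made up for illustration; the point is that the fill values come only from the training data, so nothing from the test set leaks into the statistics.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (assumed for illustration): two numeric columns with gaps.
X_train = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan]])
X_test = np.array([[np.nan, np.nan], [100.0, 200.0]])

imputer = SimpleImputer(strategy="mean")
imputer.fit(X_train)  # learns one mean per column from train only

# The test rows are filled with the *training* means, not their own.
X_test_filled = imputer.transform(X_test)
train_means = imputer.statistics_
```

A custom imputer that recomputes its statistic inside `transform` would break this contract, which is one of the first things worth checking in the review.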

Task

  1. Identify potential problems or risks in these mean/median/mode imputer implementations.
  2. Propose concrete improvements or refactors to make them robust, reusable, and compliant with the sklearn interface.
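One possible shape for the refactor in item 2 is a shared base class that owns validation, fitted state, and `transform`, while each strategy supplies only its statistic. This is a minimal sketch under stated assumptions, not the reviewed code: the names `BaseColumnImputer`, `MeanImputer`, `MedianImputer`, the `_column_statistics` hook, and the `fallback` parameter are hypothetical, and the sketch handles dense numeric input only (no sparse matrices, categoricals, or datetimes).

```python
import warnings

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_is_fitted


class BaseColumnImputer(TransformerMixin, BaseEstimator):
    """Hypothetical shared base; subclasses provide the per-column statistic."""

    def __init__(self, fallback=0.0):
        # Stored unchanged so get_params/set_params and clone() work.
        self.fallback = fallback

    def _column_statistics(self, X):
        raise NotImplementedError

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError("Expected a 2D array (n_samples, n_features).")
        with warnings.catch_warnings():
            # nanmean/nanmedian warn on all-NaN columns; handled just below.
            warnings.simplefilter("ignore", category=RuntimeWarning)
            stats = self._column_statistics(X)
        # All-missing columns yield NaN; replace with the configured fallback.
        self.statistics_ = np.where(np.isnan(stats), self.fallback, stats)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self, "statistics_")
        X = np.array(X, dtype=float)  # copy, so the caller's data is untouched
        if X.ndim != 2 or X.shape[1] != self.n_features_in_:
            raise ValueError("X has a different number of features than in fit.")
        rows, cols = np.nonzero(np.isnan(X))
        X[rows, cols] = self.statistics_[cols]
        return X


class MeanImputer(BaseColumnImputer):
    def _column_statistics(self, X):
        return np.nanmean(X, axis=0)


class MedianImputer(BaseColumnImputer):
    def _column_statistics(self, X):
        return np.nanmedian(X, axis=0)
```

Keeping `__init__` free of logic, returning `self` from `fit`, and storing learned state in trailing-underscore attributes are the scikit-learn conventions that let such a class be cloned and used inside a `Pipeline`.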

Hints

Consider: inheritance and API compliance, dtype handling (numeric, boolean, categorical, datetime), sparse data, incremental/streaming fit, edge cases (all-missing columns, ties for mode), performance, and testability.
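For the incremental or streaming hint, a mean can be maintained across batches by accumulating per-column sums and observed counts. The sketch below assumes dense numeric batches and uses a hypothetical class name, `StreamingMeanImputer`; the `partial_fit` method name follows the convention scikit-learn uses for incremental estimators. Median and mode do not decompose this simply and would need approximate or count-based approaches.

```python
import numpy as np


class StreamingMeanImputer:
    """Hypothetical incremental mean imputer for dense numeric batches."""

    def __init__(self):
        self.sum_ = None
        self.count_ = None

    def partial_fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        observed = ~np.isnan(X)
        batch_sum = np.where(observed, X, 0.0).sum(axis=0)
        batch_count = observed.sum(axis=0)
        if self.sum_ is None:
            self.sum_, self.count_ = batch_sum, batch_count
        else:
            if batch_sum.shape != self.sum_.shape:
                raise ValueError("Batch has a different number of columns.")
            self.sum_ = self.sum_ + batch_sum
            self.count_ = self.count_ + batch_count
        return self

    @property
    def statistics_(self):
        if self.sum_ is None:
            raise RuntimeError("Call partial_fit at least once first.")
        # Columns never observed give 0/0, which is NaN by design here.
        with np.errstate(invalid="ignore", divide="ignore"):
            return self.sum_ / self.count_

    def transform(self, X):
        stats = self.statistics_
        X = np.array(X, dtype=float)  # work on a copy
        rows, cols = np.nonzero(np.isnan(X))
        X[rows, cols] = stats[cols]
        return X


stream = StreamingMeanImputer()
stream.partial_fit([[1.0, np.nan], [3.0, np.nan]])
stream.partial_fit([[np.nan, np.nan], [5.0, np.nan]])
filled = stream.transform([[np.nan, 7.0]])
```

In this sketch a column that was never observed keeps a NaN statistic, so its gaps stay NaN after `transform`; whether to raise, drop the column, or use a constant instead is a design decision the reviewed classes should make explicit.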

