PracHub

Evaluate OutlierHandler Class for Code Quality and Testing

Last updated: Mar 29, 2026

Quick Overview

Evaluates code-review judgment for OutlierHandler and imputer preprocessing classes in ML pipelines. Strong answers explain fit-transform separation, leakage prevention, API quality, edge cases, and testing.


Company: Capital One

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite

Scenario

Code-review of two Python preprocessing components: an OutlierHandler class and three Imputer classes.

Question

  • Give a high-level summary of what the OutlierHandler class does.
  • Why is it beneficial to separate `fit` and `transform` into two methods in this context?
  • Point out any coding-style or maintainability issues you observe in the file (naming, docstrings, magic numbers, etc.).
  • Write the single most critical unit test you would add for the OutlierHandler.
  • For the three imputation classes, describe their overall purpose and identify at least two style problems (e.g., use of `from numpy import *`).

Hints

Relate your answers to the scikit-learn transformer API, unit-testing best practices, and PEP-8.


Jul 12, 2025, 6:59 PM

Code Review: OutlierHandler and Imputer Classes

You are given a Python module that implements one OutlierHandler class and three Imputer classes for preprocessing tabular data. The classes appear intended for machine-learning pipelines, but the style and test coverage are mixed.

Assume OutlierHandler detects outliers per feature using rules such as IQR capping or z-score thresholds, and the imputer classes learn statistics during fit and fill missing values during transform.
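The imputer half of that assumption can be sketched in a few lines. The original module is not shown, so everything here is hypothetical: the `MeanImputer` name, the `fill_values_` attribute, the list-of-rows input, and the choice to treat both `None` and NaN as missing are illustrative assumptions, not the code under review.

```python
import math
from statistics import mean


def is_missing(value):
    """Assumption: both None and float NaN count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class MeanImputer:
    """Hypothetical stand-in for one of the three imputer classes."""

    def __init__(self):
        self.fill_values_ = None  # learned in fit; one mean per column

    def fit(self, rows):
        fills = []
        for col in zip(*rows):  # transpose rows into columns
            observed = [v for v in col if not is_missing(v)]
            if not observed:
                raise ValueError("cannot learn a mean from an all-missing column")
            fills.append(mean(observed))
        self.fill_values_ = fills
        return self

    def transform(self, rows):
        if self.fill_values_ is None:
            raise RuntimeError("call fit before transform")
        # Build new rows so the caller's data is never modified in place.
        return [
            [fill if is_missing(v) else v for v, fill in zip(row, self.fill_values_)]
            for row in rows
        ]


train = [[1.0, None], [3.0, 4.0], [float("nan"), 8.0]]
imputer = MeanImputer().fit(train)
print(imputer.transform([[None, None], [5.0, 1.0]]))  # [[2.0, 6.0], [5.0, 1.0]]
```

The fill values come only from the rows passed to `fit`, so new data is filled with training means rather than its own.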

Constraints & Assumptions

  • Treat the classes as stateful preprocessing components for train/validation/test pipelines.
  • Focus on code quality, API design, correctness, leakage prevention, and testing.
  • Do not assume access to production internals beyond the class behavior described.
  • Discuss both behavior and maintainability.

Clarifying Questions to Ask

  • Should the classes follow scikit-learn's estimator API exactly?
  • Are inputs NumPy arrays, pandas DataFrames, or both?
  • Should transforms preserve column names, dtypes, indexes, and missing-value markers?
  • Are outliers capped, removed, replaced, or flagged?

Part 1 - OutlierHandler Summary

Provide a high-level summary of what the OutlierHandler class does.

What This Part Should Cover

  • Explain that it learns per-feature thresholds during fit.
  • Explain that transform applies stored thresholds to new data consistently.
  • Mention strategies such as IQR, z-score, capping, masking, or replacement.
  • Connect the class to ML preprocessing pipelines.
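One way to picture the behavior described in these points is a small IQR-capping sketch. The class under review is not shown, so the `IQRCapper` name, the `k` parameter, the `bounds_` attribute, and the list-of-rows input are assumptions made for illustration; a real implementation would more likely accept NumPy arrays or DataFrames.

```python
from statistics import quantiles


class IQRCapper:
    """Hypothetical stand-in for OutlierHandler: caps each column at its IQR fences."""

    def __init__(self, k=1.5):
        self.k = k           # fence multiplier; named instead of a magic number
        self.bounds_ = None  # learned in fit; one (low, high) pair per column

    def fit(self, rows):
        bounds = []
        for col in zip(*rows):  # transpose rows into columns
            # "inclusive" interpolates between the observed min and max.
            q1, _, q3 = quantiles(col, n=4, method="inclusive")
            iqr = q3 - q1
            bounds.append((q1 - self.k * iqr, q3 + self.k * iqr))
        self.bounds_ = bounds
        return self

    def transform(self, rows):
        if self.bounds_ is None:
            raise RuntimeError("call fit before transform")
        # Apply the stored fences; nothing is recomputed from the incoming rows.
        return [
            [min(max(v, low), high) for v, (low, high) in zip(row, self.bounds_)]
            for row in rows
        ]


train = [[1.0], [2.0], [3.0], [4.0], [5.0]]
capper = IQRCapper().fit(train)  # Q1 = 2, Q3 = 4, so the fences are (-1, 7)
print(capper.transform([[100.0], [3.0], [-50.0]]))  # [[7.0], [3.0], [-1.0]]
```

Capping is only one of the strategies listed; masking or replacement would change the body of `transform` but not the fit-then-apply shape.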

Part 2 - Fit and Transform Separation

Explain why separating fit and transform into two methods matters.

What This Part Should Cover

  • Prevent data leakage by learning statistics only on training data.
  • Ensure validation, test, and production data are transformed consistently.
  • Support pipelines, cross-validation, serialization, and reproducibility.
  • Clarify behavior when transform is called before fit.
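A small numeric example makes the leakage and consistency points concrete. It is a sketch under stated assumptions: the helper names `fit_limits` and `apply_limits`, the z-score rule with `z=3.0`, and the toy numbers are all illustrative and not taken from the reviewed code.

```python
from statistics import mean, pstdev


def fit_limits(values, z=3.0):
    """Learn clipping limits (mean +/- z * std) from one sample."""
    mu, sigma = mean(values), pstdev(values)
    return (mu - z * sigma, mu + z * sigma)


def apply_limits(values, limits):
    """Clip values to previously learned limits."""
    low, high = limits
    return [min(max(v, low), high) for v in values]


train = [10.0, 12.0, 11.0, 13.0, 9.0]
test = [11.0, 500.0]  # the second value is an obvious outlier

# Intended flow: limits are learned once, on training data only.
train_limits = fit_limits(train)
consistent = apply_limits(test, train_limits)

# Anti-pattern: statistics recomputed on the batch being transformed.
leaky_limits = fit_limits(test)
leaky = apply_limits(test, leaky_limits)

print(consistent)  # 500.0 is capped near the training upper limit
print(leaky)       # 500.0 passes through unchanged
```

In the second case the outlier inflates the mean and standard deviation of its own batch, so it defines the limits it is checked against. The output would also depend on which rows happen to share a batch, which breaks parity between training and serving.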

Part 3 - Code Quality and Testing

Evaluate the code quality and propose tests.

What This Part Should Cover

  • Review API consistency, input validation, error handling, documentation, naming, type handling, and edge cases.
  • Test missing values, constant columns, all-null columns, mixed dtypes, unseen categories, extreme values, small samples, and transform-before-fit errors.
  • Test shape preservation, no mutation of inputs, deterministic output, and parity across train/test.
  • Include unit tests and integration tests inside a simple ML pipeline.
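A few of these checks can be sketched with the standard `unittest` module. Since the real class is not available, the tests run against a deliberately tiny, hypothetical `RangeClipper` that clips to the training minimum and maximum; the class name, its `bounds_` attribute, and the `RuntimeError` contract are assumptions. The first test is arguably the most critical one, because it fails if `transform` ever recomputes thresholds from its input.

```python
import copy
import unittest


class RangeClipper:
    """Hypothetical class under test: clips each column to the training min/max."""

    def __init__(self):
        self.bounds_ = None

    def fit(self, rows):
        self.bounds_ = [(min(col), max(col)) for col in zip(*rows)]
        return self

    def transform(self, rows):
        if self.bounds_ is None:
            raise RuntimeError("transform called before fit")
        return [
            [min(max(v, lo), hi) for v, (lo, hi) in zip(row, self.bounds_)]
            for row in rows
        ]


class RangeClipperTests(unittest.TestCase):
    def setUp(self):
        self.train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
        self.test = [[-5.0, 15.0], [99.0, 1000.0]]

    def test_transform_uses_thresholds_learned_in_fit(self):
        handler = RangeClipper().fit(self.train)
        # Expected values come from the training bounds, not from self.test.
        self.assertEqual(handler.transform(self.test), [[1.0, 15.0], [3.0, 30.0]])

    def test_transform_before_fit_raises(self):
        with self.assertRaises(RuntimeError):
            RangeClipper().transform(self.test)

    def test_input_is_not_mutated_and_shape_is_preserved(self):
        handler = RangeClipper().fit(self.train)
        original = copy.deepcopy(self.test)
        result = handler.transform(self.test)
        self.assertEqual(self.test, original)
        self.assertEqual([len(r) for r in result], [len(r) for r in self.test])

    def test_constant_column_does_not_crash(self):
        handler = RangeClipper().fit([[5.0], [5.0], [5.0]])
        self.assertEqual(handler.transform([[5.0], [7.0]]), [[5.0], [5.0]])
```

Running `python -m unittest` in the module's directory would pick these up. All-null columns, mixed dtypes, and a pipeline-level integration test would follow the same pattern.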

Follow-up Questions

  • How would you make the classes compatible with scikit-learn pipelines?
  • What bug would you expect if thresholds are recomputed during transform?
  • How would you test the behavior on a DataFrame with nonnumeric columns?
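For the last follow-up, one possible contract is that nonnumeric columns are skipped during fit and passed through unchanged during transform. The sketch below checks that contract using a plain dict of column name to list as a stand-in for a DataFrame; the function names and the min/max rule are hypothetical, and the reviewed classes may define different behavior (for example, raising an error).

```python
from numbers import Real


def is_numeric_column(values):
    """True for a non-empty column of real numbers; bool is excluded on purpose."""
    return bool(values) and all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )


def fit_bounds(table):
    """Learn (min, max) per numeric column and skip every other column."""
    return {
        name: (min(vals), max(vals))
        for name, vals in table.items()
        if is_numeric_column(vals)
    }


def apply_bounds(table, bounds):
    """Clip columns that have learned bounds; copy all other columns unchanged."""
    out = {}
    for name, vals in table.items():
        if name in bounds:
            lo, hi = bounds[name]
            out[name] = [min(max(v, lo), hi) for v in vals]
        else:
            out[name] = list(vals)
    return out


train = {"income": [10.0, 20.0, 30.0], "state": ["VA", "TX", "NY"]}
new = {"income": [5.0, 99.0], "state": ["CA", "VA"]}

bounds = fit_bounds(train)
result = apply_bounds(new, bounds)

assert set(bounds) == {"income"}          # the string column was never fitted
assert result["income"] == [10.0, 30.0]   # clipped to the training range
assert result["state"] == ["CA", "VA"]    # passed through, unseen value included
assert list(result) == list(new)          # column order preserved
```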