You are reviewing a small Python preprocessing codebase during an interview. You do not need to write code.
Part A: Environment and execution
A shell script activates a Python virtual environment and runs a data-processing job. Explain what such a script is typically doing and why isolated virtual environments are useful in collaborative analytics work.
Part B: Outlier processor
You are shown an OutlierProcessor class with three methods:
- input_check(df, columns): validates inputs,
- fit(df): computes lower and upper percentile cutoffs for selected columns,
- transform(df): truncates values outside those cutoffs.
Explain how this class should work, what failure cases you would look for, and what unit tests you would add.
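
For orientation, the behavior described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the class shown in the interview: it assumes pandas, reads "truncates" as clipping values to the cutoffs rather than dropping rows, and the constructor arguments (columns, lower, upper) and the cutoffs_ attribute are hypothetical names introduced for illustration.

```python
import pandas as pd


class OutlierProcessor:
    """Illustrative sketch: clip selected numeric columns to percentile cutoffs."""

    def __init__(self, columns, lower=0.01, upper=0.99):
        # `columns`, `lower`, and `upper` are assumed names, not from the original class.
        if not 0.0 <= lower < upper <= 1.0:
            raise ValueError("expected 0 <= lower < upper <= 1")
        self.columns = list(columns)
        self.lower = lower
        self.upper = upper
        self.cutoffs_ = None  # populated by fit()

    def input_check(self, df, columns):
        # Fail early with a specific error instead of a confusing one later.
        if not isinstance(df, pd.DataFrame):
            raise TypeError("df must be a pandas DataFrame")
        missing = [c for c in columns if c not in df.columns]
        if missing:
            raise KeyError(f"columns not found: {missing}")
        non_numeric = [
            c for c in columns if not pd.api.types.is_numeric_dtype(df[c])
        ]
        if non_numeric:
            raise TypeError(f"non-numeric columns: {non_numeric}")

    def fit(self, df):
        self.input_check(df, self.columns)
        # Series.quantile skips missing values when computing the cutoffs.
        self.cutoffs_ = {
            c: (df[c].quantile(self.lower), df[c].quantile(self.upper))
            for c in self.columns
        }
        return self

    def transform(self, df):
        if self.cutoffs_ is None:
            raise RuntimeError("call fit() before transform()")
        self.input_check(df, self.columns)
        out = df.copy()  # leave the caller's DataFrame untouched
        for c, (lo, hi) in self.cutoffs_.items():
            out[c] = out[c].clip(lower=lo, upper=hi)
        return out
```

The sketch points at the failure cases worth probing in a review: transform called before fit, missing or non-numeric columns, cutoffs recomputed on new data instead of reused from fit, and accidental mutation of the input. Each of these maps to one unit test, for example fitting on a small frame with a known outlier and asserting both the clipped values and that the original frame is unchanged.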
Part C: Imputer review
You are shown a messy Imputer class that implements mean, median, and mode filling strategies. What design, readability, and reliability improvements would you recommend?
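
One possible shape for a cleaned-up version is sketched below, as a point of comparison for the recommendations. It is an illustration under assumptions rather than the class under review: it assumes pandas, and the names columns, strategy, fill_values_, and the _STRATEGIES lookup table are hypothetical. The design choices it illustrates are a single strategy table instead of repeated if/else branches, validation at construction time, fill values learned in fit and reused in transform, and no mutation of the input.

```python
import pandas as pd


class Imputer:
    """Illustrative sketch: fill missing values with a statistic learned in fit()."""

    # One lookup table replaces a chain of if/elif branches per strategy.
    _STRATEGIES = {
        "mean": lambda s: s.mean(),
        "median": lambda s: s.median(),
        # mode() may return several values (sorted); take the first as a tie-break.
        "mode": lambda s: s.mode().iloc[0],
    }

    def __init__(self, columns, strategy="mean"):
        if strategy not in self._STRATEGIES:
            raise ValueError(
                f"unknown strategy {strategy!r}; "
                f"expected one of {sorted(self._STRATEGIES)}"
            )
        self.columns = list(columns)
        self.strategy = strategy
        self.fill_values_ = None  # populated by fit()

    def fit(self, df):
        compute = self._STRATEGIES[self.strategy]
        fill_values = {}
        for c in self.columns:
            observed = df[c].dropna()
            if observed.empty:
                # No statistic can be computed from an all-missing column.
                raise ValueError(f"column {c!r} has no observed values")
            fill_values[c] = compute(observed)
        self.fill_values_ = fill_values
        return self

    def transform(self, df):
        if self.fill_values_ is None:
            raise RuntimeError("call fit() before transform()")
        out = df.copy()  # do not modify the caller's DataFrame
        for c, value in self.fill_values_.items():
            out[c] = out[c].fillna(value)
        return out
```

Edge cases a review might raise against any such class include all-missing columns, ties in the mode, mean or median requested for a non-numeric column, and fill values leaking from test data when fit is run on the wrong split.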