Explain Shell Script Line-by-Line for Data Science Workflows
Company: Capital One
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates familiarity with shell scripting for reproducible data-science workflows, Python object-oriented design including transformer patterns (fit/transform separation), outlier detection and imputation strategies, code-style critique, and unit testing competency.
Constraints
- 0 <= len(ops) <= 100000
- Each op is a non-empty string
- Format must be exactly "Component.method" with one dot
- Component characters allowed: [A-Za-z0-9_]
- Allowed methods: fit, transform, fit_transform, reset
- Time complexity should be O(n)
- Space complexity should be O(k) where k is the number of distinct components
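The format constraints above can also be checked with a single regular expression. The following is a minimal sketch, not part of the reference solution: the helper name `is_valid_op` and the pattern `OP_PATTERN` are hypothetical, and the sketch assumes "exactly" means the string is matched as given, with no whitespace trimming.

```python
import re

# Hypothetical pattern: one or more [A-Za-z0-9_] characters, a single dot,
# then one of the four allowed method names. Neither side can contain a dot,
# so a full match implies exactly one dot in the string.
OP_PATTERN = re.compile(r"[A-Za-z0-9_]+\.(?:fit|transform|fit_transform|reset)")


def is_valid_op(op: str) -> bool:
    """Return True if op has the form "Component.method" (format check only)."""
    return OP_PATTERN.fullmatch(op) is not None


if __name__ == "__main__":
    for candidate in ["scaler.fit", "scaler.predict", ".fit", "a.b.fit"]:
        print(candidate, is_valid_op(candidate))
```

This checks only the shape of a single operation; whether a `transform` is legal still depends on the operations that came before it.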
Solution
def validate_sequence(ops: list[str]) -> bool:
    allowed = {"fit", "transform", "fit_transform", "reset"}
    # Maps each component name to whether it is currently fitted
    fitted_state: dict[str, bool] = {}
    for raw in ops:
        # Surrounding whitespace is tolerated before validation
        op = raw.strip()
        # Must contain exactly one dot separating component and method
        if op.count('.') != 1:
            return False
        comp, method = op.split('.', 1)
        if not comp or not method:
            return False
        # Component name validation: only [A-Za-z0-9_].
        # str.isalnum() alone also accepts non-ASCII letters and digits,
        # so restrict the check to ASCII characters.
        if not all(c == '_' or (c.isascii() and c.isalnum()) for c in comp):
            return False
        if method not in allowed:
            return False
        is_fitted = fitted_state.get(comp, False)
        if method == 'fit':
            fitted_state[comp] = True
        elif method == 'fit_transform':
            # Equivalent to fit then transform; leaves component fitted
            fitted_state[comp] = True
        elif method == 'transform':
            # transform is only legal on a fitted component
            if not is_fitted:
                return False
            # No state change
        else:  # 'reset'
            fitted_state[comp] = False
    return True
Explanation
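The function scans the operations once. For each one it validates the format, then updates or checks the component's fitted state with a constant-time dictionary lookup. A transform on a component that has never been fitted, or that was reset since its last fit, makes the sequence invalid. An empty sequence is valid.
Time complexity: O(n), where n is the number of operations (treating the length of each operation string as bounded). Space complexity: O(k), since the dictionary holds one entry per distinct component.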
Hints
- Track per-component fitted state in a dictionary.
- Treat fit_transform as performing both fit and transform in one step.
- Reset should clear a component’s fitted state.
- Reject any operation that does not match the exact format or contains an unknown method.
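The hints above can be sketched in a slightly different form, using a set of currently fitted components instead of a dictionary of booleans. This is an illustrative variant under stated assumptions, not the reference solution: the name `validate_with_set` is hypothetical, and the sketch matches each operation as given, without trimming whitespace.

```python
def validate_with_set(ops: list[str]) -> bool:
    """Set-based variant: a component is fitted iff it is in `fitted`."""
    allowed = {"fit", "transform", "fit_transform", "reset"}
    fitted: set[str] = set()
    for op in ops:
        # partition splits at the first dot; sep is "" if there is no dot
        comp, sep, method = op.partition(".")
        # A second dot would end up in `method`, which then fails this check
        if not sep or not comp or method not in allowed:
            return False
        # Component names are limited to ASCII letters, digits, underscores
        if not all(c == "_" or (c.isascii() and c.isalnum()) for c in comp):
            return False
        if method in ("fit", "fit_transform"):
            fitted.add(comp)       # both leave the component fitted
        elif method == "reset":
            fitted.discard(comp)   # clearing an unfitted component is a no-op
        elif comp not in fitted:   # method == "transform"
            return False
    return True


if __name__ == "__main__":
    print(validate_with_set(["scaler.fit", "scaler.transform"]))                  # True
    print(validate_with_set(["scaler.transform"]))                                # False
    print(validate_with_set(["scaler.fit", "scaler.reset", "scaler.transform"]))  # False
```

Because reset removes the component from the set, the set never holds more than the k distinct components seen so far, which keeps the O(k) space bound.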