PracHub

Quick Overview

This question evaluates familiarity with shell scripting for reproducible data-science workflows, Python object-oriented design including transformer patterns (fit/transform separation), outlier detection and imputation strategies, code-style critique, and unit testing competency.

Explain Shell Script Line-by-Line for Data Science Workflows

Company: Capital One

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Scenario

Technical screening for a Principal Data Scientist: reviewing shell script and Python classes

Question

  • Explain, line by line, what the provided virtual-environment shell script does.
  • What advantages does shell scripting offer in data-science engineering workflows?
  • Given the OutlierHandler class, describe its overall purpose.
  • Why is separating fit() and transform() methods beneficial in a transformer class?
  • Point out any coding-style or design issues you see in the class.
  • Write one high-impact unit test you would add for OutlierHandler.
  • For the three imputation classes shown, summarize their high-level functionality.
  • Identify and justify any coding-style problems in the imputation script (e.g., use of "from numpy import *").

Hints

Focus on readability, testability, and reproducibility. Think about modular design and unit testing.
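The OutlierHandler class mentioned in the question is not reproduced on this page, so the following is only a hypothetical stand-in (the name IQRClipper, its factor parameter, and the lower_/upper_ attributes are all assumptions). It sketches why fit() and transform() are kept apart: fit() learns clipping fences from training data, and transform() applies them to any later data without re-estimating them, which avoids leaking test data into the learned statistics. The test function at the end is one possible "high-impact" unit test under those assumptions.

```python
from statistics import quantiles
from typing import Optional


class IQRClipper:
    """Hypothetical outlier handler: clips values to fences derived from the IQR."""

    def __init__(self, factor: float = 1.5) -> None:
        self.factor = factor
        self.lower_: Optional[float] = None  # learned in fit()
        self.upper_: Optional[float] = None  # learned in fit()

    def fit(self, values: list[float]) -> "IQRClipper":
        # Learn the fences from the training data only.
        q1, _, q3 = quantiles(values, n=4)
        spread = q3 - q1
        self.lower_ = q1 - self.factor * spread
        self.upper_ = q3 + self.factor * spread
        return self

    def transform(self, values: list[float]) -> list[float]:
        # Apply the learned fences; never re-estimate them here.
        if self.lower_ is None or self.upper_ is None:
            raise RuntimeError("call fit() before transform()")
        return [min(max(v, self.lower_), self.upper_) for v in values]


def test_transform_uses_training_fences() -> None:
    clipper = IQRClipper().fit([1, 2, 3, 4, 5, 6, 7, 8, 100])
    fences = (clipper.lower_, clipper.upper_)
    out = clipper.transform([-1000, 5, 1000])
    # Extreme values are pulled to the fences; in-range values pass through.
    assert out == [clipper.lower_, 5, clipper.upper_]
    # transform() must not have changed the fitted state.
    assert (clipper.lower_, clipper.upper_) == fences
```

The test targets the property most likely to break silently in a real pipeline: a transform() that quietly refits on new data would still return plausible numbers, so checking that the fitted state is unchanged is worth more than checking one more numeric case.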

You are given a list of operation strings representing method calls on components in a data-science pipeline. Each operation has the format "Component.method" where Component uses only letters, digits, or underscores, and method is one of: fit, transform, fit_transform, reset. A component must be fit before any transform on it. The method fit_transform is equivalent to performing fit then transform for that component. The method reset clears the fitted state of that component. Multiple fits are allowed and keep the component fitted. Return True if the entire sequence is valid under these rules; return False if any operation violates the rules, the format is invalid, or the method is unknown.

Constraints

  • 0 <= len(ops) <= 100000
  • Each op is a non-empty string
  • Format must be exactly "Component.method" with one dot
  • Component characters allowed: [A-Za-z0-9_]
  • Allowed methods: fit, transform, fit_transform, reset
  • Time complexity should be O(n)
  • Space complexity should be O(k) where k is the number of distinct components
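The format constraints above can also be expressed as a single regular expression. This is a minimal sketch, not part of the original solution: parse_op and _OP are hypothetical names, and the check is deliberately strict (no whitespace is stripped, unlike the solution on this page, which calls strip() first).

```python
import re
from typing import Optional

# One or more of [A-Za-z0-9_], exactly one dot, then one allowed method.
_OP = re.compile(r"([A-Za-z0-9_]+)\.(fit|transform|fit_transform|reset)")


def parse_op(op: str) -> Optional[tuple[str, str]]:
    """Return (component, method) if op is well formed, else None."""
    # fullmatch requires the whole string to match, so trailing
    # characters (including a newline) make the operation invalid.
    match = _OP.fullmatch(op)
    return (match.group(1), match.group(2)) if match else None
```

Because the character class is spelled out explicitly, non-ASCII letters and digits are rejected, which matches the stated constraint more closely than a str.isalnum() check would.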

Solution

def validate_sequence(ops: list[str]) -> bool:
    allowed = {"fit", "transform", "fit_transform", "reset"}
    fitted_state: dict[str, bool] = {}

    for raw in ops:
        op = raw.strip()
        # Must contain exactly one dot separating component and method
        if op.count('.') != 1:
            return False
        comp, method = op.split('.', 1)
        if not comp or not method:
            return False
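        # Component name validation: ASCII letters, digits, and underscores only.
        # str.isalnum() alone would also accept non-ASCII letters and digits,
        # which the [A-Za-z0-9_] constraint does not allow.
        if not all(c == '_' or (c.isascii() and c.isalnum()) for c in comp):
            return False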
        if method not in allowed:
            return False

        is_fitted = fitted_state.get(comp, False)

        if method == 'fit':
            fitted_state[comp] = True
        elif method == 'fit_transform':
            # Equivalent to fit then transform; leaves component fitted
            fitted_state[comp] = True
        elif method == 'transform':
            if not is_fitted:
                return False
            # No state change
        else:  # 'reset'
            fitted_state[comp] = False

    return True

Explanation

Maintain a hash map from component name to a boolean indicating whether it is currently fitted. Parse each operation, validating the format and the method name. For fit and fit_transform, mark the component as fitted; fit_transform can never violate the ordering rule on its own, because its fit step runs before its transform step. For transform, require that the component is already fitted; otherwise return False. For reset, clear the fitted state. If any operation is malformed or violates the rules, return False; otherwise return True after processing all operations.

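Time complexity: O(n) for n operations, assuming each operation string has bounded length (more precisely, the work is linear in the total number of characters). Space complexity: O(k), where k is the number of distinct components seen.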
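As a cross-check of the rules, here is a sketch of an alternative that swaps the dictionary for a set holding only the currently fitted components. The function name validate_sequence_with_set is hypothetical, and this variant does not strip whitespace, so it treats the format strictly. It is followed by a handful of sequences that exercise each rule.

```python
def validate_sequence_with_set(ops: list[str]) -> bool:
    allowed = {"fit", "transform", "fit_transform", "reset"}
    fitted: set[str] = set()  # components that are fitted right now

    for op in ops:
        # Split on the first dot; a second dot ends up in `method`
        # and is rejected by the membership test below.
        comp, dot, method = op.partition(".")
        if not dot or method not in allowed:
            return False
        if not comp or not all(
            c == "_" or (c.isascii() and c.isalnum()) for c in comp
        ):
            return False

        if method == "transform":
            if comp not in fitted:
                return False
        elif method == "reset":
            fitted.discard(comp)  # no error if it was never fitted
        else:  # fit or fit_transform
            fitted.add(comp)

    return True


cases = [
    ([], True),
    (["Scaler.fit", "Scaler.transform"], True),
    (["Scaler.transform"], False),  # transform before fit
    (["Scaler.fit_transform", "Scaler.transform"], True),
    (["Scaler.fit", "Scaler.reset", "Scaler.transform"], False),
    (["A.fit", "B.transform"], False),  # state is tracked per component
    (["A.fit", "A.fit", "A.transform"], True),  # repeated fit is allowed
    (["Scaler.predict"], False),  # unknown method
    (["a.b.fit"], False),  # more than one dot
]
for ops, expected in cases:
    assert validate_sequence_with_set(ops) is expected
```

The set never holds more than k names, so the O(k) space bound still applies, and a reset actually frees the entry instead of storing False.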

Hints

  1. Track per-component fitted state in a dictionary.
  2. Treat fit_transform as performing both fit and transform in one step.
  3. Reset should clear a component’s fitted state.
  4. Reject any operation that does not match the exact format or contains an unknown method.

Last updated: Mar 29, 2026


Related Coding Questions

  • Solve Four Coding Assessment Tasks - Capital One (medium)
  • Write SQL using joins and window functions - Capital One (medium)
  • Review Preprocessing Code and Tests - Capital One (easy)
  • Remove nodes with a given value - Capital One (medium)
  • Solve multiple algorithmic interview questions - Capital One (hard)