PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of Python runtime semantics (mutable default arguments, in-place list mutation during iteration), debugging and unit-testing practices, virtual environment and packaging workflows, and CI/tooling considerations.

  • Medium
  • Capital One
  • Coding & Algorithms
  • Data Scientist

Debug and test a Python function in venv

Company: Capital One

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

Consider the Python snippet below. """ import math def accumulate(nums, start=0, cache={}): total = start for n in nums: if n % 2 == 0: nums.append(n//2) total += n cache[str(nums)] = total return total """ Tasks: 1) Precisely state what accumulate([2, 3], start=1) returns and why (walk through iteration order and mutations). Identify every bug/code smell (logical, algorithmic, and design) and their impact: list-mutation during iteration, default-mutable arguments, potential non-termination, unintended cache growth, time/space complexity, and keying strategy. 2) Provide a corrected, production-ready version (pure function + optional caching) with asymptotic complexity analysis and docstring that specifies pre/post-conditions. 3) Write minimal but thorough pytest unit tests (property-based or table-driven) covering edge cases (empty list, very large even chains, negatives, None, non-ints) and proving termination. 4) Show exact commands to: create/activate an isolated virtual environment; pin dependencies; run tests; and produce a wheel. Assume macOS/Linux. 5) Briefly explain how you would add CI (static type checks with mypy, linting, coverage thresholds) and package an entry-point CLI.

Quick Answer: This question evaluates understanding of Python runtime semantics (mutable default arguments, in-place list mutation during iteration), debugging and unit-testing practices, virtual environment and packaging workflows, and CI/tooling considerations.

Part 1: Simulate the buggy accumulate function and report the design issues

You are given the behavior of this buggy Python function: import math def accumulate(nums, start=0, cache={}): total = start for n in nums: if n % 2 == 0: nums.append(n // 2) total += n cache[str(nums)] = total return total For this problem, assume nums contains only integers and the cache is empty at the start of the analyzed call. Return a 4-tuple: 1) whether the call terminates, 2) the exact return value if it terminates, otherwise None, 3) the final mutated nums list if it terminates, otherwise None, 4) a fixed sorted list of issue codes describing the bugs/code smells in the function. Use Python's real list-iteration behavior: elements appended during iteration are visited later in the same loop. If execution would never terminate, return False for the first field.

Constraints

  • 0 <= len(nums) <= 10^5
  • -10^18 <= nums[i], start <= 10^18
  • nums contains only integers
  • If 0 appears in nums, the original function would not terminate

Examples

Input: ([2, 3], 1)

Expected Output: (True, 7, [2, 3, 1], ['default_mutable_cache', 'input_list_mutation_during_iteration', 'keying_by_stringified_mutable_list', 'potential_nontermination_when_zero_present', 'quadratic_time_from_repeated_stringify', 'unbounded_cache_growth_across_calls'])

Explanation: 2 is visited first, so 1 is appended. The loop then visits 3 and finally the appended 1. Total = 1 + 2 + 3 + 1 = 7.

Input: ([], 5)

Expected Output: (True, 5, [], ['default_mutable_cache', 'input_list_mutation_during_iteration', 'keying_by_stringified_mutable_list', 'potential_nontermination_when_zero_present', 'quadratic_time_from_repeated_stringify', 'unbounded_cache_growth_across_calls'])

Explanation: The loop body never runs, so the function returns start unchanged.

Input: ([4], 0)

Expected Output: (True, 7, [4, 2, 1], ['default_mutable_cache', 'input_list_mutation_during_iteration', 'keying_by_stringified_mutable_list', 'potential_nontermination_when_zero_present', 'quadratic_time_from_repeated_stringify', 'unbounded_cache_growth_across_calls'])

Explanation: 4 appends 2, then 2 appends 1, then 1 ends the chain. Total = 4 + 2 + 1 = 7.

Input: ([0], 2)

Expected Output: (False, None, None, ['default_mutable_cache', 'input_list_mutation_during_iteration', 'keying_by_stringified_mutable_list', 'potential_nontermination_when_zero_present', 'quadratic_time_from_repeated_stringify', 'unbounded_cache_growth_across_calls'])

Explanation: 0 is even and appends another 0 forever, so the loop never finishes.

Solution

def solution(nums, start):
    issues = [
        'default_mutable_cache',
        'input_list_mutation_during_iteration',
        'keying_by_stringified_mutable_list',
        'potential_nontermination_when_zero_present',
        'quadratic_time_from_repeated_stringify',
        'unbounded_cache_growth_across_calls',
    ]

    arr = list(nums)
    if 0 in arr:
        return (False, None, None, issues)

    total = start
    i = 0
    while i < len(arr):
        n = arr[i]
        if n % 2 == 0:
            arr.append(n // 2)
        total += n
        i += 1

    return (True, total, arr, issues)

Time complexity: O(n + s), where s is the number of appended half-values before termination. Space complexity: O(n + s).

Hints

  1. In Python, a for-loop over a list can visit items appended before the loop ends.
  2. Among integers, repeatedly applying n // 2 to an even number only loops forever for 0.

Part 2: Implement a pure, terminating accumulate with optional caching

Implement a corrected version of accumulate. For each integer n in nums, define its contribution as: - include n itself, - while the current value is nonzero and even, replace it with current // 2 and include that new value too. Examples: - 3 contributes 3 - 4 contributes 4 + 2 + 1 = 7 - -2 contributes -2 + -1 = -3 - 0 contributes 0 exactly once Return start plus the sum of all contributions. The function must be pure: it must not mutate nums. If use_cache is True, reuse previously computed contributions for repeated numbers within the same call. Return None for invalid input: nums is None, nums is not a list, start is not an int, use_cache is not a bool, or any element of nums is not a plain int (bool is invalid).

Constraints

  • 0 <= len(nums) <= 10^5 for valid cases
  • -10^18 <= nums[i], start <= 10^18 for valid cases
  • bool values are considered invalid even though bool is a subclass of int in Python

Examples

Input: ([2, 3], 1, False)

Expected Output: 7

Explanation: 2 contributes 2 + 1 = 3, 3 contributes 3, and start is 1.

Input: ([4, 4], 0, True)

Expected Output: 14

Explanation: Each 4 contributes 7, so the result is 14. Caching avoids recomputing the second chain.

Input: ([], 10, True)

Expected Output: 10

Explanation: Empty input returns start unchanged.

Input: ([0, -2], 5, True)

Expected Output: 2

Explanation: 0 contributes 0 once, and -2 contributes -3.

Input: (None, 0, False)

Expected Output: None

Explanation: nums is invalid.

Solution

def solution(nums, start=0, use_cache=False):
    """
    Return start plus the sum of each integer's even-halving chain.

    Preconditions:
    - nums must be a list of plain ints, or None for invalid input.
    - start must be a plain int.
    - use_cache must be a bool.

    Postconditions:
    - nums is not mutated.
    - Returns an int for valid input, otherwise None.
    - Always terminates; 0 contributes 0 exactly once.
    """
    if nums is None or not isinstance(nums, list):
        return None
    if type(start) is not int or type(use_cache) is not bool:
        return None
    for n in nums:
        if type(n) is not int:
            return None

    memo = {} if use_cache else None

    def contribution(n):
        if memo is not None and n in memo:
            return memo[n]

        total = 0
        cur = n
        while True:
            total += cur
            if cur == 0 or cur % 2 != 0:
                break
            cur //= 2

        if memo is not None:
            memo[n] = total
        return total

    total = start
    for n in nums:
        total += contribution(n)
    return total

Time complexity: Without caching: O(n log V). With caching: O(u log V + n), where u is the number of distinct values and V is the maximum absolute value.. Space complexity: O(1) without caching, O(u) with caching.

Hints

  1. Process each number independently; you do not need to append into the original list.
  2. A repeated number should have the same contribution every time, which makes memoization natural.

Part 3: Choose a minimal table-driven pytest suite

You are designing a minimal but thorough pytest suite for the corrected accumulate function. Each candidate test covers one or more behavior tags such as 'empty', 'chain', 'negative', 'none', 'nonint', and 'termination'. Given a list of candidate tests and a list of required tags, choose the smallest subset of test names whose combined tags cover every required tag. If multiple minimum-size subsets exist, return the lexicographically smallest sorted list of test names. If it is impossible to cover all required tags, return None.

Constraints

  • 1 <= number of candidates <= 20
  • 0 <= number of required tags <= 15
  • Candidate names are unique strings

Examples

Input: ([('test_empty', ['empty', 'termination']), ('test_chain', ['chain', 'termination']), ('test_neg', ['negative']), ('test_invalid', ['none', 'nonint']), ('test_mixed', ['negative', 'chain'])], ['empty', 'chain', 'negative', 'none', 'nonint', 'termination'])

Expected Output: ['test_empty', 'test_invalid', 'test_mixed']

Explanation: This is the unique minimum cover of size 3.

Input: ([('aa_empty_term', ['empty', 'termination']), ('ab_invalid_chain', ['none', 'nonint', 'chain']), ('ac_negative', ['negative']), ('ba_empty_chain', ['empty', 'chain']), ('bb_invalid_term', ['none', 'nonint', 'termination']), ('bc_negative', ['negative'])], ['empty', 'chain', 'negative', 'none', 'nonint', 'termination'])

Expected Output: ['aa_empty_term', 'ab_invalid_chain', 'ac_negative']

Explanation: There are two minimum covers of size 3; this one is lexicographically smaller.

Input: ([('test_empty', ['empty'])], ['empty', 'termination'])

Expected Output: None

Explanation: No candidate covers 'termination'.

Input: ([('anything', ['empty'])], [])

Expected Output: []

Explanation: No tags are required, so the empty suite is optimal.

Solution

def solution(candidates, required_tags):
    required = sorted(set(required_tags))
    if not required:
        return []

    tag_index = {tag: i for i, tag in enumerate(required)}
    processed = []
    for name, tags in candidates:
        mask = 0
        for tag in set(tags):
            if tag in tag_index:
                mask |= 1 << tag_index[tag]
        processed.append((name, mask))

    processed.sort(key=lambda x: x[0])
    full_mask = (1 << len(required)) - 1

    best = {0: ()}

    def better(a, b):
        if b is None:
            return True
        return (len(a), a) < (len(b), b)

    for name, mask in processed:
        snapshot = list(best.items())
        for old_mask, names in snapshot:
            new_mask = old_mask | mask
            new_names = names + (name,)
            prev = best.get(new_mask)
            if better(new_names, prev):
                best[new_mask] = new_names

    if full_mask not in best:
        return None
    return list(best[full_mask])

Time complexity: O(c * 2^r), where c is the number of candidates and r is the number of required tags. Space complexity: O(2^r).

Hints

  1. Map each required tag to a bit position, then each test becomes a bitmask.
  2. Dynamic programming over covered-tag masks can find the minimum-size subset with deterministic tie-breaking.

Part 4: Generate exact venv, dependency, test, and wheel build commands

Generate the exact macOS/Linux shell command sequence for a Python project. Input gives: - project_dir: project folder name - runtime_deps: list of (package_name, version) - dev_deps: list of (package_name, version) Return the ordered list of commands that: 1) enters the project directory, 2) creates a virtual environment named .venv, 3) activates it, 4) upgrades pip, 5) installs all pinned dependencies, 6) freezes the environment to requirements-lock.txt, 7) runs pytest quietly, 8) builds a wheel. Rules: - Always ensure pytest==8.2.2 and build==1.2.1 are installed. - Package names are deduplicated case-insensitively. - If the same package appears multiple times, keep the highest semantic version. - Sort installed packages alphabetically by normalized lowercase package name.

Constraints

  • 0 <= total dependency entries <= 200
  • Versions use numeric dot-separated semantic versioning such as 1.2.3
  • Package names are non-empty strings without spaces

Examples

Input: ("app", [('requests', '2.32.3')], [('pytest', '8.1.1')])

Expected Output: ['cd app', 'python3 -m venv .venv', 'source .venv/bin/activate', 'python -m pip install --upgrade pip', 'python -m pip install build==1.2.1 pytest==8.2.2 requests==2.32.3', 'python -m pip freeze > requirements-lock.txt', 'pytest -q', 'python -m build --wheel']

Explanation: pytest is bumped to the required pinned version, and build is added.

Input: ("proj", [('Requests', '2.31.0'), ('requests', '2.32.3')], [('build', '1.0.0'), ('flake8', '7.1.0')])

Expected Output: ['cd proj', 'python3 -m venv .venv', 'source .venv/bin/activate', 'python -m pip install --upgrade pip', 'python -m pip install build==1.2.1 flake8==7.1.0 pytest==8.2.2 requests==2.32.3', 'python -m pip freeze > requirements-lock.txt', 'pytest -q', 'python -m build --wheel']

Explanation: requests is deduplicated case-insensitively and the higher version is kept.

Input: ("emptyproj", [], [])

Expected Output: ['cd emptyproj', 'python3 -m venv .venv', 'source .venv/bin/activate', 'python -m pip install --upgrade pip', 'python -m pip install build==1.2.1 pytest==8.2.2', 'python -m pip freeze > requirements-lock.txt', 'pytest -q', 'python -m build --wheel']

Explanation: The mandatory build and pytest dependencies are still installed.

Input: ("svc", [('a', '1.2.1'), ('a', '1.2.0')], [('b', '1.10.0'), ('b', '1.9.9')])

Expected Output: ['cd svc', 'python3 -m venv .venv', 'source .venv/bin/activate', 'python -m pip install --upgrade pip', 'python -m pip install a==1.2.1 b==1.10.0 build==1.2.1 pytest==8.2.2', 'python -m pip freeze > requirements-lock.txt', 'pytest -q', 'python -m build --wheel']

Explanation: Semantic version comparison keeps 1.2.1 over 1.2.0 and 1.10.0 over 1.9.9.

Solution

def solution(project_dir, runtime_deps, dev_deps):
    def parse_version(version):
        return tuple(int(part) for part in version.split('.'))

    merged = {}
    all_deps = list(runtime_deps) + list(dev_deps) + [('pytest', '8.2.2'), ('build', '1.2.1')]

    for name, version in all_deps:
        norm = name.lower()
        if norm not in merged or parse_version(version) > parse_version(merged[norm]):
            merged[norm] = version

    pinned = ' '.join(f"{name}=={merged[name]}" for name in sorted(merged))

    return [
        f"cd {project_dir}",
        'python3 -m venv .venv',
        'source .venv/bin/activate',
        'python -m pip install --upgrade pip',
        f"python -m pip install {pinned}",
        'python -m pip freeze > requirements-lock.txt',
        'pytest -q',
        'python -m build --wheel',
    ]

Time complexity: O(d log d), where d is the number of dependency entries after normalization. Space complexity: O(d).

Hints

  1. Normalize package names to lowercase before deduplicating.
  2. Compare semantic versions component by component, not as raw strings.

Part 5: Build a deterministic CI plan and CLI entry-point mapping

You are configuring CI and packaging for a Python project. Given a list of requested features from {'lint', 'mypy', 'coverage', 'cli'}, produce: - an ordered list of CI/build commands, and - the CLI entry-point string if CLI packaging is requested. Rules: - If features is empty, return ([], None). - If an unknown feature appears, return None. - If 'coverage' is requested, coverage_threshold must be an integer in [0, 100]. - If 'cli' is requested, both cli_name and cli_target must be non-empty. Model the workflow as a DAG: - setup is required whenever any feature is enabled. - lint depends on setup. - mypy depends on setup. - test depends on setup and represents the coverage run. - build depends on setup and also on every enabled check among lint, mypy, and test. Use alphabetical order when multiple nodes are ready at the same time. Command mapping: - setup -> 'python -m pip install -e .[dev]' - lint -> 'ruff check .' - mypy -> 'mypy src' - test -> 'pytest --cov=src --cov-fail-under={coverage_threshold}' - build -> 'python -m build' If 'cli' is enabled, return entry point '{cli_name}={cli_target}', otherwise return None.

Constraints

  • 0 <= len(features) <= 4
  • features contains strings
  • coverage_threshold is only validated when 'coverage' is enabled

Examples

Input: (['lint', 'mypy', 'coverage', 'cli'], 90, 'accumulate', 'pkg.cli:main')

Expected Output: (['python -m pip install -e .[dev]', 'ruff check .', 'mypy src', 'pytest --cov=src --cov-fail-under=90', 'python -m build'], 'accumulate=pkg.cli:main')

Explanation: setup runs first, then lint/mypy/test in alphabetical order, and build runs last.

Input: (['cli'], 80, 'tool', 'app.main:run')

Expected Output: (['python -m pip install -e .[dev]', 'python -m build'], 'tool=app.main:run')

Explanation: Only setup and build are needed.

Input: ([], 95, None, None)

Expected Output: ([], None)

Explanation: No requested features means no CI steps are needed.

Input: (['coverage'], 120, None, None)

Expected Output: None

Explanation: Coverage threshold must be between 0 and 100 inclusive.

Solution

def solution(features, coverage_threshold, cli_name, cli_target):
    valid = {'lint', 'mypy', 'coverage', 'cli'}
    feature_set = set(features)

    if not feature_set.issubset(valid):
        return None
    if not feature_set:
        return ([], None)
    if 'coverage' in feature_set:
        if type(coverage_threshold) is not int or not (0 <= coverage_threshold <= 100):
            return None
    if 'cli' in feature_set:
        if not cli_name or not cli_target:
            return None

    deps = {'setup': set()}
    if 'lint' in feature_set:
        deps['lint'] = {'setup'}
    if 'mypy' in feature_set:
        deps['mypy'] = {'setup'}
    if 'coverage' in feature_set:
        deps['test'] = {'setup'}
    if 'cli' in feature_set:
        build_needs = {'setup'}
        for node in ('lint', 'mypy', 'test'):
            if node in deps:
                build_needs.add(node)
        deps['build'] = build_needs

    order = []
    done = set()
    while len(done) < len(deps):
        ready = sorted(node for node, need in deps.items() if node not in done and need <= done)
        if not ready:
            return None
        node = ready[0]
        done.add(node)
        order.append(node)

    commands = []
    for node in order:
        if node == 'setup':
            commands.append('python -m pip install -e .[dev]')
        elif node == 'lint':
            commands.append('ruff check .')
        elif node == 'mypy':
            commands.append('mypy src')
        elif node == 'test':
            commands.append(f'pytest --cov=src --cov-fail-under={coverage_threshold}')
        elif node == 'build':
            commands.append('python -m build')

    entry_point = f'{cli_name}={cli_target}' if 'cli' in feature_set else None
    return (commands, entry_point)

Time complexity: O(v^2 log v), where v is the number of workflow nodes. Space complexity: O(v + e).

Hints

  1. Treat each CI action as a graph node and use topological sorting.
  2. A deterministic tie-break rule is needed whenever several jobs become available together.
Last updated: Apr 25, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Solve Four Coding Assessment Tasks - Capital One (medium)
  • Write SQL using joins and window functions - Capital One (medium)
  • Review Preprocessing Code and Tests - Capital One (easy)
  • Remove nodes with a given value - Capital One (medium)
  • Solve multiple algorithmic interview questions - Capital One (hard)