Debug and test a Python function in venv
Company: Capital One
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates understanding of Python runtime semantics (mutable default arguments, in-place list mutation during iteration), debugging and unit-testing practices, virtual environment and packaging workflows, and CI/tooling considerations.
Part 1: Simulate the buggy accumulate function and report the design issues
Constraints
- 0 <= len(nums) <= 10^5
- -10^18 <= nums[i], start <= 10^18
- nums contains only integers
- If 0 appears in nums, the original function would not terminate
Examples
Input: ([2, 3], 1)
Expected Output: (True, 7, [2, 3, 1], ['default_mutable_cache', 'input_list_mutation_during_iteration', 'keying_by_stringified_mutable_list', 'potential_nontermination_when_zero_present', 'quadratic_time_from_repeated_stringify', 'unbounded_cache_growth_across_calls'])
Explanation: 2 is visited first, so 1 is appended. The loop then visits 3 and finally the appended 1. Total = 1 + 2 + 3 + 1 = 7.
Input: ([], 5)
Expected Output: (True, 5, [], ['default_mutable_cache', 'input_list_mutation_during_iteration', 'keying_by_stringified_mutable_list', 'potential_nontermination_when_zero_present', 'quadratic_time_from_repeated_stringify', 'unbounded_cache_growth_across_calls'])
Explanation: The loop body never runs, so the function returns start unchanged.
Input: ([4], 0)
Expected Output: (True, 7, [4, 2, 1], ['default_mutable_cache', 'input_list_mutation_during_iteration', 'keying_by_stringified_mutable_list', 'potential_nontermination_when_zero_present', 'quadratic_time_from_repeated_stringify', 'unbounded_cache_growth_across_calls'])
Explanation: 4 appends 2, then 2 appends 1, then 1 ends the chain. Total = 4 + 2 + 1 = 7.
Input: ([0], 2)
Expected Output: (False, None, None, ['default_mutable_cache', 'input_list_mutation_during_iteration', 'keying_by_stringified_mutable_list', 'potential_nontermination_when_zero_present', 'quadratic_time_from_repeated_stringify', 'unbounded_cache_growth_across_calls'])
Explanation: 0 is even and appends another 0 forever, so the loop never finishes.
Solution
def solution(nums, start):
issues = [
'default_mutable_cache',
'input_list_mutation_during_iteration',
'keying_by_stringified_mutable_list',
'potential_nontermination_when_zero_present',
'quadratic_time_from_repeated_stringify',
'unbounded_cache_growth_across_calls',
]
arr = list(nums)
if 0 in arr:
return (False, None, None, issues)
total = start
i = 0
while i < len(arr):
n = arr[i]
if n % 2 == 0:
arr.append(n // 2)
total += n
i += 1
return (True, total, arr, issues)
Time complexity: O(n + s), where s is the number of appended half-values before termination. Space complexity: O(n + s).
Hints
- In Python, a for-loop over a list can visit items appended before the loop ends.
- Among integers, repeatedly applying n // 2 to an even number only loops forever for 0.
Part 2: Implement a pure, terminating accumulate with optional caching
Constraints
- 0 <= len(nums) <= 10^5 for valid cases
- -10^18 <= nums[i], start <= 10^18 for valid cases
- bool values are considered invalid even though bool is a subclass of int in Python
Examples
Input: ([2, 3], 1, False)
Expected Output: 7
Explanation: 2 contributes 2 + 1 = 3, 3 contributes 3, and start is 1.
Input: ([4, 4], 0, True)
Expected Output: 14
Explanation: Each 4 contributes 7, so the result is 14. Caching avoids recomputing the second chain.
Input: ([], 10, True)
Expected Output: 10
Explanation: Empty input returns start unchanged.
Input: ([0, -2], 5, True)
Expected Output: 2
Explanation: 0 contributes 0 once, and -2 contributes -3.
Input: (None, 0, False)
Expected Output: None
Explanation: nums is invalid.
Solution
def solution(nums, start=0, use_cache=False):
"""
Return start plus the sum of each integer's even-halving chain.
Preconditions:
- nums must be a list of plain ints, or None for invalid input.
- start must be a plain int.
- use_cache must be a bool.
Postconditions:
- nums is not mutated.
- Returns an int for valid input, otherwise None.
- Always terminates; 0 contributes 0 exactly once.
"""
if nums is None or not isinstance(nums, list):
return None
if type(start) is not int or type(use_cache) is not bool:
return None
for n in nums:
if type(n) is not int:
return None
memo = {} if use_cache else None
def contribution(n):
if memo is not None and n in memo:
return memo[n]
total = 0
cur = n
while True:
total += cur
if cur == 0 or cur % 2 != 0:
break
cur //= 2
if memo is not None:
memo[n] = total
return total
total = start
for n in nums:
total += contribution(n)
return total
Time complexity: Without caching: O(n log V). With caching: O(u log V + n), where u is the number of distinct values and V is the maximum absolute value.. Space complexity: O(1) without caching, O(u) with caching.
Hints
- Process each number independently; you do not need to append into the original list.
- A repeated number should have the same contribution every time, which makes memoization natural.
Part 3: Choose a minimal table-driven pytest suite
Constraints
- 1 <= number of candidates <= 20
- 0 <= number of required tags <= 15
- Candidate names are unique strings
Examples
Input: ([('test_empty', ['empty', 'termination']), ('test_chain', ['chain', 'termination']), ('test_neg', ['negative']), ('test_invalid', ['none', 'nonint']), ('test_mixed', ['negative', 'chain'])], ['empty', 'chain', 'negative', 'none', 'nonint', 'termination'])
Expected Output: ['test_empty', 'test_invalid', 'test_mixed']
Explanation: This is the unique minimum cover of size 3.
Input: ([('aa_empty_term', ['empty', 'termination']), ('ab_invalid_chain', ['none', 'nonint', 'chain']), ('ac_negative', ['negative']), ('ba_empty_chain', ['empty', 'chain']), ('bb_invalid_term', ['none', 'nonint', 'termination']), ('bc_negative', ['negative'])], ['empty', 'chain', 'negative', 'none', 'nonint', 'termination'])
Expected Output: ['aa_empty_term', 'ab_invalid_chain', 'ac_negative']
Explanation: There are two minimum covers of size 3; this one is lexicographically smaller.
Input: ([('test_empty', ['empty'])], ['empty', 'termination'])
Expected Output: None
Explanation: No candidate covers 'termination'.
Input: ([('anything', ['empty'])], [])
Expected Output: []
Explanation: No tags are required, so the empty suite is optimal.
Solution
def solution(candidates, required_tags):
required = sorted(set(required_tags))
if not required:
return []
tag_index = {tag: i for i, tag in enumerate(required)}
processed = []
for name, tags in candidates:
mask = 0
for tag in set(tags):
if tag in tag_index:
mask |= 1 << tag_index[tag]
processed.append((name, mask))
processed.sort(key=lambda x: x[0])
full_mask = (1 << len(required)) - 1
best = {0: ()}
def better(a, b):
if b is None:
return True
return (len(a), a) < (len(b), b)
for name, mask in processed:
snapshot = list(best.items())
for old_mask, names in snapshot:
new_mask = old_mask | mask
new_names = names + (name,)
prev = best.get(new_mask)
if better(new_names, prev):
best[new_mask] = new_names
if full_mask not in best:
return None
return list(best[full_mask])
Time complexity: O(c * 2^r), where c is the number of candidates and r is the number of required tags. Space complexity: O(2^r).
Hints
- Map each required tag to a bit position, then each test becomes a bitmask.
- Dynamic programming over covered-tag masks can find the minimum-size subset with deterministic tie-breaking.
Part 4: Generate exact venv, dependency, test, and wheel build commands
Constraints
- 0 <= total dependency entries <= 200
- Versions use numeric dot-separated semantic versioning such as 1.2.3
- Package names are non-empty strings without spaces
Examples
Input: ("app", [('requests', '2.32.3')], [('pytest', '8.1.1')])
Expected Output: ['cd app', 'python3 -m venv .venv', 'source .venv/bin/activate', 'python -m pip install --upgrade pip', 'python -m pip install build==1.2.1 pytest==8.2.2 requests==2.32.3', 'python -m pip freeze > requirements-lock.txt', 'pytest -q', 'python -m build --wheel']
Explanation: pytest is bumped to the required pinned version, and build is added.
Input: ("proj", [('Requests', '2.31.0'), ('requests', '2.32.3')], [('build', '1.0.0'), ('flake8', '7.1.0')])
Expected Output: ['cd proj', 'python3 -m venv .venv', 'source .venv/bin/activate', 'python -m pip install --upgrade pip', 'python -m pip install build==1.2.1 flake8==7.1.0 pytest==8.2.2 requests==2.32.3', 'python -m pip freeze > requirements-lock.txt', 'pytest -q', 'python -m build --wheel']
Explanation: requests is deduplicated case-insensitively and the higher version is kept.
Input: ("emptyproj", [], [])
Expected Output: ['cd emptyproj', 'python3 -m venv .venv', 'source .venv/bin/activate', 'python -m pip install --upgrade pip', 'python -m pip install build==1.2.1 pytest==8.2.2', 'python -m pip freeze > requirements-lock.txt', 'pytest -q', 'python -m build --wheel']
Explanation: The mandatory build and pytest dependencies are still installed.
Input: ("svc", [('a', '1.2.1'), ('a', '1.2.0')], [('b', '1.10.0'), ('b', '1.9.9')])
Expected Output: ['cd svc', 'python3 -m venv .venv', 'source .venv/bin/activate', 'python -m pip install --upgrade pip', 'python -m pip install a==1.2.1 b==1.10.0 build==1.2.1 pytest==8.2.2', 'python -m pip freeze > requirements-lock.txt', 'pytest -q', 'python -m build --wheel']
Explanation: Semantic version comparison keeps 1.2.1 over 1.2.0 and 1.10.0 over 1.9.9.
Solution
def solution(project_dir, runtime_deps, dev_deps):
def parse_version(version):
return tuple(int(part) for part in version.split('.'))
merged = {}
all_deps = list(runtime_deps) + list(dev_deps) + [('pytest', '8.2.2'), ('build', '1.2.1')]
for name, version in all_deps:
norm = name.lower()
if norm not in merged or parse_version(version) > parse_version(merged[norm]):
merged[norm] = version
pinned = ' '.join(f"{name}=={merged[name]}" for name in sorted(merged))
return [
f"cd {project_dir}",
'python3 -m venv .venv',
'source .venv/bin/activate',
'python -m pip install --upgrade pip',
f"python -m pip install {pinned}",
'python -m pip freeze > requirements-lock.txt',
'pytest -q',
'python -m build --wheel',
]
Time complexity: O(d log d), where d is the number of dependency entries after normalization. Space complexity: O(d).
Hints
- Normalize package names to lowercase before deduplicating.
- Compare semantic versions component by component, not as raw strings.
Part 5: Build a deterministic CI plan and CLI entry-point mapping
Constraints
- 0 <= len(features) <= 4
- features contains strings
- coverage_threshold is only validated when 'coverage' is enabled
Examples
Input: (['lint', 'mypy', 'coverage', 'cli'], 90, 'accumulate', 'pkg.cli:main')
Expected Output: (['python -m pip install -e .[dev]', 'ruff check .', 'mypy src', 'pytest --cov=src --cov-fail-under=90', 'python -m build'], 'accumulate=pkg.cli:main')
Explanation: setup runs first, then lint/mypy/test in alphabetical order, and build runs last.
Input: (['cli'], 80, 'tool', 'app.main:run')
Expected Output: (['python -m pip install -e .[dev]', 'python -m build'], 'tool=app.main:run')
Explanation: Only setup and build are needed.
Input: ([], 95, None, None)
Expected Output: ([], None)
Explanation: No requested features means no CI steps are needed.
Input: (['coverage'], 120, None, None)
Expected Output: None
Explanation: Coverage threshold must be between 0 and 100 inclusive.
Solution
def solution(features, coverage_threshold, cli_name, cli_target):
valid = {'lint', 'mypy', 'coverage', 'cli'}
feature_set = set(features)
if not feature_set.issubset(valid):
return None
if not feature_set:
return ([], None)
if 'coverage' in feature_set:
if type(coverage_threshold) is not int or not (0 <= coverage_threshold <= 100):
return None
if 'cli' in feature_set:
if not cli_name or not cli_target:
return None
deps = {'setup': set()}
if 'lint' in feature_set:
deps['lint'] = {'setup'}
if 'mypy' in feature_set:
deps['mypy'] = {'setup'}
if 'coverage' in feature_set:
deps['test'] = {'setup'}
if 'cli' in feature_set:
build_needs = {'setup'}
for node in ('lint', 'mypy', 'test'):
if node in deps:
build_needs.add(node)
deps['build'] = build_needs
order = []
done = set()
while len(done) < len(deps):
ready = sorted(node for node, need in deps.items() if node not in done and need <= done)
if not ready:
return None
node = ready[0]
done.add(node)
order.append(node)
commands = []
for node in order:
if node == 'setup':
commands.append('python -m pip install -e .[dev]')
elif node == 'lint':
commands.append('ruff check .')
elif node == 'mypy':
commands.append('mypy src')
elif node == 'test':
commands.append(f'pytest --cov=src --cov-fail-under={coverage_threshold}')
elif node == 'build':
commands.append('python -m build')
entry_point = f'{cli_name}={cli_target}' if 'cli' in feature_set else None
return (commands, entry_point)
Time complexity: O(v^2 log v), where v is the number of workflow nodes. Space complexity: O(v + e).
Hints
- Treat each CI action as a graph node and use topological sorting.
- A deterministic tie-break rule is needed whenever several jobs become available together.