Code Review and Refactor: Summing a CSV Column
Context
You are reviewing a short Python script that sums a numeric column from a CSV file using pandas. Your tasks are to identify its problems, refactor it into a small, testable module, add tests, define a reproducible environment, and explain the design choices and complexity trade-offs.
Given Script
# script.py
import pandas as pd

DATA_PATH = 'data.csv'
result = None

def compute_total(col):
    df = pd.read_csv(DATA_PATH)
    total = 0
    for x in df[col]:
        if x == '':
            total += 0
        else:
            total += float(x)
    print(total)

compute_total('amount')
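One defect is easy to miss by reading alone: `pd.read_csv` turns blank cells into NaN, not `''`, so the `x == ''` branch never runs and a single blank cell makes the whole total NaN. The sketch below reproduces this with a small in-memory CSV (the sample data is made up for illustration) in place of `data.csv`, using the same loop as the given script.

```python
import io
import math

import pandas as pd

# Illustrative CSV with one blank "amount" cell, read from memory instead of data.csv.
csv_text = "id,amount\n1,10.5\n2,\n3,4.5\n"
df = pd.read_csv(io.StringIO(csv_text))

total = 0
for x in df["amount"]:
    if x == '':  # never true: pandas stores the blank cell as NaN, not ''
        total += 0
    else:
        total += float(x)  # float(nan) is nan, and nan poisons the running sum

print(int(df["amount"].isna().sum()))  # prints 1
print(math.isnan(total))               # prints True
```

Because the original function prints instead of returning, this failure would also be invisible to any test that checks a return value.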
Tasks
- Identify at least five defects or risks (correctness, performance, readability, resource management, security).
- Refactor into a small, testable module with clear interfaces, type hints, and no global state; include input validation and assert-based precondition checks. Explain when assertions are appropriate and when exceptions are.
- Write three pytest-style unit tests using assert statements that cover:
  - Normal inputs
  - Missing/NaN inputs
  - Malformed inputs
- Provide an environment.yml for a Conda environment (Python 3.11, pinned dependencies) and the exact commands to create and activate it.
- Explain the benefits of modularization for maintainability, dependency management, and testability.
- State the time/space complexity before and after refactoring and any I/O bottlenecks you would address.
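One possible shape for the refactored module and its tests is sketched below. The module name `csv_total`, the functions `sum_series` and `compute_total(source, column)`, and the policy that missing values count as 0 while non-numeric values are an error are illustrative assumptions, not requirements stated in the task. The sketch separates I/O from arithmetic, takes the data source as a parameter instead of a global, and returns the total instead of printing it. Caller-facing validation uses exceptions, because they still fire under `python -O`; `assert` is reserved for a developer-facing precondition on the internal helper, since assertions are stripped in optimized mode. The tests are shown in the same file for brevity and use `io.StringIO` and try/except so they run under pytest or when called directly (with pytest installed, `pytest.raises` would be the more idiomatic choice).

```python
# csv_total.py (hypothetical module name); tests shown inline for brevity.
from __future__ import annotations

import io
import os
from typing import IO, Union

import pandas as pd

CsvSource = Union[str, os.PathLike, IO[str]]


def sum_series(values: pd.Series) -> float:
    """Sum numeric values, counting missing (NaN) entries as 0.

    Raises ValueError if a present entry cannot be parsed as a number.
    """
    # Precondition for internal callers; stripped under `python -O`.
    assert isinstance(values, pd.Series), "sum_series expects a pandas Series"
    numeric = pd.to_numeric(values, errors="coerce")
    # Present in the input but NaN after coercion means the entry was malformed.
    malformed = numeric.isna() & values.notna()
    if malformed.any():
        raise ValueError(f"non-numeric values: {values[malformed].tolist()!r}")
    return float(numeric.sum())  # Series.sum() skips NaN by default


def compute_total(source: CsvSource, column: str) -> float:
    """Read only `column` from a CSV path or text buffer and return its total."""
    if not isinstance(column, str) or not column:
        raise ValueError("column must be a non-empty string")
    # A callable usecols keeps only the requested column in the DataFrame.
    df = pd.read_csv(source, usecols=lambda name: name == column)
    if column not in df.columns:
        raise KeyError(f"column {column!r} not found")
    return sum_series(df[column])


# --- pytest-style tests (would normally live in tests/test_csv_total.py) ---

def test_normal_inputs() -> None:
    csv = io.StringIO("id,amount\n1,10.5\n2,4.5\n")
    assert compute_total(csv, "amount") == 15.0


def test_missing_values_count_as_zero() -> None:
    csv = io.StringIO("id,amount\n1,10.5\n2,\n3,4.5\n")
    assert compute_total(csv, "amount") == 15.0


def test_malformed_values_raise() -> None:
    csv = io.StringIO("id,amount\n1,10.5\n2,abc\n")
    try:
        compute_total(csv, "amount")
    except ValueError as exc:
        assert "abc" in str(exc)
    else:
        raise AssertionError("expected ValueError for a non-numeric value")
```

Keeping `sum_series` free of file access means most edge cases can be tested with an in-memory `pd.Series`, and only one thin function touches the filesystem.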
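On complexity, roughly: both versions must read every byte of the file, so time stays linear in file size. The original also keeps all m columns of n rows in memory (O(n·m)) and sums in a Python-level loop, whereas reading a single column keeps O(n) memory and the vectorized sum runs in compiled code. If even one column is too large for memory, one way to bound memory by the chunk size is to stream the file with `read_csv`'s `chunksize` parameter, as in this sketch. The function name `compute_total_chunked` and the default chunk size are illustrative assumptions; the `with` block closes the underlying reader, which addresses the resource-management concern.

```python
import io
import os
from typing import IO, Union

import pandas as pd


def compute_total_chunked(
    source: Union[str, os.PathLike, IO[str]],
    column: str,
    chunksize: int = 100_000,  # illustrative default; tune to available memory
) -> float:
    """Sum `column` while holding at most `chunksize` rows in memory at once."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    total = 0.0
    # With chunksize set, read_csv returns a reader that yields DataFrames.
    with pd.read_csv(source, usecols=[column], chunksize=chunksize) as reader:
        for chunk in reader:
            # errors="raise" rejects malformed text; NaN from blank cells passes through.
            values = pd.to_numeric(chunk[column], errors="raise")
            total += float(values.sum())  # sum() skips NaN by default
    return total


if __name__ == "__main__":
    demo = io.StringIO("id,amount\n1,10.5\n2,\n3,4.5\n")
    print(compute_total_chunked(demo, "amount", chunksize=2))  # prints 15.0
```

Chunking trades a little per-chunk overhead for bounded memory; for files that fit comfortably in RAM, the single `read_csv` call is simpler and usually fast enough.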