Review Preprocessing Code and Tests
Company: Capital One
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Easy
Interview Round: Onsite
Quick Answer: This question tests data-science engineering skills: code review, data preprocessing (outlier handling and imputation), reproducibility through virtual environments, and the design and unit testing of data pipelines.
Part 1: Validate a Virtual Environment Script
Constraints
- 0 <= len(commands) <= 10^5
- Each command is a tuple of the form ('activate', env), ('install', package), ('run', job), or ('deactivate',)
- Environment, package, and job names are strings
- Installed packages persist inside an environment even after deactivation and later reactivation
Examples
Input: ([('activate', 'env1'), ('install', 'pandas'), ('install', 'numpy'), ('run', 'daily'), ('deactivate',), ('activate', 'env2'), ('install', 'numpy'), ('run', 'quick'), ('deactivate',), ('activate', 'env1'), ('run', 'daily')], {'daily': ['pandas', 'numpy'], 'quick': ['numpy']})
Expected Output: {'status': 'ok', 'runs': 3}
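Explanation: All three runs succeed. 'daily' runs in env1 after pandas and numpy are installed there, and 'quick' runs in env2 after numpy is installed there. env1 keeps its packages after being deactivated, so the final 'daily' run succeeds too.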
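One way to read the constraints and the example is as a single pass over the commands that tracks the active environment and the set of packages installed in each one. The sketch below follows that reading. The function name `validate_script`, the error return shape `{'status': 'error', 'index': i}`, and the choice to treat nested activation, commands issued with no active environment, and unknown jobs as errors are all assumptions, since the text above only specifies the success case.

```python
from typing import Dict, List, Optional, Set, Tuple


def validate_script(
    commands: List[Tuple[str, ...]],
    requirements: Dict[str, List[str]],
) -> Dict[str, object]:
    """Replay the commands and count the runs whose requirements are met.

    Assumed (not stated in the source): the first invalid command stops
    processing and is reported as {'status': 'error', 'index': i}.
    """
    installed: Dict[str, Set[str]] = {}  # env name -> packages; survives deactivation
    active: Optional[str] = None         # currently active environment, if any
    runs = 0

    for index, command in enumerate(commands):
        error = {"status": "error", "index": index}  # hypothetical error shape
        action = command[0]

        if action == "activate":
            if active is not None:       # assumption: no nested activation
                return error
            active = command[1]
            installed.setdefault(active, set())
        elif action == "deactivate":
            if active is None:           # assumption: nothing to deactivate is an error
                return error
            active = None
        elif action == "install":
            if active is None:           # assumption: installs need an active env
                return error
            installed[active].add(command[1])
        elif action == "run":
            if active is None:
                return error
            needed = requirements.get(command[1])
            # assumption: an unknown job or a missing package fails the script
            if needed is None or not set(needed) <= installed[active]:
                return error
            runs += 1
        else:
            return error                 # unrecognized command

    return {"status": "ok", "runs": runs}


example_commands = [
    ("activate", "env1"), ("install", "pandas"), ("install", "numpy"),
    ("run", "daily"), ("deactivate",),
    ("activate", "env2"), ("install", "numpy"), ("run", "quick"), ("deactivate",),
    ("activate", "env1"), ("run", "daily"),
]
example_requirements = {"daily": ["pandas", "numpy"], "quick": ["numpy"]}
print(validate_script(example_commands, example_requirements))
```

Each command does a constant amount of dictionary and set work apart from the subset check on a run, so the pass stays linear in the number of commands plus the total size of the requirement lists checked, which fits the 10^5 bound. The example script ends with env1 still active and is still reported as ok, so the sketch does not require a trailing deactivate.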