Pre-process Financial Data for Linear Regression Modeling
Company: Voleon Group
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
market_data
+------------+----------+----------+--------+
| date       | feature1 | feature2 | target |
+------------+----------+----------+--------+
| 2024-01-02 |     1.23 |     0.34 |   0.05 |
| 2024-01-03 |     1.25 |     0.30 |  -0.01 |
| 2024-01-04 |     1.20 |     0.28 |   0.02 |
| 2024-01-05 |     1.18 |     0.27 |  -0.03 |
+------------+----------+----------+--------+
##### Scenario
Voleon Group data scientist technical screen: you are asked to pre-process a financial time-series dataset before fitting a linear regression model.
##### Question
Using the market_data table above, write Python/pandas code to handle null values, winsorize extreme values at the 1st and 99th percentiles, standardize the predictors (feature1 and feature2), and produce an X, y pair ready for linear regression.
##### Hints
Focus on DataFrame operations: dropna for missing values, clip for winsorizing, StandardScaler for standardization, then separate the features from the target.
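The steps named in the question and hints could be sketched roughly as follows. This is one possible reading, not a reference answer: the function name `preprocess` and its parameters are hypothetical, rows with any missing predictor or target are dropped rather than imputed, and only the predictors are winsorized (the question does not say whether the target should be clipped too).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

FEATURES = ["feature1", "feature2"]
TARGET = "target"


def preprocess(df, features=FEATURES, target=TARGET, lower=0.01, upper=0.99):
    """Hypothetical helper: return (X, y, scaler) for linear regression."""
    out = df.copy()
    # Keep the rows in chronological order and use the date as the index.
    out["date"] = pd.to_datetime(out["date"])
    out = out.sort_values("date").set_index("date")

    # 1. Drop rows with a missing predictor or target (assumption: no imputation).
    out = out.dropna(subset=features + [target])

    # 2. Winsorize each predictor at its own 1st/99th percentile.
    lo = out[features].quantile(lower)
    hi = out[features].quantile(upper)
    out[features] = out[features].clip(lower=lo, upper=hi, axis=1)

    # 3. Standardize predictors to zero mean and unit variance.
    scaler = StandardScaler()
    X = pd.DataFrame(
        scaler.fit_transform(out[features]),
        index=out.index,
        columns=features,
    )

    # 4. Separate the target from the features.
    y = out[target]
    return X, y, scaler


# The four rows shown in the market_data table.
market_data = pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
        "feature1": [1.23, 1.25, 1.20, 1.18],
        "feature2": [0.34, 0.30, 0.28, 0.27],
        "target": [0.05, -0.01, 0.02, -0.03],
    }
)

X, y, scaler = preprocess(market_data)
```

Returning the fitted scaler lets the same transformation be reapplied to later data. In a real time-series setting, the percentile bounds and the scaler would usually be estimated on the training window only, to avoid look-ahead; that goes beyond what the question asks and is omitted here.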
Quick Answer: This question tests data preprocessing and feature engineering skills in Python/pandas: handling missing values, winsorizing outliers, standardizing predictors, and assembling the X and y datasets.