Visualize and Clean SKU Sales Data for Outliers
Company: Boston Consulting Group
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
sales_data
+------------+--------+-----------+----------+------------+----------+
| date       | sku_id | unit_sold | revenue  | promo_flag | store_id |
+------------+--------+-----------+----------+------------+----------+
| 2023-01-01 | A123   |       120 |  2400.00 |          1 | S01      |
| 2023-01-02 | A123   |        80 |  1600.00 |          0 | S01      |
| 2023-01-01 | B456   |       200 |  3000.00 |          1 | S02      |
| 2023-01-02 | B456   |        50 |   750.00 |          0 | S02      |
+------------+--------+-----------+----------+------------+----------+
##### Scenario
CodeSignal live-coding session: an analyst receives raw daily SKU sales data and must explore it visually while cleaning extreme values.
##### Question
Using Python (pandas, matplotlib/seaborn), draw a histogram of daily revenue per SKU and identify outliers with the IQR or Z-score method. Remove the detected outliers and re-plot the cleaned distribution.
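The two detection rules named in the question can be sketched as boolean masks over a revenue series. This is a minimal illustration, not a reference solution: the function names, the `k = 1.5` fence multiplier, the Z-score thresholds, and the sample values are all assumptions chosen for the example.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def zscore_outlier_mask(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where |value - mean| / std exceeds the threshold (population std)."""
    std = values.std(ddof=0)
    if pd.isna(std) or std == 0:
        # Constant or empty series: nothing can be flagged.
        return pd.Series(False, index=values.index)
    return ((values - values.mean()) / std).abs() > threshold


# Hypothetical daily revenue for one SKU, with one extreme value.
revenue = pd.Series([100.0, 110.0, 95.0, 105.0, 98.0, 102.0, 5000.0])

iqr_flags = iqr_outlier_mask(revenue)
z_flags = zscore_outlier_mask(revenue, threshold=2.0)
```

With only seven points, the population Z-score of any single value cannot exceed sqrt(n - 1) ≈ 2.45, so the common threshold of 3 would flag nothing here. That is why the sketch lowers the threshold to 2.0. The IQR rule is less sensitive to this because quartiles are barely moved by one extreme value.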
##### Hints
Focus on a reproducible pandas pipeline: load → aggregate → detect outliers → filter → visualize before/after.
Quick Answer: This question evaluates data manipulation and exploratory data analysis skills, including aggregation, outlier detection (IQR/Z-score) and visualization using Python libraries such as pandas and matplotlib/seaborn.
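The load → aggregate → detect → filter → visualize pipeline from the hints might look roughly like the sketch below. Several things are assumptions: the data is synthetic (generated with a fixed seed and two injected extreme values) because the question provides no file, the column names follow the `sales_data` table, outliers are detected per SKU with the IQR rule and a 1.5 multiplier, and the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load: synthetic stand-in for the raw table (in practice, e.g. pd.read_csv).
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=60, freq="D")
frames = []
for sku, store, mean_rev in [("A123", "S01", 2000.0), ("B456", "S02", 1500.0)]:
    frames.append(pd.DataFrame({
        "date": dates,
        "sku_id": sku,
        "store_id": store,
        "revenue": rng.normal(mean_rev, 200.0, len(dates)).round(2),
    }))
sales = pd.concat(frames, ignore_index=True)
sales.loc[[5, 70], "revenue"] = [25000.0, 40000.0]  # inject two extreme values

# Aggregate: one row per (date, SKU), summing revenue across stores.
daily = sales.groupby(["date", "sku_id"], as_index=False)["revenue"].sum()

# Detect: IQR fences computed separately for each SKU.
by_sku = daily.groupby("sku_id")["revenue"]
q1 = by_sku.transform(lambda s: s.quantile(0.25))
q3 = by_sku.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1
daily["is_outlier"] = (
    (daily["revenue"] < q1 - 1.5 * iqr) | (daily["revenue"] > q3 + 1.5 * iqr)
)

# Filter: keep only rows inside the fences.
cleaned = daily.loc[~daily["is_outlier"]]

# Visualize: per-SKU histograms before and after cleaning.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
panels = [(axes[0], daily, "Before cleaning"), (axes[1], cleaned, "After cleaning")]
for ax, frame, title in panels:
    for sku, rows in frame.groupby("sku_id"):
        ax.hist(rows["revenue"], bins=30, alpha=0.6, label=sku)
    ax.set_title(title)
    ax.set_xlabel("Daily revenue")
    ax.set_ylabel("Number of days")
    ax.legend(title="sku_id")
fig.tight_layout()
# Call plt.show() interactively, or fig.savefig(...) to write the figure to disk.
```

Computing the fences per SKU rather than over the pooled data keeps a high-revenue SKU from being flagged merely for selling more than the others. Whether promo days (`promo_flag = 1`) should count as outliers at all is a judgment call worth raising with the interviewer.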