Clean and Aggregate Transactions for Finance Dashboard
Company: Pinterest
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
transactions
id | user_id | amount | timestamp | category
1 | 1001 | 19.99 | 2023-01-01 09:00:00 | grocery
2 | 1001 | 5.50 | 2023-01-02 11:12:30 | coffee
3 | 1002 | 45.00 | 2023-01-03 14:45:10 | electronics
4 | 1003 | 12.00 | 2023-01-01 08:30:25 | transport
##### Scenario
You have a daily transactions dataset that must be cleaned and summarized for a finance dashboard.
##### Question
Using pandas, apply a lambda function to create a new column flagging high-value purchases (amount > $
40).
Build a dictionary that maps the raw text in the category column to standardized names (e.g., 'grocery', 'food & bev', 'coffee' -> 'Food'). Loop through this dictionary to transform the DataFrame.
Aggregate total spend and number of transactions per user_id and return a tidy DataFrame.
##### Hints
Show vectorized pandas code; avoid row-by-row loops except for the dictionary mapping step.
Quick Answer: This question evaluates data cleaning and transformation competencies including pandas-based vectorized operations, categorical mapping and feature flagging, as well as aggregation and summarization of transactional data in the data manipulation (SQL/Python) domain.