Clean and Summarize User Purchase Data Efficiently
Company: PayPal
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
transactions
+-----------+---------------------+-----------+--------+
| user_id   | txn_timestamp       | txn_value | txn_id |
+-----------+---------------------+-----------+--------+
| 201       | 2023-09-01 09:00:00 | 15.50     | 1      |
| 201       | 2023-09-01 09:05:00 | 17.00     | 2      |
| 202       | 2023-09-01 10:00:00 | 20.00     | 3      |
| 201       | 2023-09-02 11:00:00 | 5.00      | 4      |
| 203       | 2023-09-02 11:05:00 | 22.00     | 5      |
+-----------+---------------------+-----------+--------+
##### Scenario
You have a pandas DataFrame with user purchase history and need to clean and summarize it for analysts.
##### Question
a) Implement a function that removes every user who has fewer than 100 transactions in any calendar month.

b) Implement a second function that returns the average time between consecutive transactions, in seconds, for each remaining user.
##### Hints
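Group by user and calendar month to count transactions; sort by user and timestamp, then use shift within each user to compute timedeltas between consecutive rows; convert each Timedelta to seconds via .dt.total_seconds().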
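
A minimal sketch of both parts follows, built on the hints rather than on any official solution. It rests on two assumptions the question leaves open: a user is dropped if any month in which they have at least one transaction falls below the threshold (months with no activity are ignored), and users with a single transaction get NaN as their average gap. The function names `drop_low_activity_users` and `mean_gap_seconds` and the `min_txns` parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd


def drop_low_activity_users(df: pd.DataFrame, min_txns: int = 100) -> pd.DataFrame:
    """Part a): keep users whose every active calendar month has >= min_txns rows.

    Assumption: months in which a user has no transactions are not counted.
    """
    ts = pd.to_datetime(df["txn_timestamp"])
    # Count transactions per (user, calendar month).
    counts = (
        df.assign(_month=ts.dt.to_period("M"))
        .groupby(["user_id", "_month"])
        .size()
    )
    # A user survives only if their weakest month still meets the threshold.
    min_per_user = counts.groupby(level="user_id").min()
    keep = min_per_user.index[min_per_user >= min_txns]
    return df[df["user_id"].isin(keep)].copy()


def mean_gap_seconds(df: pd.DataFrame) -> pd.Series:
    """Part b): average seconds between consecutive transactions per user."""
    out = df.assign(txn_timestamp=pd.to_datetime(df["txn_timestamp"]))
    # Sorting first makes "previous row" mean "previous transaction in time".
    out = out.sort_values(["user_id", "txn_timestamp"])
    prev = out.groupby("user_id")["txn_timestamp"].shift()
    gaps = (out["txn_timestamp"] - prev).dt.total_seconds()
    # mean() skips the NaN produced by each user's first transaction.
    return gaps.groupby(out["user_id"]).mean().rename("avg_gap_seconds")


# The sample rows from the transactions table above.
sample = pd.DataFrame(
    {
        "user_id": [201, 201, 202, 201, 203],
        "txn_timestamp": pd.to_datetime(
            [
                "2023-09-01 09:00:00",
                "2023-09-01 09:05:00",
                "2023-09-01 10:00:00",
                "2023-09-02 11:00:00",
                "2023-09-02 11:05:00",
            ]
        ),
        "txn_value": [15.50, 17.00, 20.00, 5.00, 22.00],
        "txn_id": [1, 2, 3, 4, 5],
    }
)

# A threshold of 2 is used here only because the sample is tiny.
kept = drop_low_activity_users(sample, min_txns=2)
avg_gap = mean_gap_seconds(kept)
print(avg_gap)  # user 201: (300 s + 93300 s) / 2 = 46800.0
```

With the default threshold of 100, every user in the five-row sample is removed, so the threshold is exposed as a parameter to keep the example checkable by hand. The gap calculation uses `shift` within each user group, as the hint suggests; `groupby(...).diff()` would give the same result.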
Quick Answer: This question evaluates proficiency in data cleaning, aggregation, and time-series manipulation within the Data Manipulation (SQL/Python) domain, focusing on filtering by group counts and calculating inter-event timing metrics.