Calculate Total Spend and Identify Key User Metrics
Company: Yahoo
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Transactions
+----+---------+--------+------------+
| id | user_id | amount | date       |
+----+---------+--------+------------+
|  1 |     101 |  23.50 | 2023-07-01 |
|  2 |     102 |  12.99 | 2023-07-01 |
|  3 |     101 |  50.00 | 2023-07-02 |
|  4 |     103 |  99.95 | 2023-07-02 |
|  5 |     101 |   9.99 | 2023-07-03 |
+----+---------+--------+------------+
##### Scenario
You have one table that stores all monetary transactions made by users of an e-commerce site.
##### Question
1. For every user, return total_spend rounded to two decimals.
2. Return the user_id(s) that have the second-highest total spend.
3. List users whose transaction_count exceeds the average transaction_count of all users.
4. For each calendar date, calculate revenue_drop_pct compared with the previous day and keep only dates where the drop is ≥10%.
5. For every user, return first_purchase_date together with the amount spent on that first day.
##### Hints
Window functions, aggregation, self-joins, DATE arithmetic and ROUND will be useful.
Quick Answer: This question evaluates a data scientist's proficiency in transactional data manipulation using SQL and Python, focusing on aggregation, ranking, joins, date arithmetic, and numeric rounding to produce user-level and time-series metrics.
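One possible approach to the five tasks is sketched below. It is not a reference solution. The question names no SQL dialect, so the sketch assumes SQLite (3.25 or later, for window functions) driven from Python's built-in sqlite3 module. The lowercase table name transactions, the query labels, and the CTE names are illustrative. "Previous day" is assumed to mean the previous date that has any transactions, and ties on total spend are assumed to share a rank.

```python
import sqlite3

# In-memory database loaded with the five sample rows from the Transactions table.
# "date" is quoted as an identifier because DATE is a reserved word in some dialects.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE transactions (id INTEGER, user_id INTEGER, amount REAL, "date" TEXT)')
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, 101, 23.50, "2023-07-01"),
        (2, 102, 12.99, "2023-07-01"),
        (3, 101, 50.00, "2023-07-02"),
        (4, 103, 99.95, "2023-07-02"),
        (5, 101, 9.99, "2023-07-03"),
    ],
)

QUERIES = {
    # 1. Total spend per user, rounded to two decimals.
    "total_spend": """
        SELECT user_id, ROUND(SUM(amount), 2) AS total_spend
        FROM transactions
        GROUP BY user_id
        ORDER BY user_id
    """,
    # 2. DENSE_RANK keeps tied users together, so rank 2 is the second-highest total.
    "second_highest": """
        WITH totals AS (
            SELECT user_id, ROUND(SUM(amount), 2) AS total_spend
            FROM transactions
            GROUP BY user_id
        ),
        ranked AS (
            SELECT user_id, DENSE_RANK() OVER (ORDER BY total_spend DESC) AS spend_rank
            FROM totals
        )
        SELECT user_id FROM ranked WHERE spend_rank = 2 ORDER BY user_id
    """,
    # 3. Users whose transaction count exceeds the per-user average count.
    "above_avg_count": """
        WITH counts AS (
            SELECT user_id, COUNT(*) AS transaction_count
            FROM transactions
            GROUP BY user_id
        )
        SELECT user_id, transaction_count
        FROM counts
        WHERE transaction_count > (SELECT AVG(transaction_count) FROM counts)
        ORDER BY user_id
    """,
    # 4. Daily revenue compared with the previous row via LAG; keep drops of at least 10%.
    "revenue_drop": """
        WITH daily AS (
            SELECT "date" AS txn_date, SUM(amount) AS revenue
            FROM transactions
            GROUP BY "date"
        ),
        compared AS (
            SELECT txn_date, revenue, LAG(revenue) OVER (ORDER BY txn_date) AS prev_revenue
            FROM daily
        )
        SELECT txn_date,
               ROUND(100.0 * (prev_revenue - revenue) / prev_revenue, 2) AS revenue_drop_pct
        FROM compared
        WHERE prev_revenue > 0
          AND (prev_revenue - revenue) / prev_revenue >= 0.10
        ORDER BY txn_date
    """,
    # 5. First purchase date per user, joined back to sum everything spent that day.
    "first_purchase": """
        WITH firsts AS (
            SELECT user_id, MIN("date") AS first_purchase_date
            FROM transactions
            GROUP BY user_id
        )
        SELECT f.user_id, f.first_purchase_date, ROUND(SUM(t.amount), 2) AS first_day_spend
        FROM firsts AS f
        JOIN transactions AS t
          ON t.user_id = f.user_id AND t."date" = f.first_purchase_date
        GROUP BY f.user_id, f.first_purchase_date
        ORDER BY f.user_id
    """,
}

# Run every query and print its rows under its label.
results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

On the sample rows, user 101 has the second-highest total spend and is the only user above the average transaction count. The only date that survives the 10% filter is 2023-07-03. If "previous day" must mean the strictly preceding calendar day, the LAG step could be swapped for a self-join of the daily totals on date(txn_date, '-1 day'). Dates with no transactions would then need an explicit zero-revenue row.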