Aggregate video time and unique pins in Python
Company: Pinterest
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Part A (category by average time for videos):
You receive a list of pin engagement rows and a category map.
pins = [
{"pin_id": 10, "category_id": 1, "time_spent": 18.0, "pin_format": "video"},
{"pin_id": 11, "category_id": 2, "time_spent": 6.0, "pin_format": "static"},
{"pin_id": 12, "category_id": 1, "time_spent": 22.0, "pin_format": "video"},
{"pin_id": 13, "category_id": 3, "time_spent": 15.0, "pin_format": "video"},
{"pin_id": 14, "category_id": 2, "time_spent": None, "pin_format": "video"}
]
category_map = {1: "Fashion", 2: "Food", 3: "DIY"}
Task: Among video pins only, compute the category_name with the highest average time_spent. Ignore records where time_spent is None. Break ties by category_name alphabetically. Return (category_name, average_time) with average_time rounded to 2 decimals. Implement an O(n) solution using only Python 3.10+ standard library.
Part B (average unique pins per user):
You receive a mapping of user -> list of pin_ids they interacted with.
user_pins = {"user1": [1,2,2], "user2": [1,2,3], "user3": []}
Task: Compute the average number of unique pin_ids per user. Empty lists count as 0. Return a float rounded to 2 decimals. Aim for O(total_items) time and O(U) extra space, where U is number of users.
Quick Answer: This question evaluates competency in data aggregation, deduplication, and algorithmic efficiency by requiring computation of category-level averages with tie-breaking and per-user unique counts, demonstrating proficiency in Python and SQL-style data manipulation.