Find top category by video time spent
Company: Pinterest
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Pandas required. You are given a DataFrame df with columns: user_id (int), pin_id (int), pin_type (str), category (str or None), time_spent_sec (numeric). You are also given a dict category_map that maps normalized category names to canonical categories. Goal: among video pins, find the canonical category with the highest average time_spent_sec. Requirements:
- Compare pin_type values case-insensitively and treat the misspelling 'vedio' as 'video' (a known data quality issue).
- Normalize category by lowercasing and stripping whitespace, then map it through category_map; if the normalized value is not a key in category_map, or the category is null/empty, map it to 'unknown'.
- Exclude rows where time_spent_sec is null or non-positive.
- Return a two-field result: top_category (str), avg_time_spent_sec (float, rounded to 2 decimals).
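One way to satisfy the requirements above is sketched below. This is a minimal sketch, not a reference solution: the function name top_video_category, the dict return shape, the coercion of time_spent_sec with pd.to_numeric, and the handling of an empty result are all assumptions that the prompt does not specify. Ties are resolved by idxmax, which returns the first maximum in the sorted group order.

```python
import pandas as pd


def top_video_category(df: pd.DataFrame, category_map: dict) -> dict:
    """Hypothetical helper: top canonical category by mean time among video pins."""
    # Lowercase pin_type and repair the known 'vedio' misspelling.
    pin_type = df["pin_type"].str.lower().replace({"vedio": "video"})
    is_video = pin_type == "video"  # missing pin_type compares as False

    # Normalize category, map it, and send unmapped/null/empty values to 'unknown'.
    normalized = df["category"].str.strip().str.lower()
    canonical = normalized.map(category_map).fillna("unknown")

    # Coerce to numeric (assumption) so bad values become NaN and get excluded.
    time = pd.to_numeric(df["time_spent_sec"], errors="coerce")
    is_valid = time > 0  # NaN > 0 is False, so nulls are dropped too

    cleaned = pd.DataFrame({"category": canonical, "time": time})[is_video & is_valid]
    if cleaned.empty:
        # Assumption: no qualifying rows yields empty fields rather than an error.
        return {"top_category": None, "avg_time_spent_sec": None}

    means = cleaned.groupby("category")["time"].mean()
    top = means.idxmax()
    return {"top_category": top, "avg_time_spent_sec": round(float(means.loc[top]), 2)}
```

Mapping first and filling with 'unknown' afterwards handles null, empty, and unmapped categories in a single step, because Series.map leaves all three as missing values.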
Example inputs
category_map = {
    'home': 'lifestyle',
    'food & drink': 'food',
    'recipe': 'food',
    'travel': 'travel'
}
df (illustrative rows)
user_id | pin_id | pin_type | category | time_spent_sec
1 | 10 | 'video' | 'Food & Drink' | 120
2 | 11 | 'video' | None | 200
3 | 12 | 'static' | 'Home' | 90
4 | 13 | 'video' | 'Recipe' | 240
5 | 14 | 'video' | 'DIY' | 180
6 | 15 | 'vedio' | 'food & drink ' | 60
What is the top_category and its average time among video pins after mapping and cleaning?
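Working through the illustrative rows: pin 12 is dropped because it is 'static'; pins 10, 13, and 15 normalize to 'food & drink' or 'recipe' and map to 'food'; pin 11 (None) and pin 14 ('diy', not in category_map) fall into 'unknown'. That gives food = (120 + 240 + 60) / 3 = 140.0 and unknown = (200 + 180) / 2 = 190.0. The snippet below is a sketch that reproduces this arithmetic on the example data only; the variable names are illustrative.

```python
import pandas as pd

category_map = {
    "home": "lifestyle",
    "food & drink": "food",
    "recipe": "food",
    "travel": "travel",
}

df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "pin_id": [10, 11, 12, 13, 14, 15],
    "pin_type": ["video", "video", "static", "video", "video", "vedio"],
    "category": ["Food & Drink", None, "Home", "Recipe", "DIY", "food & drink "],
    "time_spent_sec": [120, 200, 90, 240, 180, 60],
})

# Keep video pins (including the 'vedio' misspelling) with positive time.
pin_type = df["pin_type"].str.lower().replace({"vedio": "video"})
videos = df[(pin_type == "video") & (df["time_spent_sec"] > 0)]

# Normalize and map categories; anything unmapped or missing becomes 'unknown'.
canonical = videos["category"].str.strip().str.lower().map(category_map).fillna("unknown")

# Mean time per canonical category: food -> 140.0, unknown -> 190.0
means = videos["time_spent_sec"].groupby(canonical).mean()

top_category = means.idxmax()
avg_time_spent_sec = round(float(means.max()), 2)
print(top_category, avg_time_spent_sec)  # unknown 190.0
```

On these rows the result is top_category = 'unknown' with avg_time_spent_sec = 190.0, which shows why the 'unknown' bucket must be kept in the aggregation rather than filtered out.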
Quick Answer: This question evaluates proficiency in pandas-based data manipulation and aggregation, testing competencies in data cleaning, normalization, mapping, filtering, and computing summary statistics.