User Activity Analytics: Unique Users per Activity and Average Time on Site
Company: Google
Role: Data Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's ability to process semi-structured event data in Python using grouping, deduplication, and timestamp arithmetic. It falls under coding and algorithms and is commonly asked to assess practical data-wrangling skills, such as aggregating per-key counts and computing time-based metrics, while handling edge cases like unordered records and single-event users.
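Task 1 - Unique Users per Activity
Given a list of records, each a dict with the keys user_id, timestamp, and activity, return a dict mapping each activity to the number of distinct users who performed it.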
Constraints
- 0 <= number of records (the input list may be empty).
- user_id is an integer.
- activity is a non-empty string.
- timestamp is formatted as 'YYYY-MM-DD HH:MM:SS'.
- Records are not guaranteed to be in chronological order (irrelevant for Task 1).
Examples
Input: ([{'user_id': 1, 'timestamp': '2024-07-26 10:00:00', 'activity': 'login'}, {'user_id': 2, 'timestamp': '2024-07-26 10:05:00', 'activity': 'login'}, {'user_id': 1, 'timestamp': '2024-07-26 10:10:00', 'activity': 'view'}, {'user_id': 1, 'timestamp': '2024-07-26 10:15:00', 'activity': 'purchase'}, {'user_id': 2, 'timestamp': '2024-07-26 10:20:00', 'activity': 'view'}, {'user_id': 3, 'timestamp': '2024-07-26 10:25:00', 'activity': 'login'}, {'user_id': 1, 'timestamp': '2024-07-26 10:30:00', 'activity': 'logout'}],)
Expected Output: {'login': 3, 'view': 2, 'purchase': 1, 'logout': 1}
Explanation: login by users {1,2,3}=3; view by {1,2}=2; purchase by {1}=1; logout by {1}=1.
Input: ([],)
Expected Output: {}
Explanation: Empty input list -> empty mapping.
Input: ([{'user_id': 5, 'timestamp': '2024-01-01 00:00:00', 'activity': 'view'}, {'user_id': 5, 'timestamp': '2024-01-01 00:01:00', 'activity': 'view'}, {'user_id': 6, 'timestamp': '2024-01-01 00:02:00', 'activity': 'view'}],)
Expected Output: {'view': 2}
Explanation: User 5 views twice but counts once; user 6 views once -> 2 distinct viewers.
Input: ([{'user_id': 9, 'timestamp': '2024-05-05 12:00:00', 'activity': 'login'}],)
Expected Output: {'login': 1}
Explanation: A single record: one activity with one distinct user.
Hints
- Map each activity to a set of user_ids so duplicates collapse automatically.
- Because each set belongs to one activity, a user who repeats an activity counts once for it but still counts separately under every other activity they perform.
- Return len(set) for each activity; an empty input naturally yields {}.
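The hints above can be sketched as follows. This is a minimal illustration, not a reference solution: the function name unique_users_per_activity is an assumption, and the code assumes every record carries the user_id and activity keys shown in the examples.

```python
from collections import defaultdict


def unique_users_per_activity(records):
    """Return {activity: number of distinct user_ids} for a list of record dicts.

    The function name is hypothetical; the record keys follow the examples.
    """
    # One set of user_ids per activity, so repeated pairs collapse automatically.
    users_by_activity = defaultdict(set)
    for record in records:
        users_by_activity[record['activity']].add(record['user_id'])

    # Convert each set to its size; an empty input yields an empty dict.
    return {activity: len(users) for activity, users in users_by_activity.items()}


if __name__ == '__main__':
    sample = [
        {'user_id': 5, 'timestamp': '2024-01-01 00:00:00', 'activity': 'view'},
        {'user_id': 5, 'timestamp': '2024-01-01 00:01:00', 'activity': 'view'},
        {'user_id': 6, 'timestamp': '2024-01-01 00:02:00', 'activity': 'view'},
    ]
    print(unique_users_per_activity(sample))
```

The timestamp field is never read here, which is why record order does not matter for this task. The work is a single pass over the records, with memory proportional to the number of distinct (activity, user_id) pairs.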
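Task 2 - Average Time Spent per User (seconds)
Given the same record format, treat each user's time spent as the number of seconds between their earliest and latest timestamps, and return the average of that value across all users as a float.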
Constraints
- 0 <= number of records (the input list may be empty -> return 0.0).
- user_id is an integer.
- timestamp is formatted as 'YYYY-MM-DD HH:MM:SS' (24-hour, naive local time).
- Records are not guaranteed to be in chronological order.
- A user with exactly one action contributes 0 seconds but still counts in the denominator.
Examples
Input: ([{'user_id': 1, 'timestamp': '2024-07-26 10:00:00', 'activity': 'login'}, {'user_id': 2, 'timestamp': '2024-07-26 10:05:00', 'activity': 'login'}, {'user_id': 1, 'timestamp': '2024-07-26 10:10:00', 'activity': 'view'}, {'user_id': 1, 'timestamp': '2024-07-26 10:15:00', 'activity': 'purchase'}, {'user_id': 2, 'timestamp': '2024-07-26 10:20:00', 'activity': 'view'}, {'user_id': 3, 'timestamp': '2024-07-26 10:25:00', 'activity': 'login'}, {'user_id': 1, 'timestamp': '2024-07-26 10:30:00', 'activity': 'logout'}],)
Expected Output: 900.0
Explanation: User1=1800s (10:00->10:30), User2=900s (10:05->10:20), User3=0s -> (1800+900+0)/3=900.0.
Input: ([],)
Expected Output: 0.0
Explanation: No users -> return 0.0 (no division by zero).
Input: ([{'user_id': 8, 'timestamp': '2024-04-04 10:00:00', 'activity': 'login'}],)
Expected Output: 0.0
Explanation: Single action for one user -> 0s; average over 1 user = 0.0.
Input: ([{'user_id': 7, 'timestamp': '2024-03-03 09:00:00', 'activity': 'logout'}, {'user_id': 7, 'timestamp': '2024-03-03 08:00:00', 'activity': 'login'}],)
Expected Output: 3600.0
Explanation: Out-of-order: user 7 spans 08:00->09:00 = 3600s; one user -> average 3600.0.
Hints
- For each user track the earliest and latest timestamp (parse with datetime.strptime).
- Duration = (max - min).total_seconds(); a single action yields 0.
- Guard the empty case: with no users, return 0.0 instead of dividing by zero.
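One way to put these hints together is sketched below. The function name average_time_per_user and the constant TIMESTAMP_FORMAT are assumptions made for illustration. The code assumes every timestamp matches the 'YYYY-MM-DD HH:MM:SS' format given in the constraints and treats it as a naive datetime.

```python
from datetime import datetime

# Format string matching 'YYYY-MM-DD HH:MM:SS' (hypothetical constant name).
TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'


def average_time_per_user(records):
    """Return the mean of (latest - earliest) seconds across users, or 0.0 if empty."""
    earliest = {}
    latest = {}
    for record in records:
        user_id = record['user_id']
        ts = datetime.strptime(record['timestamp'], TIMESTAMP_FORMAT)
        # Track min and max per user, so the input order is irrelevant.
        if user_id not in earliest or ts < earliest[user_id]:
            earliest[user_id] = ts
        if user_id not in latest or ts > latest[user_id]:
            latest[user_id] = ts

    # Guard the empty case so there is no division by zero.
    if not earliest:
        return 0.0

    # A user with a single action has earliest == latest and contributes 0 seconds.
    total_seconds = sum(
        (latest[user_id] - earliest[user_id]).total_seconds()
        for user_id in earliest
    )
    return total_seconds / len(earliest)


if __name__ == '__main__':
    out_of_order = [
        {'user_id': 7, 'timestamp': '2024-03-03 09:00:00', 'activity': 'logout'},
        {'user_id': 7, 'timestamp': '2024-03-03 08:00:00', 'activity': 'login'},
    ]
    print(average_time_per_user(out_of_order))
```

Tracking only the minimum and maximum per user avoids sorting, so the work stays linear in the number of records. Single-event users remain in the denominator because they still get an entry in both dictionaries.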