PracHub

Quick Overview

This question evaluates a candidate's ability to process semi-structured event data using grouping, deduplication, and timestamp arithmetic in Python. It falls under coding and algorithms and is commonly asked to assess practical data-wrangling skills, such as aggregating per-key counts and computing time-based metrics, while handling edge cases like unordered records and single-event users.

  • medium
  • Google
  • Coding & Algorithms
  • Data Engineer

User Activity Analytics: Unique Users per Activity and Average Time on Site

Company: Google

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

## User Activity Analytics

You are given a collection of records, where each record captures a single user's action on a website. Each record is a dictionary with the following fields:

- `user_id` (int): the id of the user who performed the action.
- `timestamp` (str): the moment the action occurred, formatted as `'YYYY-MM-DD HH:MM:SS'` (24-hour clock, a single naive local time zone, no offsets).
- `activity` (str): a short description of the action, e.g. `'login'`, `'view'`, `'purchase'`, `'logout'`.

The input is a Python list of these record dictionaries:

```python
user_activity = [
    {'user_id': 1, 'timestamp': '2024-07-26 10:00:00', 'activity': 'login'},
    {'user_id': 2, 'timestamp': '2024-07-26 10:05:00', 'activity': 'login'},
    {'user_id': 1, 'timestamp': '2024-07-26 10:10:00', 'activity': 'view'},
    {'user_id': 1, 'timestamp': '2024-07-26 10:15:00', 'activity': 'purchase'},
    {'user_id': 2, 'timestamp': '2024-07-26 10:20:00', 'activity': 'view'},
    {'user_id': 3, 'timestamp': '2024-07-26 10:25:00', 'activity': 'login'},
    {'user_id': 1, 'timestamp': '2024-07-26 10:30:00', 'activity': 'logout'},
]
```

Implement the following two computations.

### Task 1 - Unique users per activity

Return a mapping where each key is an activity type and each value is the **count of distinct users** who performed that activity at least once. If the same user performs the same activity multiple times, that user is counted only once for that activity.

For the example above, Task 1 must return:

```python
{'login': 3, 'view': 2, 'purchase': 1, 'logout': 1}
```

### Task 2 - Average time spent per user (in seconds)

For each user, define **time spent on the website** as the number of seconds between that user's **earliest** recorded action and that user's **latest** recorded action (i.e. `last_timestamp - first_timestamp`, in seconds). A user with only a single recorded action has a time spent of `0` seconds.

Return the **average** of these per-user durations across all distinct users, as a number of seconds (a float).

For the example above:

- User 1: from `10:00:00` to `10:30:00` -> 1800 seconds
- User 2: from `10:05:00` to `10:20:00` -> 900 seconds
- User 3: single action -> 0 seconds

Average = `(1800 + 900 + 0) / 3 = 900.0` seconds, so Task 2 must return `900.0`.

### Requirements

- Parse the `timestamp` strings yourself; do not assume the records arrive in chronological order.
- Both computations must run over the same input list.
- Suggested signatures:

```python
def unique_users_per_activity(user_activity: list[dict]) -> dict: ...
def average_time_per_user_seconds(user_activity: list[dict]) -> float: ...
```

### Edge cases to handle

- Records arriving out of timestamp order.
- A user with exactly one action (duration of `0` seconds, but the user still counts toward Task 1 and toward the Task 2 denominator).
- The same `(user_id, activity)` pair appearing multiple times (count the user once for that activity).
- An empty input list: Task 1 returns an empty mapping `{}`; Task 2 returns `0.0` (there are no users to average over, so avoid a division-by-zero error).
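
The timestamp arithmetic behind Task 2 can be tried in isolation before writing the full function. The snippet below is a minimal sketch, assuming the standard-library `datetime` module is used for parsing; the constant name `TIMESTAMP_FORMAT` is chosen for this illustration and is not part of the problem statement.

```python
from datetime import datetime

# Format string matching 'YYYY-MM-DD HH:MM:SS' (24-hour clock, no offset).
# The constant name is an assumption made for this sketch.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

first = datetime.strptime("2024-07-26 10:00:00", TIMESTAMP_FORMAT)
last = datetime.strptime("2024-07-26 10:30:00", TIMESTAMP_FORMAT)

# Subtracting two naive datetimes yields a timedelta;
# total_seconds() converts it to a float number of seconds.
elapsed = last - first
print(elapsed.total_seconds())  # 1800.0
```

This matches the 1800 seconds quoted for user 1 in the example.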


Task 1 - Unique Users per Activity

You are given `user_activity`, a list of records. Each record is a dict with `user_id` (int), `timestamp` (str, `'YYYY-MM-DD HH:MM:SS'`), and `activity` (str).

Return a mapping where each key is an activity type and each value is the number of **distinct users** who performed that activity at least once. If the same user performs the same activity multiple times, count that user only once for that activity. For an empty input, return an empty mapping `{}`.

Example:

```python
unique_users_per_activity(user_activity)
# {'login': 3, 'view': 2, 'purchase': 1, 'logout': 1}
```

Constraints

  • 0 <= number of records (the input list may be empty).
  • user_id is an integer.
  • activity is a non-empty string.
  • timestamp is formatted as 'YYYY-MM-DD HH:MM:SS'.
  • Records are not guaranteed to be in chronological order (irrelevant for Task 1).

Examples

Input: ([{'user_id': 1, 'timestamp': '2024-07-26 10:00:00', 'activity': 'login'}, {'user_id': 2, 'timestamp': '2024-07-26 10:05:00', 'activity': 'login'}, {'user_id': 1, 'timestamp': '2024-07-26 10:10:00', 'activity': 'view'}, {'user_id': 1, 'timestamp': '2024-07-26 10:15:00', 'activity': 'purchase'}, {'user_id': 2, 'timestamp': '2024-07-26 10:20:00', 'activity': 'view'}, {'user_id': 3, 'timestamp': '2024-07-26 10:25:00', 'activity': 'login'}, {'user_id': 1, 'timestamp': '2024-07-26 10:30:00', 'activity': 'logout'}],)

Expected Output: {'login': 3, 'view': 2, 'purchase': 1, 'logout': 1}

Explanation: login by users {1,2,3}=3; view by {1,2}=2; purchase by {1}=1; logout by {1}=1.

Input: ([],)

Expected Output: {}

Explanation: Empty input list -> empty mapping.

Input: ([{'user_id': 5, 'timestamp': '2024-01-01 00:00:00', 'activity': 'view'}, {'user_id': 5, 'timestamp': '2024-01-01 00:01:00', 'activity': 'view'}, {'user_id': 6, 'timestamp': '2024-01-01 00:02:00', 'activity': 'view'}],)

Expected Output: {'view': 2}

Explanation: User 5 views twice but counts once; user 6 views once -> 2 distinct viewers.

Input: ([{'user_id': 9, 'timestamp': '2024-05-05 12:00:00', 'activity': 'login'}],)

Expected Output: {'login': 1}

Explanation: A single record: one activity with one distinct user.

Hints

  1. Map each activity to a set of user_ids so duplicates collapse automatically.
  2. A set makes the same (user_id, activity) pair count only once.
  3. Return len(set) for each activity; an empty input naturally yields {}.
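
The hints above can be sketched as follows. This is one possible approach rather than a reference solution: it assumes every record carries the `activity` and `user_id` keys described in the constraints, and the helper names `users_by_activity` and `sample` are chosen for this illustration.

```python
from collections import defaultdict


def unique_users_per_activity(user_activity: list[dict]) -> dict:
    # Map each activity to the set of user_ids seen for it;
    # adding the same user_id twice leaves the set unchanged.
    users_by_activity = defaultdict(set)
    for record in user_activity:
        users_by_activity[record["activity"]].add(record["user_id"])

    # Convert each set to its size; an empty input yields {}.
    return {activity: len(users) for activity, users in users_by_activity.items()}


sample = [
    {"user_id": 5, "timestamp": "2024-01-01 00:00:00", "activity": "view"},
    {"user_id": 5, "timestamp": "2024-01-01 00:01:00", "activity": "view"},
    {"user_id": 6, "timestamp": "2024-01-01 00:02:00", "activity": "view"},
]
print(unique_users_per_activity(sample))  # {'view': 2}
print(unique_users_per_activity([]))      # {}
```

The timestamps are never parsed here, since record order and timing do not affect distinct-user counts.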

Task 2 - Average Time Spent per User (seconds)

Using the same `user_activity` list of records (`user_id`, `timestamp`, `activity`), define a user's **time spent** as the number of seconds between that user's earliest and latest recorded action (`last_timestamp - first_timestamp`). A user with a single action has a time spent of `0` seconds.

Return the **average** of these per-user durations across all distinct users, as a float (seconds). Parse the timestamps yourself; records may arrive out of order. For an empty input, return `0.0` (avoid dividing by zero).

Example: user 1 = 1800s, user 2 = 900s, user 3 = 0s -> average = (1800+900+0)/3 = 900.0.

Constraints

  • 0 <= number of records (the input list may be empty -> return 0.0).
  • user_id is an integer.
  • timestamp is formatted as 'YYYY-MM-DD HH:MM:SS' (24-hour, naive local time).
  • Records are not guaranteed to be in chronological order.
  • A user with exactly one action contributes 0 seconds but still counts in the denominator.
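
One consequence of the fixed, zero-padded timestamp format is that plain string comparison orders timestamps chronologically, so the earliest and latest values per user can be found without parsing every record. The variant below is a hedged sketch of that idea, valid only under the stated format constraint; the function name `average_time_via_string_bounds` is hypothetical and not part of the problem statement.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # name chosen for this sketch


def average_time_via_string_bounds(user_activity: list[dict]) -> float:
    # user_id -> (earliest, latest), both kept as raw strings.
    # Assumes every timestamp is zero-padded 'YYYY-MM-DD HH:MM:SS',
    # so lexicographic order equals chronological order.
    bounds = {}
    for record in user_activity:
        uid, ts = record["user_id"], record["timestamp"]
        lo, hi = bounds.get(uid, (ts, ts))
        bounds[uid] = (min(lo, ts), max(hi, ts))

    if not bounds:
        return 0.0  # no users: avoid dividing by zero

    # Parse only two strings per user, then average the spans.
    total = 0.0
    for lo, hi in bounds.values():
        start = datetime.strptime(lo, TIMESTAMP_FORMAT)
        end = datetime.strptime(hi, TIMESTAMP_FORMAT)
        total += (end - start).total_seconds()
    return total / len(bounds)


out_of_order = [
    {"user_id": 7, "timestamp": "2024-03-03 09:00:00", "activity": "logout"},
    {"user_id": 7, "timestamp": "2024-03-03 08:00:00", "activity": "login"},
]
print(average_time_via_string_bounds(out_of_order))  # 3600.0
```

If the input could contain unpadded or differently formatted timestamps, this shortcut would no longer be safe and each record should be parsed before comparison.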

Examples

Input: ([{'user_id': 1, 'timestamp': '2024-07-26 10:00:00', 'activity': 'login'}, {'user_id': 2, 'timestamp': '2024-07-26 10:05:00', 'activity': 'login'}, {'user_id': 1, 'timestamp': '2024-07-26 10:10:00', 'activity': 'view'}, {'user_id': 1, 'timestamp': '2024-07-26 10:15:00', 'activity': 'purchase'}, {'user_id': 2, 'timestamp': '2024-07-26 10:20:00', 'activity': 'view'}, {'user_id': 3, 'timestamp': '2024-07-26 10:25:00', 'activity': 'login'}, {'user_id': 1, 'timestamp': '2024-07-26 10:30:00', 'activity': 'logout'}],)

Expected Output: 900.0

Explanation: User1=1800s (10:00->10:30), User2=900s (10:05->10:20), User3=0s -> (1800+900+0)/3=900.0.

Input: ([],)

Expected Output: 0.0

Explanation: No users -> return 0.0 (no division by zero).

Input: ([{'user_id': 8, 'timestamp': '2024-04-04 10:00:00', 'activity': 'login'}],)

Expected Output: 0.0

Explanation: Single action for one user -> 0s; average over 1 user = 0.0.

Input: ([{'user_id': 7, 'timestamp': '2024-03-03 09:00:00', 'activity': 'logout'}, {'user_id': 7, 'timestamp': '2024-03-03 08:00:00', 'activity': 'login'}],)

Expected Output: 3600.0

Explanation: Out-of-order: user 7 spans 08:00->09:00 = 3600s; one user -> average 3600.0.

Hints

  1. For each user track the earliest and latest timestamp (parse with datetime.strptime).
  2. Duration = (max - min).total_seconds(); a single action yields 0.
  3. Guard the empty case: with no users, return 0.0 instead of dividing by zero.
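
Putting the three hints together, a minimal sketch might look like the following. It assumes the records match the stated constraints and uses `datetime.strptime` as the hint suggests; the names `TIMESTAMP_FORMAT`, `first_seen`, `last_seen`, and `records` are chosen for this illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # name chosen for this sketch


def average_time_per_user_seconds(user_activity: list[dict]) -> float:
    first_seen = {}  # user_id -> earliest parsed timestamp
    last_seen = {}   # user_id -> latest parsed timestamp

    for record in user_activity:
        user_id = record["user_id"]
        moment = datetime.strptime(record["timestamp"], TIMESTAMP_FORMAT)
        # Track min and max per user, so input order does not matter.
        if user_id not in first_seen or moment < first_seen[user_id]:
            first_seen[user_id] = moment
        if user_id not in last_seen or moment > last_seen[user_id]:
            last_seen[user_id] = moment

    if not first_seen:
        return 0.0  # empty input: no users to average over

    # A single-action user has first == last, contributing 0 seconds.
    total = sum(
        (last_seen[uid] - first_seen[uid]).total_seconds() for uid in first_seen
    )
    return total / len(first_seen)


records = [
    {"user_id": 7, "timestamp": "2024-03-03 09:00:00", "activity": "logout"},
    {"user_id": 7, "timestamp": "2024-03-03 08:00:00", "activity": "login"},
]
print(average_time_per_user_seconds(records))  # 3600.0
print(average_time_per_user_seconds([]))       # 0.0
```

Tracking only the running minimum and maximum keeps the work to a single pass and avoids sorting each user's events.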

Last updated: Jul 1, 2026


Related Coding Questions

  • Find Common Free Time Slots Across Calendars - Google (easy)
  • Busiest Rental Car - Google (easy)
  • Deterministic Task Execution Order - Google (easy)
  • Count Clusters of 2D Points Within a Radius - Google (medium)
  • Infection Spread on a Grid (Cellular Automaton) - Google (hard)