PracHub

Quick Overview

This question evaluates data engineering competencies in time-series sessionization, datetime parsing and arithmetic, grouping and aggregation, distinct-count computation, and CSV log processing.

Aggregate user logs into 30-minute sessions

Company: Robinhood

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are given a CSV file with columns: `user_id | log_datetime | topic`

Example input rows (already in time order for the same user):

  • `001 | 2025-03-01 00:01:00 | pricing`
  • `001 | 2025-03-01 00:02:00 | hotel`
  • `001 | 2025-03-01 00:03:00 | pricing`
  • `001 | 2025-03-01 01:30:00 | restaurant`
  • `001 | 2025-03-01 02:30:00 | restaurant`

Task (Python):

  1. For each user, sort events by `log_datetime` and split them into sessions: a new session starts when the gap from the previous event is **more than 30 minutes**.
  2. For each session, output:
     • `user_id`
     • `session_start` (timestamp of the first event)
     • `session_end`: if the session has **2+ events**, use the timestamp of the last event; if the session has **only 1 event**, set `session_end = session_start + 30 minutes` (treating 30 minutes as the maximum assumed duration)
     • `count_topic`: number of **distinct** topics in the session
     • `count_time`: number of events (rows) in the session

Expected output for the example:

  • `001 | 2025-03-01 00:01:00 | 2025-03-01 00:03:00 | 2 | 3`
  • `001 | 2025-03-01 01:30:00 | 2025-03-01 02:00:00 | 1 | 1`
  • `001 | 2025-03-01 02:30:00 | 2025-03-01 03:00:00 | 1 | 1`

Implement a function or program that reads the CSV and produces these session aggregates.

You are given CSV text containing user activity logs with columns `user_id`, `log_datetime`, and `topic`. For each user, sort the events by `log_datetime` and split them into sessions. A new session starts when the gap from the previous event is more than 30 minutes.

For each session, return:

  • `user_id`
  • `session_start`: timestamp of the first event in the session
  • `session_end`: if the session has 2 or more events, use the timestamp of the last event; if the session has only 1 event, use `session_start + 30 minutes`
  • `count_topic`: number of distinct topics in the session
  • `count_time`: number of events in the session

Return all session aggregates ordered by `user_id` ascending, then by `session_start` ascending.

Constraints

  • 0 <= number of log rows <= 200000
  • Each row has the format `user_id,log_datetime,topic`
  • Timestamps are valid and use the format `YYYY-MM-DD HH:MM:SS`
  • A gap of exactly 30 minutes stays in the same session; only gaps greater than 30 minutes start a new session
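
The boundary rule in the last constraint is easy to get wrong by one comparison operator, so here is a minimal sketch of it in isolation. The names `TIMESTAMP_FORMAT`, `SESSION_GAP`, and `starts_new_session` are illustrative assumptions, not part of the problem statement.

```python
from datetime import datetime, timedelta

# Assumed constant names; the format string mirrors YYYY-MM-DD HH:MM:SS.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
SESSION_GAP = timedelta(minutes=30)


def starts_new_session(previous: str, current: str) -> bool:
    """Return True only when the gap is strictly greater than 30 minutes."""
    gap = (datetime.strptime(current, TIMESTAMP_FORMAT)
           - datetime.strptime(previous, TIMESTAMP_FORMAT))
    # Strict comparison: a gap of exactly 30 minutes stays in the same session.
    return gap > SESSION_GAP
```

Using `>` rather than `>=` is what keeps events exactly 30 minutes apart in one session.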

Examples

Input: "user_id,log_datetime,topic\n001,2025-03-01 00:01:00,pricing\n001,2025-03-01 00:02:00,hotel\n001,2025-03-01 00:03:00,pricing\n001,2025-03-01 01:30:00,restaurant\n001,2025-03-01 02:30:00,restaurant\n"

Expected Output: [["001", "2025-03-01 00:01:00", "2025-03-01 00:03:00", 2, 3], ["001", "2025-03-01 01:30:00", "2025-03-01 02:00:00", 1, 1], ["001", "2025-03-01 02:30:00", "2025-03-01 03:00:00", 1, 1]]

Explanation: The first three rows are within 30 minutes of each other, so they form one session. The last two rows are each more than 30 minutes apart from the previous row, so each becomes a single-event session with a 30-minute assumed duration.

Input: "user_id,log_datetime,topic\n002,2025-03-01 10:31:00,sports\n001,2025-03-01 09:30:00,a\n001,2025-03-01 10:00:00,b\n002,2025-03-01 10:00:00,news\n001,2025-03-01 10:31:00,a\n002,2025-03-01 10:30:00,news\n001,2025-03-01 09:00:00,a\n"

Expected Output: [["001", "2025-03-01 09:00:00", "2025-03-01 10:00:00", 2, 3], ["001", "2025-03-01 10:31:00", "2025-03-01 11:01:00", 1, 1], ["002", "2025-03-01 10:00:00", "2025-03-01 10:31:00", 2, 3]]

Explanation: Rows are not initially ordered, so each user's events must be sorted first. For user 001, gaps of exactly 30 minutes stay in the same session, but a 31-minute gap starts a new session. For user 002, all three events belong to one session.

Input: "user_id,log_datetime,topic\n007,2025-07-04 12:00:00,travel\n"

Expected Output: [["007", "2025-07-04 12:00:00", "2025-07-04 12:30:00", 1, 1]]

Explanation: A single event forms a one-row session, so the session end is 30 minutes after the start.

Input: "user_id,log_datetime,topic\n"

Expected Output: []

Explanation: There are no data rows, so there are no sessions.

Hints

  1. Group rows by user first, then sort each user's events by timestamp before building sessions.
  2. When scanning a user's events, keep track of the current session start time, last event time, event count, and a set of distinct topics.
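
The two hints can be combined into one possible solution, sketched below with only the standard library. It assumes the input always begins with the header row shown in the examples and that `user_id` values are ordered as strings; the names `sessionize`, `_close`, `FMT`, and `GAP` are illustrative choices, not names required by the problem.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"      # timestamp format from the constraints
GAP = timedelta(minutes=30)    # maximum gap allowed inside one session


def _close(user_id, start, last, topics, count):
    """Build one output row from the tracked session state."""
    # A single-event session gets an assumed 30-minute duration.
    end = last if count >= 2 else start + GAP
    return [user_id, start.strftime(FMT), end.strftime(FMT), len(topics), count]


def sessionize(csv_text):
    # Hint 1: group rows by user (the header row is assumed to be present).
    by_user = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.strptime(row["log_datetime"], FMT)
        by_user[row["user_id"]].append((ts, row["topic"]))

    sessions = []
    for user_id in sorted(by_user):  # string ordering of user_id is an assumption
        events = sorted(by_user[user_id], key=lambda e: e[0])
        # Hint 2: track start time, last event time, topic set, and event count.
        start = last = events[0][0]
        topics, count = {events[0][1]}, 1
        for ts, topic in events[1:]:
            if ts - last > GAP:  # strictly more than 30 minutes
                sessions.append(_close(user_id, start, last, topics, count))
                start, topics, count = ts, set(), 0
            last = ts
            topics.add(topic)
            count += 1
        sessions.append(_close(user_id, start, last, topics, count))
    return sessions
```

Sorting dominates the cost, so this runs in O(n log n) time for n rows, which is comfortable for the stated limit of 200000 rows. Because users are visited in sorted order and each user's sessions are emitted chronologically, the result already satisfies the required output ordering.
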
Last updated: May 4, 2026
