Aggregate user logs into 30-minute sessions

Q: Aggregate user logs into 30-minute sessions

This question evaluates data engineering competencies in time-series sessionization, datetime parsing and arithmetic, grouping and aggregation, distinct-count computation, and CSV log processing.

Q: How do I approach Coding & Algorithms interview questions?

Coding & Algorithms questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master coding & algorithms interviews.

Question

You are given a CSV file with columns:

user_id | log_datetime | topic

Example input rows (already in time order for the same user):

001 | 2025-03-01 00:01:00 | pricing
001 | 2025-03-01 00:02:00 | hotel
001 | 2025-03-01 00:03:00 | pricing
001 | 2025-03-01 01:30:00 | restaurant
001 | 2025-03-01 02:30:00 | restaurant

Task (Python):

For each user, sort events by log_datetime and split them into sessions: a new session starts when the gap from the previous event is more than 30 minutes .
For each session output:
- user_id
- session_start (timestamp of first event)
- session_end
  - if the session has 2+ events , use the timestamp of the last event
  - if the session has only 1 event , set session_end = session_start + 30 minutes (treating 30 minutes as the maximum assumed duration)
- count_topic : number of distinct topics in the session
- count_time : number of events (rows) in the session

Expected output for the example:

001 | 2025-03-01 00:01:00 | 2025-03-01 00:03:00 | 2 | 3
001 | 2025-03-01 01:30:00 | 2025-03-01 02:00:00 | 1 | 1
001 | 2025-03-01 02:30:00 | 2025-03-01 03:00:00 | 1 | 1

Implement a function/program that reads the CSV and produces these session aggregates.

Aggregate user logs into 30-minute sessions

Quick Overview

Comments (0)