Implement User Sessionization From Event Stream
Company: Discord
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Hard
Interview Round: Technical Screen
Quick Answer: This question tests stream processing and stateful sessionization: tracking each user's activity across a time-ordered event stream, closing a session once an inactivity gap has provably passed, and computing per-session metrics such as the message count and the top channel.
Constraints
- 0 <= len(event_lines) <= 200000
- Each event line is a valid JSON object with the string fields `event_name`, `timestamp`, `user_id`, and `channel_id`; `event_name` is always `send_message`
- Timestamps use UTC ISO-8601 format: `YYYY-MM-DDTHH:MM:SSZ`
- Input events are sorted by non-decreasing timestamp
- Do not emit sessions that are still active (not yet proven closed by a later event) after the last provided event
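The fixed timestamp format above can be handled with the standard library alone. The following is a minimal sketch, not a required approach; the helper names `parse_ts` and `format_ts` are hypothetical and do not come from the problem statement.

```python
from datetime import datetime, timedelta, timezone

# Matches the constraint format YYYY-MM-DDTHH:MM:SSZ (UTC, second precision).
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_ts(ts: str) -> datetime:
    """Parse a constraint-format timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(ts, TS_FORMAT).replace(tzinfo=timezone.utc)


def format_ts(dt: datetime) -> str:
    """Render an aware datetime back into the same UTC string format."""
    return dt.astimezone(timezone.utc).strftime(TS_FORMAT)


# Time arithmetic then works directly on the parsed values.
last_event = parse_ts("2016-11-08T10:00:00Z")
boundary = last_event + timedelta(minutes=30)
print(format_ts(boundary))
```

Because the strings are fixed-width and always in UTC, they also compare chronologically as plain strings, but adding a duration still needs a parsed value.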
Examples
Input: []
Expected Output: []
Explanation: No events means no emitted sessions.
Input: ['{"event_name":"send_message","timestamp":"2016-11-08T10:00:00Z","user_id":"1","channel_id":"7"}']
Expected Output: []
Explanation: A single event starts a session, but there is no later timestamp proving that session has ended.
Input: ['{"event_name":"send_message","timestamp":"2016-11-08T10:00:00Z","user_id":"1","channel_id":"2"}', '{"event_name":"send_message","timestamp":"2016-11-08T10:30:00Z","user_id":"1","channel_id":"1"}', '{"event_name":"send_message","timestamp":"2016-11-08T11:01:00Z","user_id":"2","channel_id":"9"}']
Expected Output: [{'user_id': '1', 'session_start_ts': '2016-11-08T10:00:00Z', 'session_end_ts': '2016-11-08T10:30:00Z', 'messages_sent': 2, 'top_channel_id': '1', 'top_channel_messages_sent': 1}]
Explanation: The two user 1 events are exactly 30 minutes apart, so they stay in the same session. When the 11:01 event arrives, user 1's last event is more than 30 minutes old, so that session is emitted. Channels 1 and 2 tie with one message each, so channel '1' wins by lexicographic tie-break.
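The tie-break in this explanation can be expressed as a single ordering: highest count first, then smallest `channel_id` as a string. This is a small sketch under that reading; the function name `top_channel` is hypothetical.

```python
from collections import Counter


def top_channel(channel_counts):
    """Return (channel_id, count) with the highest count.

    Ties go to the lexicographically smallest channel_id. IDs are compared
    as strings, so '10' sorts before '9'.
    """
    return min(channel_counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Channels '2' and '1' each received one message, as in the example above.
print(top_channel(Counter(["2", "1"])))
```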
Input: ['{"event_name":"send_message","timestamp":"2016-11-08T10:00:00Z","user_id":"1","channel_id":"2"}', '{"event_name":"send_message","timestamp":"2016-11-08T10:05:00Z","user_id":"2","channel_id":"1"}', '{"event_name":"send_message","timestamp":"2016-11-08T10:10:00Z","user_id":"1","channel_id":"3"}', '{"event_name":"send_message","timestamp":"2016-11-08T10:40:00Z","user_id":"2","channel_id":"1"}', '{"event_name":"send_message","timestamp":"2016-11-08T10:41:00Z","user_id":"3","channel_id":"9"}', '{"event_name":"send_message","timestamp":"2016-11-08T11:11:00Z","user_id":"3","channel_id":"9"}']
Expected Output: [{'user_id': '2', 'session_start_ts': '2016-11-08T10:05:00Z', 'session_end_ts': '2016-11-08T10:05:00Z', 'messages_sent': 1, 'top_channel_id': '1', 'top_channel_messages_sent': 1}, {'user_id': '1', 'session_start_ts': '2016-11-08T10:00:00Z', 'session_end_ts': '2016-11-08T10:10:00Z', 'messages_sent': 2, 'top_channel_id': '2', 'top_channel_messages_sent': 1}, {'user_id': '2', 'session_start_ts': '2016-11-08T10:40:00Z', 'session_end_ts': '2016-11-08T10:40:00Z', 'messages_sent': 1, 'top_channel_id': '1', 'top_channel_messages_sent': 1}]
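Explanation: User 2's 10:05 session is emitted when the 10:40 event arrives, 35 minutes later. User 1's session is not emitted at 10:40, which is exactly 30 minutes after its last event at 10:10; it is emitted when the 10:41 event arrives, with channels '2' and '3' tied at one message each and '2' winning the lexicographic tie-break. User 2's 10:40 singleton session is emitted when the 11:11 event arrives. User 3's 11:11 event is exactly 30 minutes after 10:41, so it extends the same session; user 3 is still active at the end and must not be emitted.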
Input: ['{"event_name":"send_message","timestamp":"2016-11-08T10:00:00Z","user_id":"1","channel_id":"5"}', '{"event_name":"send_message","timestamp":"2016-11-08T10:00:00Z","user_id":"2","channel_id":"4"}', '{"event_name":"send_message","timestamp":"2016-11-08T10:31:00Z","user_id":"3","channel_id":"1"}']
Expected Output: [{'user_id': '1', 'session_start_ts': '2016-11-08T10:00:00Z', 'session_end_ts': '2016-11-08T10:00:00Z', 'messages_sent': 1, 'top_channel_id': '5', 'top_channel_messages_sent': 1}, {'user_id': '2', 'session_start_ts': '2016-11-08T10:00:00Z', 'session_end_ts': '2016-11-08T10:00:00Z', 'messages_sent': 1, 'top_channel_id': '4', 'top_channel_messages_sent': 1}]
Explanation: The sessions of user 1 and user 2 are both proven closed when the 10:31 event arrives. Since they expire at the same time, emit them in lexicographically increasing `user_id` order.
Input: ['{"event_name":"send_message","timestamp":"2016-11-08T14:00:00Z","user_id":"1","channel_id":"1"}', '{"event_name":"send_message","timestamp":"2016-11-08T14:05:00Z","user_id":"1","channel_id":"2"}', '{"event_name":"send_message","timestamp":"2016-11-08T14:20:00Z","user_id":"1","channel_id":"2"}', '{"event_name":"send_message","timestamp":"2016-11-08T14:51:00Z","user_id":"2","channel_id":"9"}']
Expected Output: [{'user_id': '1', 'session_start_ts': '2016-11-08T14:00:00Z', 'session_end_ts': '2016-11-08T14:20:00Z', 'messages_sent': 3, 'top_channel_id': '2', 'top_channel_messages_sent': 2}]
Explanation: User 1's session becomes provably closed when the 14:51 event arrives. Channel '2' received two of the three messages, so it is the top channel.
Hints
- A session is only guaranteed to be closed once you see an event whose timestamp is strictly greater than `last_event_time + 30 minutes`. Equality is not enough, because an event exactly 30 minutes later still belongs to the same session.
- Use a hash map to store each user's active session, and a min-heap of candidate expiry times so you can emit old sessions without scanning every active user on each event.
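The two hints can be combined as in the sketch below. It is one possible approach, not a reference solution. The function name `sessionize`, the helper `to_epoch`, and the internal state layout are hypothetical. It also assumes that sessions closed by the same event are emitted in (expiry time, `user_id`) order, which matches the examples but is only stated explicitly for equal expiry times.

```python
import heapq
import json
from collections import Counter
from datetime import datetime, timezone

GAP_SECONDS = 30 * 60  # inactivity gap taken from the hints


def to_epoch(ts):
    """Parse 'YYYY-MM-DDTHH:MM:SSZ' into integer UTC epoch seconds."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    return int(dt.replace(tzinfo=timezone.utc).timestamp())


def sessionize(event_lines):
    active = {}   # user_id -> state of that user's open session
    heap = []     # (candidate expiry, user_id); entries may be stale
    emitted = []

    for line in event_lines:
        event = json.loads(line)
        user, channel, ts = event["user_id"], event["channel_id"], event["timestamp"]
        now = to_epoch(ts)

        # Close every session whose expiry is strictly before this event.
        while heap and heap[0][0] < now:
            expiry, uid = heapq.heappop(heap)
            session = active.get(uid)
            if session is None or session["last"] + GAP_SECONDS != expiry:
                continue  # stale entry: session was extended or already emitted
            del active[uid]
            # Highest count wins; ties go to the smallest channel_id string.
            top_id, top_count = min(
                session["channels"].items(), key=lambda kv: (-kv[1], kv[0])
            )
            emitted.append({
                "user_id": uid,
                "session_start_ts": session["start_ts"],
                "session_end_ts": session["end_ts"],
                "messages_sent": sum(session["channels"].values()),
                "top_channel_id": top_id,
                "top_channel_messages_sent": top_count,
            })

        # Extend the user's open session, or start a new one.
        session = active.get(user)
        if session is None:
            session = active[user] = {"start_ts": ts, "channels": Counter()}
        session["last"] = now
        session["end_ts"] = ts
        session["channels"][channel] += 1
        heapq.heappush(heap, (now + GAP_SECONDS, user))

    # Sessions still in `active` here are never emitted, per the constraints.
    return emitted
```

Each event pushes one heap entry and each entry is popped at most once, so the work is O(n log n) for n events. Outdated entries are skipped when popped rather than removed from the heap when a session is extended.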