This question evaluates a candidate's ability to perform event-log sessionization and time-interval aggregation across users and spaces, testing skills in algorithms, state management, and event processing within the Coding & Algorithms domain.
You are given an event log of user activity in Twitter Spaces. Each record has:
operation: one of create, join, leave
space_id: identifier of the space (string)
user_id: identifier of the user (string)
timestamp: Unix timestamp in seconds (integer)
A user is considered in a space from the time they join until they leave. The create operation means the user created the space and is also considered to have joined at that timestamp.
Return the total active time (in seconds) for each space, defined as:
sum over all users of (time that user spent in that space)
A user may join/leave the same space multiple times; all sessions should be summed.
Input records:
["create", "abc", "user_1", 1234567000]
["join", "abc", "user_2", 1234567100]
["leave", "abc", "user_2", 1234567300]
["create", "def", "user_2", 1234568000]
["leave", "def", "user_2", 1234568500]
["leave", "abc", "user_1", 1234569000]
Output:
{ "abc": 2200, "def": 500 }
Explanation: user_1 spent 2000 seconds in abc, user_2 spent 200 seconds in abc, and user_2 spent 500 seconds in def.
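One possible way to compute these totals is sketched below in Python. The function name total_active_time and the edge-case handling are assumptions, not part of the problem statement: records are sorted by timestamp in case they arrive out of order, a repeated join while the user is already in the space is ignored, a leave with no matching join is ignored, and sessions still open at the end of the log contribute nothing.

```python
from collections import defaultdict


def total_active_time(records):
    """Return {space_id: total seconds spent in the space, summed over users}."""
    open_sessions = {}          # (space_id, user_id) -> timestamp of the open join
    totals = defaultdict(int)   # space_id -> accumulated seconds

    # Assumption: the log may be unordered, so process events by timestamp.
    # sorted() is stable, so events with equal timestamps keep their log order.
    for operation, space_id, user_id, timestamp in sorted(records, key=lambda r: r[3]):
        key = (space_id, user_id)
        if operation in ("create", "join"):
            totals[space_id] += 0                      # make the space appear in the output
            open_sessions.setdefault(key, timestamp)   # ignore a duplicate join
        elif operation == "leave":
            start = open_sessions.pop(key, None)
            if start is not None:                      # ignore a leave without a join
                totals[space_id] += timestamp - start
    return dict(totals)


records = [
    ["create", "abc", "user_1", 1234567000],
    ["join", "abc", "user_2", 1234567100],
    ["leave", "abc", "user_2", 1234567300],
    ["create", "def", "user_2", 1234568000],
    ["leave", "def", "user_2", 1234568500],
    ["leave", "abc", "user_1", 1234569000],
]
result = total_active_time(records)
print(result)
```

Keying open sessions by (space_id, user_id) lets the same user be in several spaces at once and re-join a space later. The sort makes the run O(n log n); if the log is guaranteed to be ordered, the sort can be dropped for a single O(n) pass.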
Follow-up: Design a data structure or approach that can continuously report the top-k spaces by current active user count (the number of users currently in the space) as create, join, and leave events stream in.
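A minimal sketch of one approach to this streaming part, in Python: keep a set of current members per space plus a max-heap of (count, space) entries with lazy deletion. The class name TopKSpaces and its methods apply and top_k are hypothetical, and the sketch assumes ties are broken by space_id and that spaces with zero active users are excluded. Other designs, such as count buckets or a balanced tree ordered by count, would also fit.

```python
import heapq
from collections import defaultdict


class TopKSpaces:
    """Tracks current members per space and answers top-k by active user count."""

    def __init__(self):
        self.members = defaultdict(set)  # space_id -> user_ids currently inside
        self.heap = []                   # (-count, space_id) entries; some may be stale

    def apply(self, operation, space_id, user_id):
        """Process one streamed event; timestamps are not needed for counts."""
        users = self.members[space_id]
        before = len(users)
        if operation in ("create", "join"):
            users.add(user_id)
        elif operation == "leave":
            users.discard(user_id)
        if len(users) != before:
            # Push the new count; older entries for this space become stale.
            heapq.heappush(self.heap, (-len(users), space_id))

    def top_k(self, k):
        """Return up to k (space_id, count) pairs, highest count first."""
        result, seen, keep = [], set(), []
        while self.heap and len(result) < k:
            neg_count, space_id = heapq.heappop(self.heap)
            count = -neg_count
            # Drop entries that are duplicates, out of date, or for empty spaces.
            if space_id in seen or count == 0 or count != len(self.members[space_id]):
                continue
            seen.add(space_id)
            result.append((space_id, count))
            keep.append((neg_count, space_id))
        for entry in keep:               # restore the valid entries for later queries
            heapq.heappush(self.heap, entry)
        return result


tracker = TopKSpaces()
tracker.apply("create", "abc", "user_1")
tracker.apply("join", "abc", "user_2")
first = tracker.top_k(1)
tracker.apply("leave", "abc", "user_2")
tracker.apply("create", "def", "user_2")
second = tracker.top_k(2)
print(first, second)
```

Each event costs O(log n) for the heap push, and stale entries are discarded only when a query pops them, so the heap can grow between queries. If memory matters, the heap can be rebuilt periodically from the members map.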