This question evaluates proficiency in time-series data manipulation and interval reasoning with pandas: grouping, datetime handling, and unioning intervals within each partition. It is commonly asked because merging temporal intervals tests practical data-cleaning skills and the handling of edge cases such as overlaps, adjacency, sorting, and timezone-aware timestamps. It belongs to the Data Manipulation (SQL/Python) domain and is an implementation-level exercise: it requires applied understanding of interval algebra and grouping operations rather than purely theoretical reasoning, and it assesses the ability to produce sorted, non-overlapping intervals per group.
You are given a pandas DataFrame df containing time intervals for multiple groups.
df columns:
- group_id (string or int): group identifier
- start_ts (datetime): interval start timestamp (timezone-aware, or all in the same timezone)
- end_ts (datetime): interval end timestamp
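Before merging, it can help to put both timestamp columns on a common footing. The sketch below assumes the columns may arrive as strings or as datetimes with differing UTC offsets; normalize_intervals is a hypothetical helper name. It relies on pd.to_datetime(..., utc=True), which converts timezone-aware values to UTC and localizes naive values as UTC, and it checks the start_ts < end_ts assumption up front.

```python
import pandas as pd


def normalize_intervals(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with both timestamp columns in UTC, validated."""
    out = df.copy()
    for col in ("start_ts", "end_ts"):
        # utc=True converts tz-aware values to UTC and localizes naive values as UTC.
        out[col] = pd.to_datetime(out[col], utc=True)
    # Enforce the stated assumption start_ts < end_ts on every row.
    bad = out["start_ts"] >= out["end_ts"]
    if bad.any():
        raise ValueError(f"{int(bad.sum())} row(s) violate start_ts < end_ts")
    return out


# Hypothetical input mixing UTC offsets: 10:00+01:00 is 09:00 UTC.
raw = pd.DataFrame({
    "group_id": ["a", "a"],
    "start_ts": ["2024-01-01 10:00+01:00", "2024-01-01 09:30+00:00"],
    "end_ts": ["2024-01-01 10:00+00:00", "2024-01-01 11:00+00:00"],
})
clean = normalize_intervals(raw)
print(clean.dtypes)
```

Normalizing to UTC is one choice among several; if all timestamps already share a single timezone, skipping this step does not change the merge result.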
Assumptions:
- Intervals are half-open, [start_ts, end_ts), so an interval whose end_ts equals the next interval's start_ts does not overlap it, unless you choose to treat adjacency as mergeable (state your choice).
- start_ts < end_ts for all rows.
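Under the half-open convention, the pairwise overlap test reduces to two strict comparisons. A minimal sketch, where should_merge and its merge_adjacent flag are hypothetical names covering both possible choices for adjacency:

```python
import pandas as pd


def should_merge(a_start, a_end, b_start, b_end, merge_adjacent: bool = False) -> bool:
    """Decide whether half-open intervals [a_start, a_end) and [b_start, b_end) belong together."""
    if merge_adjacent:
        # Touching intervals (a_end == b_start) count as one continuous span.
        return a_start <= b_end and b_start <= a_end
    # Strict half-open semantics: the intervals must share at least one instant.
    return a_start < b_end and b_start < a_end


nine, ten, ten_thirty, eleven = (
    pd.Timestamp(f"2024-01-01 {t}", tz="UTC") for t in ("09:00", "10:00", "10:30", "11:00")
)
print(should_merge(nine, ten, ten, eleven))                       # → False (adjacent, strict)
print(should_merge(nine, ten, ten, eleven, merge_adjacent=True))  # → True (adjacent, merged)
print(should_merge(nine, ten_thirty, ten, eleven))                # → True (genuine overlap)
```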
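Write pandas code to produce a new DataFrame containing merged (unioned) intervals within each group_id, such that:
- overlapping intervals within the same group_id are combined into a single interval;
- the resulting intervals are non-overlapping within each group_id;
- the intervals of each group_id are sorted by merged_start_ts.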
Return a DataFrame with columns:
- group_id
- merged_start_ts
- merged_end_ts
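One possible vectorized approach, sketched under the assumptions above (comparable timestamps, start_ts < end_ts, strict half-open semantics by default). It follows the common "gaps and islands" pattern rather than any prescribed solution; merge_intervals and merge_adjacent are hypothetical names. Rows are sorted by group_id and start_ts, and a running maximum of end_ts within each group shows whether a row starts before everything earlier in the group has ended. A row that starts at or after that running end opens a new island, and each island collapses to its minimum start and maximum end.

```python
import pandas as pd


def merge_intervals(df: pd.DataFrame, merge_adjacent: bool = False) -> pd.DataFrame:
    """Union overlapping [start_ts, end_ts) intervals within each group_id (hypothetical helper).

    merge_adjacent=False: strict half-open semantics; touching intervals stay separate.
    merge_adjacent=True: intervals where end_ts == next start_ts are merged as well.
    """
    # Sort so that, within each group, intervals appear in start order.
    s = df.sort_values(["group_id", "start_ts", "end_ts"]).reset_index(drop=True)

    # Latest end seen so far in the group, including the current row.
    running_end = s.groupby("group_id")["end_ts"].cummax()
    # Latest end among the previous rows of the same group (NaT on each group's first row).
    prev_end = running_end.groupby(s["group_id"]).shift()

    # A row opens a new island if it is first in its group, or if it starts after
    # (or, under strict semantics, exactly at) the latest end seen before it.
    if merge_adjacent:
        gap = s["start_ts"] > prev_end
    else:
        gap = s["start_ts"] >= prev_end
    s["island"] = (prev_end.isna() | gap).cumsum()

    # Collapse each island to its earliest start and latest end.
    merged = (
        s.groupby(["group_id", "island"])
        .agg(merged_start_ts=("start_ts", "min"), merged_end_ts=("end_ts", "max"))
        .reset_index()
        .drop(columns="island")
    )
    return merged


# Usage with hypothetical, deliberately unsorted data (timestamps parsed as UTC).
df = pd.DataFrame({
    "group_id": ["a", "b", "a", "a", "b", "a"],
    "start_ts": pd.to_datetime(
        ["2024-01-01 11:00", "2024-01-01 08:15", "2024-01-01 09:00",
         "2024-01-01 13:00", "2024-01-01 08:00", "2024-01-01 09:30"], utc=True),
    "end_ts": pd.to_datetime(
        ["2024-01-01 12:00", "2024-01-01 08:30", "2024-01-01 10:00",
         "2024-01-01 14:00", "2024-01-01 09:00", "2024-01-01 11:00"], utc=True),
})
print(merge_intervals(df))
print(merge_intervals(df, merge_adjacent=True))
```

Using the running maximum, rather than only the previous row's end_ts, matters when one long interval contains several shorter ones: [09:00, 12:00) followed by [09:30, 10:00) and [11:00, 11:30) must collapse into a single interval, but comparing [11:00, 11:30) against the previous row's end (10:00) alone would wrongly split it off. Apart from the initial sort, every step is a vectorized pass over the rows.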
(Optionally, also output merged_duration_seconds per merged interval if requested by the interviewer.)
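If the duration is requested, it is the difference between the merged end and start, since a half-open interval [start, end) has length end - start. A minimal sketch, assuming a merged DataFrame with the output columns listed above; add_duration_seconds is a hypothetical helper name. Computing the difference on timezone-aware (for example UTC-normalized) timestamps gives elapsed time without daylight-saving surprises.

```python
import pandas as pd


def add_duration_seconds(merged: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `merged` with a float merged_duration_seconds column (hypothetical helper)."""
    out = merged.copy()
    # For a half-open interval [start, end) the length is simply end - start.
    out["merged_duration_seconds"] = (
        out["merged_end_ts"] - out["merged_start_ts"]
    ).dt.total_seconds()
    return out


# Hypothetical merged output to illustrate the call.
merged = pd.DataFrame({
    "group_id": ["a", "b"],
    "merged_start_ts": pd.to_datetime(["2024-01-01 09:00", "2024-01-01 08:00"], utc=True),
    "merged_end_ts": pd.to_datetime(["2024-01-01 11:00", "2024-01-01 09:00"], utc=True),
})
print(add_duration_seconds(merged))
```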