Merge overlapping intervals per group in pandas
Company: Waymo
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Onsite
You are given a pandas DataFrame `df` containing time intervals for multiple groups.
### Input
`df` columns:
- `group_id` (string/int): group identifier
- `start_ts` (datetime): interval start timestamp (either timezone-aware, or timezone-naive with all values in the same timezone)
- `end_ts` (datetime): interval end timestamp
Assumptions:
- Each row represents a **half-open** interval `[start_ts, end_ts)`, so an interval whose `end_ts` equals the next interval's `start_ts` is adjacent to it, not overlapping. State whether you treat adjacent intervals as mergeable.
- `start_ts < end_ts` for all rows.
- Intervals may be unsorted and may overlap within a group.
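To make the half-open convention concrete, here is a minimal sketch of an overlap test for two intervals. The helper name `overlaps` is hypothetical (it is not part of the problem statement), and the timestamps are made-up illustrative values.

```python
import pandas as pd


def overlaps(a_start, a_end, b_start, b_end) -> bool:
    """Hypothetical helper: do half-open intervals [a_start, a_end) and
    [b_start, b_end) share at least one instant?"""
    return bool(a_start < b_end and b_start < a_end)


nine = pd.Timestamp("2024-01-01 09:00")
half_past_nine = pd.Timestamp("2024-01-01 09:30")
ten = pd.Timestamp("2024-01-01 10:00")
eleven = pd.Timestamp("2024-01-01 11:00")

# [09:00, 10:00) and [09:30, 11:00) share the span [09:30, 10:00).
print(overlaps(nine, ten, half_past_nine, eleven))  # True

# [09:00, 10:00) and [10:00, 11:00) only touch at 10:00, which the first
# interval excludes, so they are adjacent rather than overlapping.
print(overlaps(nine, ten, ten, eleven))  # False
```

Whether adjacent intervals should still be merged is a separate policy decision, which is why the problem asks you to state your choice.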
### Task
Write pandas code to produce a new DataFrame containing **merged (unioned) intervals within each `group_id`**, such that:
- Within each group, the output intervals are non-overlapping and sorted by time.
- Any overlapping (and, if you choose, adjacent) intervals are merged.
### Output
Return a DataFrame with columns:
- `group_id`
- `merged_start_ts`
- `merged_end_ts`
(Optionally, also output `merged_duration_seconds` per merged interval if requested by the interviewer.)
Quick Answer: This question evaluates time-series data manipulation and interval reasoning in pandas: grouping, datetime handling, and unioning intervals within each partition. It is commonly asked because merging temporal intervals tests practical data-cleaning ability and the handling of edge cases such as overlaps, adjacency, sorting, and timezone-aware timestamps. It belongs to the Data Manipulation (SQL/Python) domain and assesses implementation-level skill in producing non-overlapping, sorted intervals per group, which calls for an applied understanding of interval algebra and grouping operations rather than purely theoretical reasoning.
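One possible approach is a sort-then-sweep: sort by `group_id` and `start_ts`, then walk each group once, extending the current merged interval whenever the next row starts before (or, optionally, exactly at) the current end. The sketch below is one way to do this, not the only one. The function name `merge_intervals`, the `merge_adjacent` flag, and the sample data are assumptions introduced for illustration; the column names come from the problem statement.

```python
import pandas as pd


def merge_intervals(df: pd.DataFrame, merge_adjacent: bool = True) -> pd.DataFrame:
    """Union overlapping intervals within each group_id (hypothetical helper).

    merge_adjacent=True also merges intervals that merely touch
    (end_ts == next start_ts); this is a stated choice, not a requirement.
    """
    rows = []
    ordered = df.sort_values(["group_id", "start_ts", "end_ts"])
    for group_id, grp in ordered.groupby("group_id", sort=True):
        cur_start = cur_end = None
        for start, end in zip(grp["start_ts"], grp["end_ts"]):
            if cur_start is None:
                # First interval of the group opens the first merged span.
                cur_start, cur_end = start, end
            elif start < cur_end or (merge_adjacent and start == cur_end):
                # Overlap (or allowed adjacency): extend the current span.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current span and open a new one.
                rows.append((group_id, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_start is not None:
            rows.append((group_id, cur_start, cur_end))
    return pd.DataFrame(rows, columns=["group_id", "merged_start_ts", "merged_end_ts"])


# Made-up sample data: unsorted, with one overlap and one adjacency in group A.
df = pd.DataFrame(
    {
        "group_id": ["A", "B", "A", "A", "A"],
        "start_ts": pd.to_datetime(
            ["2024-01-01 11:00", "2024-01-01 08:00", "2024-01-01 09:00",
             "2024-01-01 13:00", "2024-01-01 09:30"]
        ),
        "end_ts": pd.to_datetime(
            ["2024-01-01 12:00", "2024-01-01 08:30", "2024-01-01 10:00",
             "2024-01-01 14:00", "2024-01-01 11:00"]
        ),
    }
)

merged = merge_intervals(df)  # adjacency treated as mergeable

# Optional duration column, if the interviewer asks for it.
merged["merged_duration_seconds"] = (
    merged["merged_end_ts"] - merged["merged_start_ts"]
).dt.total_seconds()
print(merged)
```

With adjacency merged, group A collapses to `[09:00, 12:00)` and `[13:00, 14:00)`; passing `merge_adjacent=False` keeps `[09:00, 11:00)` and `[11:00, 12:00)` separate. The explicit loop favors readability; for large inputs, a vectorized variant based on a per-group running maximum of `end_ts` and a cumulative sum over "new span" flags is a common alternative.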