Compute window averages and merge intervals
Company: WeRide
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Technical Screen
You are given two independent pandas tasks.
1. **Sliding-window average**
- Input DataFrame: `df`
- Schema:
  - `row_id` INT: unique row order key, already sorted ascending
  - `value` FLOAT
- Given an integer `k >= 0`, compute for each row the average of the `2k + 1` values made up of the previous `k` rows, the current row, and the next `k` rows.
- If a row has fewer than `k` rows before it or fewer than `k` rows after it, set its `window_avg` to `-1`.
- Return a DataFrame with columns: `row_id`, `value`, `window_avg`.
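A minimal sketch for task 1, assuming `df` is already sorted by `row_id` as stated (the function name `sliding_window_avg` is hypothetical). A centered rolling mean over `2k + 1` rows gives the window average, and a positional mask then writes `-1` into the first `k` and last `k` rows. The mask is based on row position rather than on `fillna(-1)`, so a `NaN` inside `value` is not mistaken for an edge row.

```python
import numpy as np
import pandas as pd


def sliding_window_avg(df: pd.DataFrame, k: int) -> pd.DataFrame:
    """Mean of rows i-k..i+k for each row i; -1 where the full window does not fit."""
    if k < 0:
        raise ValueError("k must be >= 0")
    out = df[["row_id", "value"]].copy()
    window = 2 * k + 1

    # Centered rolling mean; rows without a full window come back as NaN.
    centered = out["value"].rolling(window=window, center=True).mean()

    # Rows with at least k rows before and k rows after, decided by position.
    pos = np.arange(len(out))
    has_full_window = (pos >= k) & (pos < len(out) - k)

    out["window_avg"] = np.where(has_full_window, centered.to_numpy(), -1.0)
    return out
```

For example, with `value = [1, 2, 3, 4, 5]` and `k = 1`, `window_avg` is `[-1, 2, 3, 4, -1]`; with `k = 0` it equals `value`.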
2. **Merge overlapping autonomous-driving intervals**
- Input DataFrame: `segments`
- Schema:
  - `vehicle_id` STRING
  - `event_type` STRING
  - `start_ts` TIMESTAMP
  - `end_ts` TIMESTAMP
- Assume all timestamps are in the same timezone and `start_ts <= end_ts` for every row.
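- For each `(vehicle_id, event_type)` pair independently, process intervals in order of `start_ts` and merge intervals that overlap or touch: the next interval is merged into the current one if `next.start_ts <= current.end_ts`, and the merged interval ends at the later of the two `end_ts` values.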
- Return the merged result with columns: `vehicle_id`, `event_type`, `merged_start_ts`, `merged_end_ts`, sorted by `vehicle_id`, `event_type`, `merged_start_ts`.
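A minimal vectorized sketch for task 2, assuming the key columns contain no nulls (the function name `merge_intervals` is hypothetical). Rows are sorted within each `(vehicle_id, event_type)` group by `start_ts`; a new merged block starts whenever a row begins strictly after the running maximum `end_ts` of the earlier rows in its group. Tracking the running maximum, rather than only the previous row's `end_ts`, keeps an interval that is fully contained in an earlier one from splitting the block.

```python
import pandas as pd


def merge_intervals(segments: pd.DataFrame) -> pd.DataFrame:
    keys = ["vehicle_id", "event_type"]
    s = segments.sort_values(keys + ["start_ts"]).reset_index(drop=True)

    # Latest end time seen so far within each (vehicle_id, event_type) group.
    running_end = s.groupby(keys)["end_ts"].cummax()
    # Running end as of the previous row in the same group (NaT for a group's first row).
    prev_end = running_end.groupby([s["vehicle_id"], s["event_type"]]).shift()

    # A block starts at a group's first row or after a gap; start_ts == prev_end
    # counts as touching, so that row stays in the current block.
    starts_block = prev_end.isna() | (s["start_ts"] > prev_end)
    block_id = starts_block.cumsum()

    merged = (
        s.groupby(block_id)
        .agg(
            vehicle_id=("vehicle_id", "first"),
            event_type=("event_type", "first"),
            merged_start_ts=("start_ts", "min"),
            merged_end_ts=("end_ts", "max"),
        )
        .sort_values(keys + ["merged_start_ts"])
        .reset_index(drop=True)
    )
    return merged
```

A row-by-row loop over the sorted frame is an equally valid answer and may be easier to explain aloud; the vectorized form avoids Python-level iteration on large inputs.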
Write pandas code for both tasks.
Quick Answer: This question evaluates pandas data manipulation, specifically windowed aggregation with strict boundary conditions and group-wise merging of time intervals. It tests rolling/window operations, grouping, sorting, and handling of overlapping time ranges.