Compute Rolling Averages and Merge Intervals
Company: WeRide
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Technical Screen
You are given two independent pandas tasks.
1. Symmetric sliding-window average
- Input dataframe `df` has columns:
  - `row_id` INT: unique row order
  - `value` FLOAT: numeric value
- Process rows in ascending `row_id`.
- Given an integer `k >= 0`, create a new column `window_avg` such that for row `i`:
`window_avg(i) = average(value[i-k], ..., value[i], ..., value[i+k])`
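- Only compute the average if the row has at least `k` previous rows and `k` later rows, so every average covers exactly `2k + 1` values.
- For the first `k` rows and the last `k` rows, set `window_avg = -1`. If `df` has fewer than `2k + 1` rows, every row gets `-1`.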
- Return the original dataframe with the new column `window_avg`.
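A minimal sketch of task 1, not a definitive solution. It assumes `value` contains no missing values (a NaN inside a window would also come out as `-1`) and returns the rows sorted by `row_id`. The function name `add_window_avg` is hypothetical. The approach is a centered rolling mean of width `2k + 1` that requires a full window, with the incomplete edge windows filled with `-1`.

```python
import pandas as pd


def add_window_avg(df: pd.DataFrame, k: int) -> pd.DataFrame:
    """Add a symmetric sliding-window average column (hypothetical helper)."""
    if k < 0:
        raise ValueError("k must be >= 0")

    # Process rows in ascending row_id; work on a copy so the input is untouched.
    out = df.sort_values("row_id").copy()

    window = 2 * k + 1
    # center=True aligns the window on row i, covering rows i-k .. i+k.
    # min_periods=window leaves NaN wherever the full window is unavailable,
    # which is exactly the first k and last k rows.
    avg = out["value"].rolling(window=window, center=True, min_periods=window).mean()

    # Assumption: `value` has no NaN, so NaN here only marks the edge rows.
    out["window_avg"] = avg.fillna(-1.0)
    return out
```

For example, with `row_id` 1 to 5, `value` 1.0 to 5.0 and `k = 1`, this yields `window_avg` of -1, 2, 3, 4, -1. With `k = 0` each row's average is its own value.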
2. Merge autonomous-driving event intervals
- Input dataframe `events` has columns:
  - `vehicle_id` STRING
  - `start_ts` TIMESTAMP
  - `end_ts` TIMESTAMP
- Each row represents a time interval for an event generated by an autonomous vehicle.
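- For each `vehicle_id`, merge intervals that overlap or touch: with the intervals sorted by `start_ts`, the next interval joins the current merged interval when `next.start_ts <= current.end_ts`, where `current.end_ts` is the latest `end_ts` merged so far.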
- Return one row per merged interval with columns:
  - `vehicle_id`
  - `merged_start_ts`
  - `merged_end_ts`
- Assume timestamps are in the same timezone and that `start_ts <= end_ts` for every row.
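One possible sketch of task 2, under the stated assumptions (`start_ts <= end_ts`, a single timezone) plus the assumption that `events` is non-empty and has no missing timestamps. The function name `merge_intervals` and the helper column `grp` are hypothetical. The approach sorts each vehicle's intervals by start, compares each start with the running maximum of earlier ends, starts a new group whenever there is a gap, and then aggregates each group.

```python
import pandas as pd


def merge_intervals(events: pd.DataFrame) -> pd.DataFrame:
    """Merge overlapping or touching intervals per vehicle (hypothetical helper)."""
    ev = (
        events.sort_values(["vehicle_id", "start_ts", "end_ts"])
        .reset_index(drop=True)
    )

    # Latest end_ts among the earlier rows of the same vehicle.
    # The running max matters when one interval fully contains a later one.
    prev_max_end = ev.groupby("vehicle_id")["end_ts"].transform(
        lambda s: s.cummax().shift()
    )

    # A new merged interval starts at the first row of each vehicle (NaT above)
    # or when the start is strictly after everything seen so far.
    starts_new = prev_max_end.isna() | (ev["start_ts"] > prev_max_end)
    ev["grp"] = starts_new.cumsum()

    merged = (
        ev.groupby(["vehicle_id", "grp"], as_index=False)
        .agg(merged_start_ts=("start_ts", "min"), merged_end_ts=("end_ts", "max"))
        .drop(columns="grp")
    )
    return merged
```

Using `>` rather than `>=` in the gap test is what makes touching intervals (`next.start_ts == current.end_ts`) merge, matching the rule stated above.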
Quick Answer: This question tests data manipulation and temporal reasoning: computing a symmetric sliding-window average and merging overlapping or touching time intervals with pandas or SQL.