How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions reward a solid grasp of core concepts, such as window aggregation and grouping, together with regular practice. PracHub provides solutions with explanations to help you prepare for Data Manipulation (SQL/Python) interviews.

What difficulty level is this interview question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at WeRide.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at WeRide during technical interviews.

Compute Rolling Averages and Merge Intervals

This question evaluates data manipulation and temporal reasoning, specifically sliding-window aggregation and merging overlapping or contiguous time intervals with pandas or SQL.

You are given two independent pandas tasks.

Symmetric sliding-window average

Input dataframe df has columns:
- row_id INT: unique row order
- value FLOAT: numeric value
Process rows in ascending row_id.
Given an integer k >= 0, create a new column window_avg such that for row i: window_avg(i) = average(value[i-k], ..., value[i], ..., value[i+k])
Only compute the average if the row has at least k previous rows and k later rows.
For the first k rows and the last k rows, set window_avg = -1.
Return the original dataframe with the new column window_avg.
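
One possible approach to the task above is a centered rolling window of 2k + 1 rows, sketched below. The function name add_window_avg is hypothetical, and the sketch assumes value contains no missing entries. It returns the rows sorted by row_id with a fresh index, which may or may not match the ordering an interviewer expects.

```python
import numpy as np
import pandas as pd


def add_window_avg(df: pd.DataFrame, k: int) -> pd.DataFrame:
    """Hypothetical helper: symmetric window average with -1 at the edges."""
    if k < 0:
        raise ValueError("k must be >= 0")

    # Positions are defined by ascending row_id, so sort first.
    out = df.sort_values("row_id").reset_index(drop=True)
    n = len(out)
    window = 2 * k + 1

    # Centered window of 2k + 1 rows; min_periods=window yields NaN
    # wherever the full window does not fit.
    avg = out["value"].rolling(window, center=True, min_periods=window).mean()

    # Mark the first k and last k positions explicitly, so the -1 sentinel
    # does not rely on NaN detection.
    pos = np.arange(n)
    edge = (pos < k) | (pos >= n - k)
    out["window_avg"] = avg.where(~edge, -1.0)
    return out
```

With k = 0 the window contains only the row itself, so window_avg equals value. When the dataframe has at most 2k rows, every row is an edge row and receives -1.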

Merge autonomous-driving event intervals

Input dataframe events has columns:
- vehicle_id STRING
- start_ts TIMESTAMP
- end_ts TIMESTAMP
Each row represents a time interval for an event generated by an autonomous vehicle.
For each vehicle_id, merge intervals that overlap or touch, meaning next.start_ts <= current.end_ts.
Return one row per merged interval with columns:
- vehicle_id
- merged_start_ts
- merged_end_ts
Assume timestamps are in the same timezone and that start_ts <= end_ts for every row.
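
The merge described above can be sketched in pandas by sorting each vehicle's intervals by start time and starting a new merged interval whenever a row begins strictly after the furthest end seen so far. The function name merge_intervals and the temporary block column are hypothetical, and the sketch assumes neither timestamp column contains missing values.

```python
import pandas as pd


def merge_intervals(events: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: merge overlapping or touching intervals per vehicle."""
    ordered = events.sort_values(["vehicle_id", "start_ts", "end_ts"]).reset_index(drop=True)

    # Furthest end_ts among earlier rows of the same vehicle. The running
    # maximum matters when a short interval sits inside a longer one.
    prev_max_end = ordered.groupby("vehicle_id")["end_ts"].transform(
        lambda s: s.cummax().shift()
    )

    # A row opens a new merged interval if it is the vehicle's first row, or
    # if it starts strictly after every earlier end (touching still merges).
    new_block = prev_max_end.isna() | (ordered["start_ts"] > prev_max_end)
    ordered["block"] = new_block.cumsum()

    merged = ordered.groupby(["vehicle_id", "block"], as_index=False).agg(
        merged_start_ts=("start_ts", "min"),
        merged_end_ts=("end_ts", "max"),
    )
    return merged.drop(columns="block")
```

Comparing against the running maximum of end_ts, rather than only the previous row's end_ts, is a deliberate choice: it keeps nested intervals from splitting a merged range.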
