Problem
You have an autonomous-driving clip dataset where each row represents a time interval from a driving run.
Table: clips
| column | type | description |
|---|
set_id | INT | Session/clip set identifier (e.g., 1–6) |
run_id | STRING | Driving run identifier, formatted like YYYYMMDD_HHMMSS |
start_time | INT | Clip start time offset (e.g., seconds) |
end_time | INT | Clip end time offset (e.g., seconds) |
set_number | INT | Dataset/source identifier |
Assume each interval is half-open: [start_time, end_time) so duration is end_time - start_time.
Intervals for the same run_id may:
-
overlap (partially or fully),
-
be duplicated/redundant,
-
have gaps (non-contiguous).
Task
Compute, for each run_id, the cumulative deduplicated time: the total duration covered by the union of all intervals for that run_id (i.e., after merging overlaps).
Example input
For run 20250101_122334:
-
set 1:
[58, 70)
-
set 2:
[57, 69)
-
set 3:
[55, 72)
-
set 4:
[56, 71)
-
set 5:
[80, 100)
For run 20250102_101010:
Expected output
Return a result with:
-
run_id
-
cumulative_unique_time
(same unit as the time offsets)
Using the example above, results would be:
-
20250101_122334
→
39
-
20250102_101010
→
19
Requirements
-
Write a
SQL
solution to compute
cumulative_unique_time
per
run_id
.
-
Write a
Python
solution to compute the same metric (given a list/dataframe of intervals).