Compute community ranges and town idle hours
Company: Scale AI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates skills in dataset joins, temporal interval aggregation and merging, gap computation, UTC time handling, and algorithmic complexity analysis alongside appropriate data-structure selection.
Part 1: Join party data and compute earliest/latest time range for each community
Constraints
- `0 <= n <= 200000`, where `n` is the number of parties
- `len(party_windows) == len(geo_data)`
- Each `party_id` appears exactly once in each input, and the ID sets are identical
- All times are valid ISO-8601 UTC datetimes, and `start_time < end_time` for every party
- Group only by the `community` string exactly as given
Examples
Input: ([('p1', '2024-01-01T10:00:00Z', '2024-01-01T12:00:00Z'), ('p2', '2024-01-01T09:00:00Z', '2024-01-01T11:30:00Z'), ('p3', '2024-01-02T15:00:00Z', '2024-01-02T18:00:00Z')], [('p1', 'CA', 'A', 'Town1', 'CommA'), ('p2', 'CA', 'A', 'Town2', 'CommA'), ('p3', 'CA', 'B', 'Town3', 'CommB')])
Expected Output: {'CommA': ('2024-01-01T09:00:00Z', '2024-01-01T12:00:00Z'), 'CommB': ('2024-01-02T15:00:00Z', '2024-01-02T18:00:00Z')}
Explanation: After joining on `party_id`, community `CommA` has two parties, so its range is from 09:00 to 12:00. `CommB` has one party.
Input: ([('p3', '2024-03-01T18:00:00Z', '2024-03-01T19:00:00Z'), ('p1', '2024-03-01T08:00:00Z', '2024-03-01T09:00:00Z'), ('p2', '2024-03-01T07:30:00+00:00', '2024-03-01T21:00:00+00:00')], [('p2', 'WA', 'King', 'Town2', 'North'), ('p1', 'WA', 'King', 'Town1', 'North'), ('p3', 'WA', 'Pierce', 'Town9', 'South')])
Expected Output: {'North': ('2024-03-01T07:30:00Z', '2024-03-01T21:00:00Z'), 'South': ('2024-03-01T18:00:00Z', '2024-03-01T19:00:00Z')}
Explanation: The input order is arbitrary. The `+00:00` timestamps are valid UTC and should be normalized to `Z` in the output.
Input: ([('solo', '2025-06-10T05:00:00Z', '2025-06-10T06:15:00Z')], [('solo', 'TX', 'Travis', 'Austin', 'Central')])
Expected Output: {'Central': ('2025-06-10T05:00:00Z', '2025-06-10T06:15:00Z')}
Explanation: Edge case: a single party means the earliest start and latest end are just that party's times.
Input: ([], [])
Expected Output: {}
Explanation: Edge case: no rows means no communities to report.
Solution
def solution(party_windows, geo_data):
from datetime import datetime, timezone
def parse_utc(s):
if s.endswith('Z'):
s = s[:-1] + '+00:00'
dt = datetime.fromisoformat(s)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
else:
dt = dt.astimezone(timezone.utc)
return dt
def format_utc(dt):
return dt.astimezone(timezone.utc).isoformat().replace('+00:00', 'Z')
windows_by_id = {}
for party_id, start_time, end_time in party_windows:
windows_by_id[party_id] = (parse_utc(start_time), parse_utc(end_time))
community_ranges = {}
for party_id, _state, _county, _town, community in geo_data:
start_dt, end_dt = windows_by_id[party_id]
if community not in community_ranges:
community_ranges[community] = [start_dt, end_dt]
else:
if start_dt < community_ranges[community][0]:
community_ranges[community][0] = start_dt
if end_dt > community_ranges[community][1]:
community_ranges[community][1] = end_dt
return {
community: (format_utc(bounds[0]), format_utc(bounds[1]))
for community, bounds in community_ranges.items()
}
Time complexity: O(n), where n is the number of parties, assuming average O(1) hash map operations. Space complexity: O(n).
Hints
- Build a hash map from `party_id` to its `(start_time, end_time)` so the join is O(1) per row.
- For each community, keep only two values while scanning: the minimum start seen so far and the maximum end seen so far.
Part 2: Compute total idle hours in each town by merging party intervals
Constraints
- `0 <= n <= 200000`, where `n` is the number of parties
- `len(party_windows) == len(geo_data)`
- Each `party_id` appears exactly once in each input, and the ID sets are identical
- All times are valid ISO-8601 UTC datetimes, and `start_time < end_time` for every party
- Group only by the `town` string exactly as given
- Adjacent intervals (`end == next_start`) must be merged and must not contribute to idle time
Examples
Input: ([('p1', '2024-01-01T10:00:00Z', '2024-01-01T12:00:00Z'), ('p2', '2024-01-01T11:00:00Z', '2024-01-01T13:00:00Z'), ('p3', '2024-01-01T15:00:00Z', '2024-01-01T16:00:00Z'), ('p4', '2024-01-01T09:00:00Z', '2024-01-01T10:00:00Z'), ('p5', '2024-01-01T10:00:00Z', '2024-01-01T11:00:00Z')], [('p1', 'CA', 'A', 'TownX', 'C1'), ('p2', 'CA', 'A', 'TownX', 'C2'), ('p3', 'CA', 'A', 'TownX', 'C3'), ('p4', 'CA', 'B', 'TownY', 'C1'), ('p5', 'CA', 'B', 'TownY', 'C2')])
Expected Output: {'TownX': 2.0, 'TownY': 0.0}
Explanation: In `TownX`, `[10,12]` and `[11,13]` merge to `[10,13]`, leaving a 2-hour gap before `[15,16]`. In `TownY`, `[09,10]` and `[10,11]` are adjacent, so they merge and create no gap.
Input: ([('p1', '2024-02-01T08:00:00Z', '2024-02-01T09:00:00Z'), ('p2', '2024-02-01T09:00:00Z', '2024-02-01T10:00:00Z'), ('p3', '2024-02-01T10:30:00Z', '2024-02-01T12:00:00Z'), ('p4', '2024-02-01T11:00:00Z', '2024-02-01T13:00:00Z'), ('p5', '2024-02-01T14:00:00Z', '2024-02-01T14:30:00Z')], [('p1', 'OR', 'X', 'TownZ', 'A'), ('p2', 'OR', 'X', 'TownZ', 'B'), ('p3', 'OR', 'X', 'TownZ', 'C'), ('p4', 'OR', 'X', 'TownZ', 'D'), ('p5', 'OR', 'X', 'TownZ', 'E')])
Expected Output: {'TownZ': 1.5}
Explanation: Merged intervals are `[08:00,10:00]`, `[10:30,13:00]`, and `[14:00,14:30]`. The gaps are 0.5 hours and 1.0 hour, totaling 1.5 hours.
Input: ([('solo', '2025-06-10T05:00:00Z', '2025-06-10T06:15:00Z')], [('solo', 'TX', 'Travis', 'Austin', 'Central')])
Expected Output: {'Austin': 0.0}
Explanation: Edge case: a town with only one interval has no internal gaps.
Input: ([], [])
Expected Output: {}
Explanation: Edge case: no rows means there are no towns to report.
Solution
def solution(party_windows, geo_data):
from collections import defaultdict
from datetime import datetime, timezone
def parse_utc(s):
if s.endswith('Z'):
s = s[:-1] + '+00:00'
dt = datetime.fromisoformat(s)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
else:
dt = dt.astimezone(timezone.utc)
return dt
windows_by_id = {}
for party_id, start_time, end_time in party_windows:
windows_by_id[party_id] = (parse_utc(start_time), parse_utc(end_time))
intervals_by_town = defaultdict(list)
for party_id, _state, _county, town, _community in geo_data:
start_dt, end_dt = windows_by_id[party_id]
intervals_by_town[town].append((start_dt, end_dt))
idle_hours = {}
for town, intervals in intervals_by_town.items():
intervals.sort(key=lambda interval: interval[0])
current_end = intervals[0][1]
idle_seconds = 0.0
for start_dt, end_dt in intervals[1:]:
if start_dt <= current_end:
if end_dt > current_end:
current_end = end_dt
else:
idle_seconds += (start_dt - current_end).total_seconds()
current_end = end_dt
idle_hours[town] = idle_seconds / 3600.0
return idle_hours
Time complexity: O(n log n) in the worst case because intervals for each town must be sorted; across all towns the total sorting cost is O(n log n). Space complexity: O(n).
Hints
- After joining by `party_id`, collect all `(start, end)` intervals for each town in a hash map.
- Sort each town's intervals by start time, then scan once: merge when `next_start <= current_end`; otherwise add the gap `next_start - current_end`.