PracHub
QuestionsPremiumLearningGuidesInterview PrepCoaches

Quick Overview

This question evaluates skills in dataset joins, temporal interval aggregation and merging, gap computation, UTC time handling, and algorithmic complexity analysis alongside appropriate data-structure selection.

  • Medium
  • Scale AI
  • Coding & Algorithms
  • Software Engineer

Compute community ranges and town idle hours

Company: Scale AI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

You are given two in-memory datasets for a party-planning app. 1) PartyWindow: {partyId: string, startTime: ISO-8601 UTC datetime, endTime: ISO-8601 UTC datetime} with startTime < endTime. 2) GeoData: {partyId: string, state: string, county: string, town: string, community: string}. Each partyId appears exactly once in each dataset. Implement: (a) A method that joins PartyWindow with GeoData on partyId and returns, for every community, the earliest party start and the latest party end across all parties in that community. Specify your return type (e.g., Map<community, [minStart, maxEnd]>) and how you handle sorting/output order. (b) A method that, using the same inputs, computes for every town the total number of hours during which no party is happening anywhere in that town. First, within each town, merge overlapping or adjacent intervals (adjacent means end == nextStart and should not count as a gap). Then sum the durations of gaps strictly between the merged intervals over the span from the earliest start to the latest end observed in that town. Return Map<town, hours>, using UTC and ignoring DST, and state any rounding policy. State and justify the time and space complexity of your solutions, and discuss data structures you would use (e.g., hash maps, sorting, interval merging).

Quick Answer: This question evaluates skills in dataset joins, temporal interval aggregation and merging, gap computation, UTC time handling, and algorithmic complexity analysis alongside appropriate data-structure selection.

Part 1: Join party data and compute earliest/latest time range for each community

You are given two in-memory datasets for a party-planning app. - `party_windows`: each row is `(party_id, start_time, end_time)` - `geo_data`: each row is `(party_id, state, county, town, community)` Each `party_id` appears exactly once in both datasets. Join the datasets on `party_id`, then compute, for every `community`, the earliest party start time and the latest party end time among all parties in that community. Return a dictionary of the form `{community: (min_start, max_end)}`. All timestamps are ISO-8601 UTC datetimes. Normalize returned timestamps to UTC strings ending in `Z`. Output order is not important.

Constraints

  • `0 <= n <= 200000`, where `n` is the number of parties
  • `len(party_windows) == len(geo_data)`
  • Each `party_id` appears exactly once in each input, and the ID sets are identical
  • All times are valid ISO-8601 UTC datetimes, and `start_time < end_time` for every party
  • Group only by the `community` string exactly as given

Examples

Input: ([('p1', '2024-01-01T10:00:00Z', '2024-01-01T12:00:00Z'), ('p2', '2024-01-01T09:00:00Z', '2024-01-01T11:30:00Z'), ('p3', '2024-01-02T15:00:00Z', '2024-01-02T18:00:00Z')], [('p1', 'CA', 'A', 'Town1', 'CommA'), ('p2', 'CA', 'A', 'Town2', 'CommA'), ('p3', 'CA', 'B', 'Town3', 'CommB')])

Expected Output: {'CommA': ('2024-01-01T09:00:00Z', '2024-01-01T12:00:00Z'), 'CommB': ('2024-01-02T15:00:00Z', '2024-01-02T18:00:00Z')}

Explanation: After joining on `party_id`, community `CommA` has two parties, so its range is from 09:00 to 12:00. `CommB` has one party.

Input: ([('p3', '2024-03-01T18:00:00Z', '2024-03-01T19:00:00Z'), ('p1', '2024-03-01T08:00:00Z', '2024-03-01T09:00:00Z'), ('p2', '2024-03-01T07:30:00+00:00', '2024-03-01T21:00:00+00:00')], [('p2', 'WA', 'King', 'Town2', 'North'), ('p1', 'WA', 'King', 'Town1', 'North'), ('p3', 'WA', 'Pierce', 'Town9', 'South')])

Expected Output: {'North': ('2024-03-01T07:30:00Z', '2024-03-01T21:00:00Z'), 'South': ('2024-03-01T18:00:00Z', '2024-03-01T19:00:00Z')}

Explanation: The input order is arbitrary. The `+00:00` timestamps are valid UTC and should be normalized to `Z` in the output.

Input: ([('solo', '2025-06-10T05:00:00Z', '2025-06-10T06:15:00Z')], [('solo', 'TX', 'Travis', 'Austin', 'Central')])

Expected Output: {'Central': ('2025-06-10T05:00:00Z', '2025-06-10T06:15:00Z')}

Explanation: Edge case: a single party means the earliest start and latest end are just that party's times.

Input: ([], [])

Expected Output: {}

Explanation: Edge case: no rows means no communities to report.

Solution

def solution(party_windows, geo_data):
    from datetime import datetime, timezone

    def parse_utc(s):
        if s.endswith('Z'):
            s = s[:-1] + '+00:00'
        dt = datetime.fromisoformat(s)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        else:
            dt = dt.astimezone(timezone.utc)
        return dt

    def format_utc(dt):
        return dt.astimezone(timezone.utc).isoformat().replace('+00:00', 'Z')

    windows_by_id = {}
    for party_id, start_time, end_time in party_windows:
        windows_by_id[party_id] = (parse_utc(start_time), parse_utc(end_time))

    community_ranges = {}
    for party_id, _state, _county, _town, community in geo_data:
        start_dt, end_dt = windows_by_id[party_id]
        if community not in community_ranges:
            community_ranges[community] = [start_dt, end_dt]
        else:
            if start_dt < community_ranges[community][0]:
                community_ranges[community][0] = start_dt
            if end_dt > community_ranges[community][1]:
                community_ranges[community][1] = end_dt

    return {
        community: (format_utc(bounds[0]), format_utc(bounds[1]))
        for community, bounds in community_ranges.items()
    }

Time complexity: O(n), where n is the number of parties, assuming average O(1) hash map operations. Space complexity: O(n).

Hints

  1. Build a hash map from `party_id` to its `(start_time, end_time)` so the join is O(1) per row.
  2. For each community, keep only two values while scanning: the minimum start seen so far and the maximum end seen so far.

Part 2: Compute total idle hours in each town by merging party intervals

You are given the same two datasets: - `party_windows`: each row is `(party_id, start_time, end_time)` - `geo_data`: each row is `(party_id, state, county, town, community)` Each `party_id` appears exactly once in both datasets. Join the datasets on `party_id`, then group party intervals by `town`. For each town: 1. Sort that town's intervals by start time. 2. Merge intervals that overlap or are adjacent. Adjacent means `current_end == next_start`, and such intervals should be treated as continuous with no gap. 3. Over the span from the earliest party start to the latest party end in that town, sum the durations of the gaps strictly between the merged intervals. Return a dictionary `{town: idle_hours}`. Use UTC only. Ignore DST. Return hours as `total_seconds / 3600.0` with no additional rounding. Output order is not important.

Constraints

  • `0 <= n <= 200000`, where `n` is the number of parties
  • `len(party_windows) == len(geo_data)`
  • Each `party_id` appears exactly once in each input, and the ID sets are identical
  • All times are valid ISO-8601 UTC datetimes, and `start_time < end_time` for every party
  • Group only by the `town` string exactly as given
  • Adjacent intervals (`end == next_start`) must be merged and must not contribute to idle time

Examples

Input: ([('p1', '2024-01-01T10:00:00Z', '2024-01-01T12:00:00Z'), ('p2', '2024-01-01T11:00:00Z', '2024-01-01T13:00:00Z'), ('p3', '2024-01-01T15:00:00Z', '2024-01-01T16:00:00Z'), ('p4', '2024-01-01T09:00:00Z', '2024-01-01T10:00:00Z'), ('p5', '2024-01-01T10:00:00Z', '2024-01-01T11:00:00Z')], [('p1', 'CA', 'A', 'TownX', 'C1'), ('p2', 'CA', 'A', 'TownX', 'C2'), ('p3', 'CA', 'A', 'TownX', 'C3'), ('p4', 'CA', 'B', 'TownY', 'C1'), ('p5', 'CA', 'B', 'TownY', 'C2')])

Expected Output: {'TownX': 2.0, 'TownY': 0.0}

Explanation: In `TownX`, `[10,12]` and `[11,13]` merge to `[10,13]`, leaving a 2-hour gap before `[15,16]`. In `TownY`, `[09,10]` and `[10,11]` are adjacent, so they merge and create no gap.

Input: ([('p1', '2024-02-01T08:00:00Z', '2024-02-01T09:00:00Z'), ('p2', '2024-02-01T09:00:00Z', '2024-02-01T10:00:00Z'), ('p3', '2024-02-01T10:30:00Z', '2024-02-01T12:00:00Z'), ('p4', '2024-02-01T11:00:00Z', '2024-02-01T13:00:00Z'), ('p5', '2024-02-01T14:00:00Z', '2024-02-01T14:30:00Z')], [('p1', 'OR', 'X', 'TownZ', 'A'), ('p2', 'OR', 'X', 'TownZ', 'B'), ('p3', 'OR', 'X', 'TownZ', 'C'), ('p4', 'OR', 'X', 'TownZ', 'D'), ('p5', 'OR', 'X', 'TownZ', 'E')])

Expected Output: {'TownZ': 1.5}

Explanation: Merged intervals are `[08:00,10:00]`, `[10:30,13:00]`, and `[14:00,14:30]`. The gaps are 0.5 hours and 1.0 hour, totaling 1.5 hours.

Input: ([('solo', '2025-06-10T05:00:00Z', '2025-06-10T06:15:00Z')], [('solo', 'TX', 'Travis', 'Austin', 'Central')])

Expected Output: {'Austin': 0.0}

Explanation: Edge case: a town with only one interval has no internal gaps.

Input: ([], [])

Expected Output: {}

Explanation: Edge case: no rows means there are no towns to report.

Solution

def solution(party_windows, geo_data):
    from collections import defaultdict
    from datetime import datetime, timezone

    def parse_utc(s):
        if s.endswith('Z'):
            s = s[:-1] + '+00:00'
        dt = datetime.fromisoformat(s)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        else:
            dt = dt.astimezone(timezone.utc)
        return dt

    windows_by_id = {}
    for party_id, start_time, end_time in party_windows:
        windows_by_id[party_id] = (parse_utc(start_time), parse_utc(end_time))

    intervals_by_town = defaultdict(list)
    for party_id, _state, _county, town, _community in geo_data:
        start_dt, end_dt = windows_by_id[party_id]
        intervals_by_town[town].append((start_dt, end_dt))

    idle_hours = {}
    for town, intervals in intervals_by_town.items():
        intervals.sort(key=lambda interval: interval[0])

        current_end = intervals[0][1]
        idle_seconds = 0.0

        for start_dt, end_dt in intervals[1:]:
            if start_dt <= current_end:
                if end_dt > current_end:
                    current_end = end_dt
            else:
                idle_seconds += (start_dt - current_end).total_seconds()
                current_end = end_dt

        idle_hours[town] = idle_seconds / 3600.0

    return idle_hours

Time complexity: O(n log n) in the worst case because intervals for each town must be sorted; across all towns the total sorting cost is O(n log n). Space complexity: O(n).

Hints

  1. After joining by `party_id`, collect all `(start, end)` intervals for each town in a hash map.
  2. Sort each town's intervals by start time, then scan once: merge when `next_start <= current_end`; otherwise add the gap `next_start - current_end`.
Last updated: May 20, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a Dependency-Aware Task Scheduler - Scale AI (easy)
  • Schedule Ready Tasks by Deadline - Scale AI (medium)
  • Implement a Task Processor - Scale AI (medium)
  • Update a Neuron Grid - Scale AI (medium)
  • Implement multi-head attention and LLM sampling - Scale AI (easy)