This question evaluates a candidate's ability to design MapReduce-based distributed data processing for aggregating calendar availability, focusing on partitioning, time representation, handling data skew, and validating correctness and performance at scale.
You are given large-scale calendar data: each user has 0 or more busy intervals during the day. For a set of specified groups (each group is a set of user IDs), compute the common available time slots whose duration is at least d minutes. Assume all timestamps are normalized to UTC and intervals are half-open [start, end).
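One possible shape for the computation, sketched as a single-process simulation of the map, shuffle, and reduce phases. The input schema is not specified here, so the record layout `(user_id, start, end)`, the `groups` mapping, the minutes-since-midnight time representation, the 24-hour day window, and all function names below are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end), minutes since 00:00 UTC (assumed)

DAY_START, DAY_END = 0, 24 * 60  # assumed day window


def map_busy(record: Tuple[str, int, int],
             groups_of_user: Dict[str, Set[str]]) -> Iterator[Tuple[str, Interval]]:
    """Map phase: emit each busy interval once per group the user belongs to."""
    user_id, start, end = record
    for group_id in groups_of_user.get(user_id, ()):
        yield group_id, (start, end)


def reduce_free(busy: Iterable[Interval], d: int) -> List[Interval]:
    """Reduce phase: sweep the group's busy intervals in start order and
    keep every gap of at least d minutes."""
    free: List[Interval] = []
    cursor = DAY_START  # end of the busy time seen so far
    for start, end in sorted(busy):
        # Clamp to the day window so out-of-range input cannot create gaps.
        start, end = max(start, DAY_START), min(end, DAY_END)
        if start > cursor and start - cursor >= d:
            free.append((cursor, start))
        cursor = max(cursor, end)
    if DAY_END > cursor and DAY_END - cursor >= d:
        free.append((cursor, DAY_END))
    return free


def common_free_slots(records: Iterable[Tuple[str, int, int]],
                      groups: Dict[str, Set[str]],
                      d: int) -> Dict[str, List[Interval]]:
    """Drive map -> shuffle -> reduce in memory."""
    # Invert group membership so the mapper can look up a user's groups.
    groups_of_user: Dict[str, Set[str]] = defaultdict(set)
    for group_id, members in groups.items():
        for user_id in members:
            groups_of_user[user_id].add(group_id)

    # Shuffle: collect intervals by group key. Groups with no busy
    # intervals still get an entry, so they report the whole day as free.
    shuffled: Dict[str, List[Interval]] = {group_id: [] for group_id in groups}
    for record in records:
        for group_id, interval in map_busy(record, groups_of_user):
            shuffled[group_id].append(interval)

    return {group_id: reduce_free(intervals, d)
            for group_id, intervals in shuffled.items()}


records = [("alice", 540, 600), ("bob", 570, 660), ("carol", 0, 1440)]
groups = {"g1": {"alice", "bob"}, "g2": {"carol"}, "g3": {"dave"}}
result = common_free_slots(records, groups, d=30)
```

Keying the shuffle on the group ID keeps each reducer's work independent, but a user who belongs to many groups has every interval replicated once per group, and a very large group concentrates its intervals on one reducer. Both are places where skew can appear, so a real job would likely need extra partitioning (for example by group and time bucket) on top of this sketch.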
Input datasets:
Output: