Design MapReduce for schedule aggregation

This question evaluates a candidate's ability to design MapReduce-based distributed data processing for aggregating calendar availability, focusing on partitioning, time representation, handling data skew, and validating correctness and performance at scale.

Question

MapReduce Design: Common Availability From Busy Intervals

Context

You are given large-scale calendar data in which each user has zero or more busy intervals per day. For each group in a specified set (each group is a set of user IDs), compute the common free time slots, meaning the periods in which no member of the group is busy, that last at least d minutes. Assume all timestamps are normalized to UTC and intervals are half-open [start, end).
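To make the half-open convention concrete, here is a minimal sketch of the computation for one group on one day. It relies on the fact that the group's common free time is the complement of the union of all members' busy intervals. The function name `common_free` and the representation of timestamps as integer minutes since 00:00 UTC are assumptions made for illustration; they are not part of the problem statement.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), minutes since 00:00 UTC


def common_free(busy: Iterable[Interval], d: int,
                day_start: int = 0, day_end: int = 1440) -> List[Interval]:
    """Return the gaps of length >= d that no busy interval covers.

    `busy` holds the busy intervals of every member of the group, in any
    order. Assumes d >= 1.
    """
    free: List[Interval] = []
    cursor = day_start  # everything before `cursor` is already accounted for
    for start, end in sorted(busy):
        # Clamp to the day window and skip empty or out-of-window pieces.
        start, end = max(start, day_start), min(end, day_end)
        if start >= end:
            continue
        if start > cursor and start - cursor >= d:
            free.append((cursor, start))
        cursor = max(cursor, end)  # overlapping intervals merge here
    if day_end - cursor >= d:
        free.append((cursor, day_end))
    return free


# Two overlapping meetings and one later meeting, minimum slot of 60 minutes.
slots = common_free([(540, 600), (570, 660), (780, 840)], d=60)
```

Because intervals are half-open, a meeting ending at minute 600 and another starting at minute 600 leave no gap between them, and the sweep needs no special case for that.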

Input datasets:

Busy intervals: records (user_id, start_ts, end_ts)
Group membership: records (group_id, user_id)
Query parameter: minimum duration d (minutes)

Output:

For each (group_id, calendar_day), the list of common free intervals lasting at least d minutes.
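Since the output is keyed by calendar day, a busy interval that crosses midnight has to contribute to more than one key. The sketch below shows one way a mapper could split such an interval. It assumes that timestamps are integer epoch seconds and that calendar_day means the UTC day; neither is stated in the problem, and `split_by_utc_day` is a hypothetical helper name.

```python
from typing import Iterator, Tuple

SECONDS_PER_DAY = 86_400


def split_by_utc_day(start_ts: int, end_ts: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (day_index, start_ts, end_ts), one piece per UTC day touched.

    day_index counts days since the Unix epoch. Each piece is half-open,
    so an interval ending exactly at midnight yields no empty piece on
    the following day.
    """
    while start_ts < end_ts:
        day = start_ts // SECONDS_PER_DAY
        next_midnight = (day + 1) * SECONDS_PER_DAY
        piece_end = min(end_ts, next_midnight)
        yield day, start_ts, piece_end
        start_ts = piece_end


# A 90-minute meeting that starts one hour before the first midnight.
pieces = list(split_by_utc_day(SECONDS_PER_DAY - 3600, SECONDS_PER_DAY + 1800))
```

If the interviewer intends calendar_day to follow each group's local time zone instead, the day boundary would have to come from that zone rather than from a fixed 86,400-second stride.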

Requirements

Define the Map outputs (keys/values), partitioning, and Reduce logic.
Explain how you discretize time (or avoid discretization) and the trade-offs.
Describe how you mitigate data skew (e.g., very large groups, rush-hour hotspots).
Explain how you validate correctness and performance at scale.
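One possible answer to the first three requirements can be sketched as a local, in-memory simulation. It is not a definitive design. The assumptions are: time is discretized into 15-minute slots, busy records arrive already split per day as (user_id, day, start_min, end_min), and the membership table is small enough for a map-side join. All function names are hypothetical. The map output is keyed by (group_id, day) with a bitmask of busy slots as the value, and the reduce step ORs the masks together.

```python
from collections import defaultdict
from typing import List, Tuple

SLOT_MIN = 15                      # slot width in minutes (assumed granularity)
SLOTS_PER_DAY = 1440 // SLOT_MIN


def busy_mask(start_min: int, end_min: int) -> int:
    """Bitmask of the slots touched by [start_min, end_min), rounded outward."""
    first = start_min // SLOT_MIN
    last = -(-end_min // SLOT_MIN)  # ceiling division
    return ((1 << (last - first)) - 1) << first


def map_join(busy, membership):
    """Map-side join: emit ((group_id, day), mask) for each busy record."""
    groups_of = defaultdict(list)
    for group_id, user_id in membership:
        groups_of[user_id].append(group_id)
    for user_id, day, start_min, end_min in busy:
        for group_id in groups_of[user_id]:
            yield (group_id, day), busy_mask(start_min, end_min)


def combine(pairs):
    """Stand-in for combiner plus shuffle: OR the masks that share a key."""
    merged = defaultdict(int)
    for key, mask in pairs:
        merged[key] |= mask
    return merged


def reduce_free(mask: int, d: int) -> List[Tuple[int, int]]:
    """Return free [start_min, end_min) runs lasting at least d minutes."""
    free, run_start = [], None
    for slot in range(SLOTS_PER_DAY + 1):
        # The extra iteration acts as a busy sentinel that closes the last run.
        is_busy = slot == SLOTS_PER_DAY or (mask >> slot) & 1
        if not is_busy and run_start is None:
            run_start = slot
        elif is_busy and run_start is not None:
            if (slot - run_start) * SLOT_MIN >= d:
                free.append((run_start * SLOT_MIN, slot * SLOT_MIN))
            run_start = None
    return free


busy = [("u1", 0, 540, 600), ("u2", 0, 570, 655)]
membership = [("g1", "u1"), ("g1", "u2")]
merged = combine(map_join(busy, membership))
result = reduce_free(merged[("g1", 0)], d=60)
```

The trade-off of discretizing is visible in the example: the meeting ending at minute 655 is rounded out to 660, so free time is under-reported by up to one slot per boundary, in exchange for a fixed-size value per key. Because OR is associative and commutative, the same merge can run as a combiner, or under a salted key such as (group_id, day, shard) followed by a second merge, which is one way to keep a very large group from overloading a single reducer. A (group_id, day) with no busy records never appears as a key here, so a real job would need an explicit list of days to report fully free days. For validation, one option is to compare this output on sampled groups against an exact sort-and-sweep over the raw intervals.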
