This question evaluates proficiency in scalable stream and batch data processing: windowed aggregation, event-time semantics, deduplication, skew mitigation, and distributed computation in the MapReduce/Spark paradigm. It falls under System Design and Big Data/stream processing and is commonly asked to assess whether a candidate can design correct, high-throughput windowed aggregations and handle late or duplicated events at scale. The expected level of abstraction is practical application with architectural and operational considerations rather than low-level coding.
You are given a massive log of meeting events; assume the fields include at least a meeting_id and an event-time timestamp, and state any further schema assumptions explicitly (per the note below).
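A minimal PySpark sketch of such a schema follows; meeting_id and event_time come from the question's own hints, while user_id and event_type are hypothetical additions for illustration.

```python
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Assumed event schema: meeting_id and event_time are implied by the prompt;
# user_id and event_type are hypothetical illustrative fields.
event_schema = StructType([
    StructField("meeting_id", StringType(), nullable=False),    # dedup key (assumed)
    StructField("user_id", StringType(), nullable=True),        # hypothetical
    StructField("event_type", StringType(), nullable=True),     # hypothetical, e.g. join/leave
    StructField("event_time", TimestampType(), nullable=False), # event time used for windowing
])
```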
Compute the requested aggregates for each rolling 15-minute window (assume slide = 1 minute unless specified otherwise).
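In Spark Structured Streaming, a rolling 15-minute window with a 1-minute slide can be expressed with `window()` plus a watermark. The following is a sketch under the schema assumed above: the input path and the 30-minute lateness bound are assumptions, and the two aggregates shown are illustrative stand-ins for whatever the question actually asks for.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("meeting-windows").getOrCreate()

# Minimal assumed schema (see the sketch above); the path is hypothetical.
schema = StructType([
    StructField("meeting_id", StringType()),
    StructField("event_time", TimestampType()),
])
events = spark.readStream.schema(schema).json("/data/meeting-events")

windowed = (
    events
    # The watermark bounds state and sets the lateness tolerance: events more
    # than 30 minutes late (an assumed threshold) no longer update windows.
    .withWatermark("event_time", "30 minutes")
    # Rolling 15-minute windows sliding every 1 minute, per the prompt.
    .groupBy(F.window("event_time", "15 minutes", "1 minute"))
    .agg(
        F.count("*").alias("event_count"),                        # illustrative aggregate
        F.approx_count_distinct("meeting_id").alias("meetings"),  # illustrative aggregate
    )
)

# Append mode emits each window exactly once, after the watermark passes it.
query = windowed.writeStream.outputMode("append").format("console").start()
```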
Design a MapReduce or Spark job that computes these windowed aggregates correctly and at high throughput, deduplicates repeated events, honors event-time semantics for late arrivals, and mitigates key skew.
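Deduplication in Structured Streaming is commonly handled with watermark-bounded `dropDuplicates`. In this sketch the dedup key (meeting_id, event_time) is an assumption; if the schema carries a dedicated event id, that is the better key.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("meeting-dedup").getOrCreate()

schema = StructType([
    StructField("meeting_id", StringType()),
    StructField("event_time", TimestampType()),
])
events = spark.readStream.schema(schema).json("/data/meeting-events")  # hypothetical path

deduped = (
    events
    # The watermark bounds dedup state: a duplicate arriving more than
    # 30 minutes late (assumed threshold) can no longer be caught.
    .withWatermark("event_time", "30 minutes")
    # Assumed identity of an event: same meeting_id at the same event_time.
    .dropDuplicates(["meeting_id", "event_time"])
)
# `deduped` can now feed the windowed aggregation from the previous sketch.
```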
Make minimal, explicit assumptions if needed (e.g., presence of meeting_id for dedup, window slide granularity).
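For key skew, a standard mitigation is salted two-stage aggregation: pre-aggregate on (window, salt), then combine the small partials. A batch-mode sketch follows; the salt width of 32 and the input path are assumptions, and the technique applies directly only to decomposable aggregates such as counts and sums (distinct counts need sketch structures like HyperLogLog instead).

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("meeting-skew").getOrCreate()

# Hypothetical batch input; the cast ensures event_time is a proper timestamp.
events = (
    spark.read.json("/data/meeting-events")
         .withColumn("event_time", F.col("event_time").cast("timestamp"))
)

w = F.window("event_time", "15 minutes", "1 minute")

# Stage 1: a random salt spreads any hot window across 32 sub-keys, so no
# single partition receives all rows for a popular window.
partial = (
    events
    .withColumn("salt", (F.rand() * 32).cast("int"))
    .groupBy(w.alias("w"), "salt")
    .agg(F.count("*").alias("partial_count"))
)

# Stage 2: per-salt partials are tiny, so regrouping by window alone is
# cheap and skew-free; summing partial counts recovers the exact total.
totals = partial.groupBy("w").agg(F.sum("partial_count").alias("event_count"))
```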