This question evaluates proficiency in scalable stream and batch data processing: windowed aggregation, event-time semantics, deduplication, skew mitigation, and distributed computation in the MapReduce/Spark paradigm. It falls under System Design and Big Data/stream processing and is commonly asked to assess whether a candidate can design correct, high-throughput windowed aggregations and handle late or duplicated events at scale. The expected level of abstraction is practical application with architectural and operational considerations rather than low-level coding.
You are given a massive log of meeting events; assume the fields include at least a meeting_id and an event-time timestamp, and state any further schema assumptions explicitly (per the note below).
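A minimal PySpark sketch of such a schema follows; meeting_id and event_time come from the question's own hints, while user_id and event_type are hypothetical additions for illustration.

```python
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Assumed event schema: meeting_id and event_time are implied by the prompt;
# user_id and event_type are hypothetical illustrative fields.
event_schema = StructType([
    StructField("meeting_id", StringType(), nullable=False),    # dedup key (assumed)
    StructField("user_id", StringType(), nullable=True),        # hypothetical
    StructField("event_type", StringType(), nullable=True),     # hypothetical, e.g. join/leave
    StructField("event_time", TimestampType(), nullable=False), # event time used for windowing
])
```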
Compute the requested aggregates for each rolling 15-minute window (assume slide = 1 minute unless specified otherwise).
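In Spark Structured Streaming, a rolling 15-minute window with a 1-minute slide can be expressed with `window()` plus a watermark. The following is a sketch under the schema assumed above: the input path and the 30-minute lateness bound are assumptions, and the two aggregates shown are illustrative stand-ins for whatever the question actually asks for.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("meeting-windows").getOrCreate()

# Minimal assumed schema (see the sketch above); the path is hypothetical.
schema = StructType([
    StructField("meeting_id", StringType()),
    StructField("event_time", TimestampType()),
])
events = spark.readStream.schema(schema).json("/data/meeting-events")

windowed = (
    events
    # The watermark bounds state and sets the lateness tolerance: events more
    # than 30 minutes late (an assumed threshold) no longer update windows.
    .withWatermark("event_time", "30 minutes")
    # Rolling 15-minute windows sliding every 1 minute, per the prompt.
    .groupBy(F.window("event_time", "15 minutes", "1 minute"))
    .agg(
        F.count("*").alias("event_count"),                        # illustrative aggregate
        F.approx_count_distinct("meeting_id").alias("meetings"),  # illustrative aggregate
    )
)

# Append mode emits each window exactly once, after the watermark passes it.
query = windowed.writeStream.outputMode("append").format("console").start()
```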
Design a MapReduce or Spark job that computes these windowed aggregates correctly and at high throughput, deduplicates repeated events, honors event-time semantics for late arrivals, and mitigates key skew.
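Deduplication in Structured Streaming is commonly handled with watermark-bounded `dropDuplicates`. In this sketch the dedup key (meeting_id, event_time) is an assumption; if the schema carries a dedicated event id, that is the better key.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("meeting-dedup").getOrCreate()

schema = StructType([
    StructField("meeting_id", StringType()),
    StructField("event_time", TimestampType()),
])
events = spark.readStream.schema(schema).json("/data/meeting-events")  # hypothetical path

deduped = (
    events
    # The watermark bounds dedup state: a duplicate arriving more than
    # 30 minutes late (assumed threshold) can no longer be caught.
    .withWatermark("event_time", "30 minutes")
    # Assumed identity of an event: same meeting_id at the same event_time.
    .dropDuplicates(["meeting_id", "event_time"])
)
# `deduped` can now feed the windowed aggregation from the previous sketch.
```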
Make minimal, explicit assumptions if needed (e.g., presence of meeting_id for dedup, window slide granularity).
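For key skew, a standard mitigation is salted two-stage aggregation: pre-aggregate on (window, salt), then combine the small partials. A batch-mode sketch follows; the salt width of 32 and the input path are assumptions, and the technique applies directly only to decomposable aggregates such as counts and sums (distinct counts need sketch structures like HyperLogLog instead).

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("meeting-skew").getOrCreate()

# Hypothetical batch input; the cast ensures event_time is a proper timestamp.
events = (
    spark.read.json("/data/meeting-events")
         .withColumn("event_time", F.col("event_time").cast("timestamp"))
)

w = F.window("event_time", "15 minutes", "1 minute")

# Stage 1: a random salt spreads any hot window across 32 sub-keys, so no
# single partition receives all rows for a popular window.
partial = (
    events
    .withColumn("salt", (F.rand() * 32).cast("int"))
    .groupBy(w.alias("w"), "salt")
    .agg(F.count("*").alias("partial_count"))
)

# Stage 2: per-salt partials are tiny, so regrouping by window alone is
# cheap and skew-free; summing partial counts recovers the exact total.
totals = partial.groupBy("w").agg(F.sum("partial_count").alias("event_count"))
```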