Design streaming new-vs-returning monthly metrics

This question evaluates a candidate's ability to design scalable, stateful streaming analytics for monthly NEW vs RETURNING request metrics. It focuses on event-time processing with late and out-of-order arrivals, deduplication, compact state, and the trade-offs of probabilistic data structures with quantifiable error bounds.

Question

Streaming design: Monthly NEW vs RETURNING request shares (event-time, with late/out-of-order and duplicates)

Context

You receive a high-volume event stream of requests. Each event has at least these fields: user_id, request_id (unique if available), and event_time (convertible to a specified time zone). Events are mostly time-ordered but can arrive up to 7 days late, and duplicates may appear. You may keep limited state per user, with 8 GB of RAM available per processing task. Expect up to 1B distinct users and 50K requests/sec overall.

Goal: For each calendar month in the specified time zone, emit at month close the counts and percentage shares of requests from NEW vs RETURNING users.
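One way to read "month close" is sketched below: a month is treated as final once a watermark, taken as the largest event time seen minus the 7-day lateness allowance, passes the end of that month in the reporting time zone. The function names are hypothetical. The sketch assumes that lateness means arrival delay relative to event time and that no event carries a future timestamp. A fixed UTC offset stands in for the reporting zone to keep the example dependency-free; any `tzinfo` could be passed instead.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Lateness allowance taken from the problem statement (up to 7 days late).
ALLOWED_LATENESS = timedelta(days=7)


def month_end(year: int, month: int, tz: tzinfo) -> datetime:
    """First instant of the following month in the reporting time zone."""
    if month == 12:
        return datetime(year + 1, 1, 1, tzinfo=tz)
    return datetime(year, month + 1, 1, tzinfo=tz)


def watermark(max_event_time_seen: datetime) -> datetime:
    """Heuristic watermark: no event older than this is expected to arrive."""
    return max_event_time_seen - ALLOWED_LATENESS


def month_is_final(year: int, month: int,
                   max_event_time_seen: datetime, tz: tzinfo) -> bool:
    """True once the watermark has reached the end of the given month."""
    return watermark(max_event_time_seen) >= month_end(year, month, tz)


# Illustrative reporting zone (assumption): fixed offset UTC-8.
reporting_tz = timezone(timedelta(hours=-8))
just_before = datetime(2024, 2, 8, 7, 59, tzinfo=timezone.utc)
at_boundary = datetime(2024, 2, 8, 8, 0, tzinfo=timezone.utc)
print(month_is_final(2024, 1, just_before, reporting_tz))
print(month_is_final(2024, 1, at_boundary, reporting_tz))
```

Under these assumptions, January in a UTC-8 zone closes only when an event stamped 08:00 UTC on 8 February or later has been observed. Events that still arrive after finalization would need a separate policy, such as dropping them or emitting a correction.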

Definition: A request is NEW if its month equals that user's first-ever request month; otherwise RETURNING.
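To make the definition concrete, here is a minimal batch reference sketch that could serve as a test oracle for a streaming design. It is not a streaming solution: it holds everything in memory and only illustrates the semantics. The function names and the tuple layout `(user_id, request_id, event_time)` are assumptions for illustration. Duplicates are dropped by `request_id`, and a user's first month is the earliest month observed, so arrival order does not matter.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone, tzinfo


def month_key(event_time: datetime, tz: tzinfo) -> tuple:
    """Calendar month (year, month) of an event in the reporting time zone."""
    local = event_time.astimezone(tz)
    return (local.year, local.month)


def monthly_counts(events, tz: tzinfo) -> dict:
    """Count NEW vs RETURNING requests per month from (user, request, time) tuples."""
    seen_requests = set()               # dedup on request_id
    first_month = {}                    # user_id -> earliest month seen
    per_user_month = defaultdict(int)   # (user_id, month) -> request count

    for user_id, request_id, event_time in events:
        if request_id in seen_requests:
            continue                    # duplicate delivery, ignore
        seen_requests.add(request_id)
        m = month_key(event_time, tz)
        per_user_month[(user_id, m)] += 1
        if user_id not in first_month or m < first_month[user_id]:
            first_month[user_id] = m    # late events may move this earlier

    counts = defaultdict(lambda: {"new": 0, "returning": 0})
    for (user_id, m), n in per_user_month.items():
        label = "new" if m == first_month[user_id] else "returning"
        counts[m][label] += n
    return dict(counts)


def share_new(month_counts: dict) -> float:
    """Percentage of requests in one month that came from NEW users."""
    total = month_counts["new"] + month_counts["returning"]
    return 100.0 * month_counts["new"] / total if total else 0.0


utc = timezone.utc
tz = timezone(timedelta(hours=-8))      # illustrative reporting zone (assumption)
events = [
    ("u1", "r1", datetime(2024, 1, 15, 12, tzinfo=utc)),
    ("u1", "r1", datetime(2024, 1, 15, 12, tzinfo=utc)),  # duplicate
    ("u1", "r3", datetime(2024, 2, 10, 12, tzinfo=utc)),  # February, returning
    ("u1", "r2", datetime(2024, 2, 1, 3, tzinfo=utc)),    # still January locally
    ("u2", "r4", datetime(2024, 2, 11, 12, tzinfo=utc)),  # February, new
]
counts = monthly_counts(events, tz)
print(counts, share_new(counts[(2024, 2)]))
```

Aggregating per `(user_id, month)` before labelling means a late event that moves a user's first month earlier automatically relabels that user's later requests as RETURNING. A streaming design has to achieve the same effect with bounded state.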

Tasks

Propose data structures (e.g., compact first-seen month store, Bloom/Cuckoo filters, HLL/Count-Min) and quantify memory footprints.
Provide both exact and approximate designs, with error bounds that ensure ≤ 0.5 percentage-point error in monthly shares.
Explain handling of late and out-of-order events and define monthly watermarking/finalization.
Describe a deduplication strategy.
Give time and space complexity.
Describe recovery/checkpointing for fault tolerance.
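As a starting point for the memory-footprint task, the back-of-envelope helpers below size an exact first-seen month store and a Bloom filter using the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. Every parameter here is an assumption for illustration: an 8-byte hashed user key, a 2-byte month index, a 0.7 hash-table load factor, 8 GB read as 8 × 10^9 bytes with half of it usable for state, and a 0.1% false-positive rate. The false-positive rate is not derived from the 0.5 percentage-point target; that mapping depends on the chosen design.

```python
import math


def exact_store_bytes(n_users: int, key_bytes: int = 8,
                      month_bytes: int = 2, load_factor: float = 0.7) -> int:
    """Approximate size of a hash table mapping user key -> first-seen month."""
    return math.ceil(n_users * (key_bytes + month_bytes) / load_factor)


def bloom_bits(n_items: int, fp_rate: float) -> int:
    """Bits needed by a Bloom filter for n_items at the target false-positive rate."""
    return math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))


def bloom_hashes(n_items: int, n_bits: int) -> int:
    """Number of hash functions that minimises the false-positive rate."""
    return max(1, round(n_bits / n_items * math.log(2)))


def tasks_needed(total_bytes: int, ram_per_task: int = 8 * 10**9,
                 usable_fraction: float = 0.5) -> int:
    """Tasks required if state is sharded evenly by user key."""
    return math.ceil(total_bytes / (ram_per_task * usable_fraction))


N_USERS = 10**9                       # from the problem statement
exact = exact_store_bytes(N_USERS)
bits = bloom_bits(N_USERS, 0.001)     # 0.1% false positives (assumed)
print(f"exact store: {exact / 1e9:.1f} GB over {tasks_needed(exact)} tasks")
print(f"bloom filter: {bits / 8 / 1e9:.2f} GB, k={bloom_hashes(N_USERS, bits)}")
```

With these assumptions the exact store comes to roughly 14 GB, so it does not fit in a single 8 GB task and would need sharding by user key, while the Bloom filter needs about 1.8 GB. A Bloom filter only answers whether a user was seen before, not in which month, so it cannot replace the first-seen month store on its own.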
