Design an ads event reporting system that collects user-ad interaction events and serves aggregated metrics.
Requirements
- Ingest events from multiple platforms (web/mobile/server):
  - impression, click, conversion (extendable to more event types)
- Event schema includes at minimum:
  - event_id (unique), timestamp (event time), ingest_time, user_id (or anonymized id), ad_id, campaign_id, event_type, plus optional attributes (geo, device, app version).
- Reporting queries:
  - Time-series aggregates (e.g., per minute/hour/day)
  - Group-by dimensions: campaign/ad, geo, device
  - User cohort / segment partitioning (e.g., users grouped by country, or an offline-defined cohort id)
- Correctness and reliability:
  - Handle duplicates (retries), out-of-order and late events
  - Support backfills and reprocessing
- Latency goals (assume):
  - Near-real-time dashboards (e.g., p95 < 1–5 minutes)
  - Accurate daily finalized reports
- Scale goals (assume):
  - 50k–500k events/sec peak
  - Store raw events for 30–90 days; aggregates longer
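As a starting point, the minimum event schema listed in the requirements could be written down as a typed record. The following is a minimal sketch, not a prescribed format: the names `EventType`, `AdEvent`, and `parse_event` are hypothetical, and the choice of epoch milliseconds for both time fields and of a nested `attributes` map for the optional fields are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class EventType(str, Enum):
    # The three event types named in the requirements; more can be added later.
    IMPRESSION = "impression"
    CLICK = "click"
    CONVERSION = "conversion"


@dataclass
class AdEvent:
    event_id: str        # unique per event; natural deduplication key
    timestamp: int       # event time, epoch milliseconds (assumed unit)
    ingest_time: int     # time the server received the event, epoch ms (assumed unit)
    user_id: str         # or an anonymized id
    ad_id: str
    campaign_id: str
    event_type: EventType
    # Optional attributes such as geo, device, app version (assumed to be a flat map).
    attributes: Dict[str, str] = field(default_factory=dict)


REQUIRED_FIELDS = (
    "event_id", "timestamp", "ingest_time",
    "user_id", "ad_id", "campaign_id", "event_type",
)


def parse_event(raw: dict) -> AdEvent:
    """Validate a raw payload and build an AdEvent.

    Raises ValueError on missing required fields or an unknown event type.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return AdEvent(
        event_id=str(raw["event_id"]),
        timestamp=int(raw["timestamp"]),
        ingest_time=int(raw["ingest_time"]),
        user_id=str(raw["user_id"]),
        ad_id=str(raw["ad_id"]),
        campaign_id=str(raw["campaign_id"]),
        event_type=EventType(raw["event_type"]),  # ValueError if not a known type
        attributes=dict(raw.get("attributes", {})),
    )
```

Keeping `timestamp` and `ingest_time` as separate fields is what later makes it possible to measure how late an event arrived and to aggregate by event time rather than arrival time.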
Deliverables
Propose an end-to-end architecture, storage layout/partitioning, aggregation strategy, APIs for querying, and how you ensure deduplication + correctness with late events. Include key tradeoffs.
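One possible way to approach the deduplication and late-event part of the deliverables is sketched below: deduplicate on `event_id`, aggregate into one-minute event-time buckets, and divert events that arrive behind a watermark to a separate path for later reprocessing. This is an in-memory illustration under stated assumptions, not a reference design: the class `MinuteAggregator`, the `allowed_lateness_ms` parameter, and the simple "max event time minus allowed lateness" watermark are all hypothetical choices, and a real deployment would need bounded, persistent state.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

BUCKET_MS = 60_000  # one-minute event-time buckets


class MinuteAggregator:
    """Counts events per (minute bucket, campaign_id, event_type)."""

    def __init__(self, allowed_lateness_ms: int = 5 * 60_000) -> None:
        self.allowed_lateness_ms = allowed_lateness_ms
        # Assumption: unbounded set for brevity; in practice this would be
        # a TTL-bounded store keyed by event_id.
        self.seen_ids: Set[str] = set()
        self.counts: Dict[Tuple[int, str, str], int] = defaultdict(int)
        self.max_event_time = 0
        # Events too late for the real-time path, kept for batch reprocessing.
        self.late_events: List[Tuple[str, int, str, str]] = []

    def watermark(self) -> int:
        # Heuristic watermark: the newest event time seen minus allowed lateness.
        return self.max_event_time - self.allowed_lateness_ms

    def add(self, event_id: str, timestamp_ms: int,
            campaign_id: str, event_type: str) -> str:
        # 1. Deduplicate retries by event_id.
        if event_id in self.seen_ids:
            return "duplicate"
        self.seen_ids.add(event_id)

        # 2. Events behind the watermark skip the real-time aggregate.
        if timestamp_ms < self.watermark():
            self.late_events.append(
                (event_id, timestamp_ms, campaign_id, event_type))
            return "late"

        # 3. Out-of-order but on-time events land in their event-time bucket.
        self.max_event_time = max(self.max_event_time, timestamp_ms)
        bucket = timestamp_ms - timestamp_ms % BUCKET_MS
        self.counts[(bucket, campaign_id, event_type)] += 1
        return "counted"


agg = MinuteAggregator(allowed_lateness_ms=120_000)
print(agg.add("e1", 600_000, "c1", "click"))  # first delivery
print(agg.add("e1", 600_000, "c1", "click"))  # retry of the same event
print(agg.add("e2", 550_000, "c1", "click"))  # out of order, within lateness
print(agg.add("e3", 100_000, "c1", "click"))  # behind the watermark
```

In this sketch the events collected in `late_events` are the ones a batch or backfill job would fold in when producing the finalized daily report, which is one way to reconcile the near-real-time and accuracy goals; the tradeoff is that dashboards may undercount until that job runs.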