Design an ad-click aggregation and enrichment pipeline

This is a System Design interview question from Rippling for Software Engineer roles.

Question

Scenario

You are designing a data platform to measure advertising performance.

Mobile apps and web browsers send ad impression and ad click events. Analysts need near-real-time dashboards and batch reports.

Requirements

Ingest impression/click events from mobile and web clients.
Produce aggregates such as:
- clicks / impressions / CTR
- grouped by time window (e.g., 1 min, 1 hour, 1 day)
- grouped by dimensions like campaign_id, ad_id, publisher_id, country, device_type
Enrich events by joining with other data sources (examples):
- campaign metadata (budget, objective)
- ad metadata (creative type)
- user/device attributes (coarse geo, OS)
Support both:
- near-real-time queries (seconds to a few minutes delay)
- historical queries over months
Event delivery constraints:
- clients may be offline and retry
- duplicate/out-of-order events can occur
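
To make the aggregation and delivery requirements above concrete, here is a minimal in-memory sketch of deduplicated, event-time windowed counting. It is not a prescribed solution. The Event fields, the aggregate function, and the idea of a client-generated event_id that stays the same across retries are all assumptions introduced for illustration. A real pipeline would keep the dedup set in a bounded, time-limited store rather than in process memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str    # assumption: generated on the client, reused on retry
    event_type: str  # "impression" or "click"
    event_time: int  # epoch seconds, assigned on the client
    campaign_id: str
    country: str


def aggregate(events, window_secs=60):
    """Count impressions and clicks per (window_start, campaign_id, country)."""
    seen = set()  # event_ids already counted; drops retried duplicates
    counts = defaultdict(lambda: {"impression": 0, "click": 0})
    for e in events:
        if e.event_id in seen:
            continue
        seen.add(e.event_id)
        # Bucket by event time, not arrival time, so out-of-order events
        # still land in the window they belong to.
        window_start = e.event_time - e.event_time % window_secs
        counts[(window_start, e.campaign_id, e.country)][e.event_type] += 1

    result = {}
    for key, c in counts.items():
        # CTR is undefined when a window has no impressions.
        ctr = c["click"] / c["impression"] if c["impression"] else None
        result[key] = {
            "impressions": c["impression"],
            "clicks": c["click"],
            "ctr": ctr,
        }
    return result
```

Keying the window on event_time is what lets a late or reordered event update the correct bucket. How long a window stays open for such updates is a separate design decision that this sketch does not model.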

Scale & SLOs (assume)

Peak 500k events/sec (impressions+clicks), average 100k/sec.
Dashboard freshness: P95 < 2 minutes.
Correctness: exactly-once is not required, but duplicates should be minimized and results should be explainable.
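
A quick back-of-envelope calculation helps size the ingestion and storage tiers from these numbers. The event rates come from the scenario. The 500-byte average event size is purely an assumption for illustration and should be replaced with a measured value.

```python
PEAK_EPS = 500_000   # from the scenario: peak events/sec
AVG_EPS = 100_000    # from the scenario: average events/sec
EVENT_BYTES = 500    # assumption: average serialized event size

SECONDS_PER_DAY = 86_400

events_per_day = AVG_EPS * SECONDS_PER_DAY
raw_tb_per_day = events_per_day * EVENT_BYTES / 1e12
peak_mb_per_sec = PEAK_EPS * EVENT_BYTES / 1e6

print(f"events/day:      {events_per_day:,}")       # 8,640,000,000
print(f"raw TB/day:      {raw_tb_per_day:.2f}")     # 4.32
print(f"peak ingress MB/s: {peak_mb_per_sec:.0f}")  # 250
```

Under that size assumption, the average rate drives storage (roughly 4.3 TB of raw events per day before compression or replication), while the 5x peak drives how much headroom the ingestion tier needs.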

Key discussion prompt

Clients can send events:

one request per event, or
batch multiple events per request.

Explain the trade-off between the number of requests and latency, especially on mobile networks.
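
The trade-off can be explored with a toy model of a client-side buffer that flushes when it reaches a size limit or when its oldest event has waited too long. The functions plan_flushes and summarize and the parameters max_batch and max_wait are hypothetical names for this sketch, and the model ignores network latency, retries, and payload size.

```python
def plan_flushes(event_times, max_batch, max_wait):
    """Return a list of (flush_time, [event_times]) batches.

    event_times must be sorted. A batch is sent when it holds max_batch
    events, or max_wait seconds after its first event, whichever is first.
    """
    batches = []
    buf = []
    for t in event_times:
        # The timer for the current buffer fired before this event arrived.
        if buf and t - buf[0] >= max_wait:
            batches.append((buf[0] + max_wait, buf))
            buf = []
        buf.append(t)
        if len(buf) == max_batch:
            batches.append((t, buf))
            buf = []
    if buf:
        batches.append((buf[0] + max_wait, buf))
    return batches


def summarize(batches):
    """Return (number of requests, worst-case delay added by buffering)."""
    worst_delay = max(flush - min(evts) for flush, evts in batches)
    return len(batches), worst_delay


times = [0, 1, 2, 3, 10, 11, 30]
print(summarize(plan_flushes(times, max_batch=1, max_wait=5)))  # (7, 0)
print(summarize(plan_flushes(times, max_batch=3, max_wait=5)))  # (4, 5)
```

In this example, batching cuts the request count from 7 to 4 at the cost of up to 5 seconds of added delay per event. Since the freshness target is a P95 under 2 minutes, a wait of a few seconds on the client uses only a small part of that budget.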

Deliverables

High-level architecture and major components
Data model / schemas
How you do enrichment joins (stream-stream vs stream-table vs batch)
How you handle deduplication, late events, and backfills
What you store for serving (OLAP/warehouse) and for near-real-time dashboards
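
As one illustration of the enrichment deliverable, the sketch below shows the shape of a stream-table join: each event is looked up against a slowly changing campaign table, and events with no match are flagged instead of dropped so a later backfill can repair them. The table contents, field names, and the enrich function are assumptions for this example, not part of the question.

```python
# Assumption: a small, slowly changing dimension table held in memory.
# In practice this would be refreshed from the campaign metadata source.
CAMPAIGNS = {
    "c1": {"objective": "installs", "budget_usd": 1000},
}


def enrich(event, campaigns):
    """Attach campaign metadata to one event (stream-table join)."""
    meta = campaigns.get(event["campaign_id"])
    if meta is None:
        # Keep the event and mark it, so counts stay complete and the
        # missing attributes can be filled in by a batch backfill.
        return {**event, "campaign_objective": None, "enriched": False}
    return {**event, "campaign_objective": meta["objective"], "enriched": True}


stream = [
    {"event_id": "e1", "event_type": "click", "campaign_id": "c1"},
    {"event_id": "e2", "event_type": "click", "campaign_id": "c9"},
]
enriched = [enrich(e, CAMPAIGNS) for e in stream]
```

A table lookup like this suits metadata that changes rarely. A stream-stream join, such as attributing a click to its impression, additionally needs windowed state on both sides, which this sketch does not cover.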