PracHub

Design an ad-click aggregation and enrichment pipeline

Last updated: Mar 29, 2026

Quick Overview

This question evaluates system design and data engineering competencies, focusing on large-scale streaming ingestion, aggregation, enrichment joins, deduplication, and trade-offs between request batching and latency for mobile networks.



Company: Rippling

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite



Oct 17, 2025, 12:00 AM

Scenario

You are designing a data platform to measure advertising performance.

Mobile apps and web browsers send ad impression and ad click events. Analysts need near-real-time dashboards and batch reports.

Requirements

  • Ingest impression/click events from mobile and web clients.
  • Produce aggregates such as:
    • clicks / impressions / CTR
    • grouped by time window (e.g., 1 min, 1 hour, 1 day)
    • grouped by dimensions like campaign_id, ad_id, publisher_id, country, device_type
  • Enrich events by joining with other data sources (examples):
    • campaign metadata (budget, objective)
    • ad metadata (creative type)
    • user/device attributes (coarse geo, OS)
  • Support both:
    • near-real-time queries (seconds to a few minutes delay)
    • historical queries over months
  • Event delivery constraints:
    • clients may be offline and retry
    • duplicate/out-of-order events can occur
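To make the aggregation and delivery requirements concrete, here is a minimal in-memory sketch of tumbling-window counting with deduplication. It is an illustration under assumptions, not a production design: the `Event` fields, the client-generated `event_id` that is reused on retry, and the `aggregate` and `ctr` helpers are all hypothetical names that do not come from the question. Bucketing by the client's event time rather than arrival time means out-of-order events still land in the correct window.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str     # assumed: unique ID generated on the client, reused on retry
    event_type: str   # "impression" or "click"
    campaign_id: str
    event_time: int   # epoch seconds, assigned on the client


def aggregate(events, window_seconds=60):
    """Count impressions and clicks per (window_start, campaign_id).

    Events whose event_id was already seen are skipped, so client retries
    do not inflate the counts.
    """
    seen_ids = set()
    counts = defaultdict(lambda: {"impression": 0, "click": 0})
    for event in events:
        if event.event_id in seen_ids:
            continue  # duplicate delivery
        seen_ids.add(event.event_id)
        # Tumbling window: round the event time down to the window boundary.
        window_start = event.event_time - event.event_time % window_seconds
        counts[(window_start, event.campaign_id)][event.event_type] += 1
    return dict(counts)


def ctr(bucket):
    """Click-through rate for one bucket; 0.0 when there are no impressions."""
    if bucket["impression"] == 0:
        return 0.0
    return bucket["click"] / bucket["impression"]
```

At the stated scale the seen-ID set could not grow without bound; a real pipeline would likely limit it to a time horizon or use an approximate structure, which is a design decision the question leaves open.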

Scale & SLOs (assume)

  • Peak 500k events/sec (impressions+clicks), average 100k/sec.
  • Dashboard freshness: P95 < 2 minutes.
  • Correctness: exactly-once is not required, but duplicates should be minimized and results should be explainable.
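The throughput figures above translate into rough capacity numbers with a little arithmetic. The sketch below uses the stated peak and average rates; the 200-byte average event size is purely an assumption for illustration and is not given in the question, so the resulting bandwidth and storage figures should be read as order-of-magnitude estimates only.

```python
PEAK_EVENTS_PER_SEC = 500_000   # from the stated scale
AVG_EVENTS_PER_SEC = 100_000    # from the stated scale
ASSUMED_EVENT_BYTES = 200       # assumption: average serialized event size
SECONDS_PER_DAY = 86_400


def peak_ingest_mb_per_sec(events_per_sec=PEAK_EVENTS_PER_SEC,
                           event_bytes=ASSUMED_EVENT_BYTES):
    """Peak ingest bandwidth in megabytes per second (1 MB = 10**6 bytes)."""
    return events_per_sec * event_bytes / 1e6


def events_per_day(events_per_sec=AVG_EVENTS_PER_SEC):
    """Total events per day at the average rate."""
    return events_per_sec * SECONDS_PER_DAY


def raw_tb_per_day(events_per_sec=AVG_EVENTS_PER_SEC,
                   event_bytes=ASSUMED_EVENT_BYTES):
    """Uncompressed raw event volume per day in terabytes (1 TB = 10**12 bytes)."""
    return events_per_day(events_per_sec) * event_bytes / 1e12
```

Under that assumed event size, peak ingest is about 100 MB/s and raw storage grows by roughly 1.7 TB per day before compression or replication, which is why months of history point toward columnar storage and pre-aggregated rollups.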

Key discussion prompt

Clients can send events:

  1. one request per event, or
  2. batch multiple events per request.

Explain the trade-offs between number of requests vs latency, especially for mobile networks.
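One way to reason about this trade-off is a client-side batcher that flushes when either a size limit or an age limit is reached. The sketch below is a hypothetical illustration: `EventBatcher`, `max_batch_size`, `max_wait_seconds`, and `requests_per_sec` are invented names, and time is passed in explicitly so the behavior is deterministic. Larger batches cut the request count roughly by the batch size, while `max_wait_seconds` bounds the extra delay batching adds to any single event.

```python
class EventBatcher:
    """Buffers events and releases a batch when it is full or too old."""

    def __init__(self, max_batch_size, max_wait_seconds):
        self.max_batch_size = max_batch_size
        self.max_wait_seconds = max_wait_seconds
        self._buffer = []
        self._oldest_time = None  # arrival time of the oldest buffered event

    def add(self, event, now):
        """Buffer an event; return a batch to send if a flush is due, else None."""
        if not self._buffer:
            self._oldest_time = now
        self._buffer.append(event)
        return self._flush_if_due(now)

    def tick(self, now):
        """Timer callback: flush a partially filled batch once it is old enough."""
        return self._flush_if_due(now)

    def _flush_if_due(self, now):
        if not self._buffer:
            return None
        is_full = len(self._buffer) >= self.max_batch_size
        is_stale = now - self._oldest_time >= self.max_wait_seconds
        if is_full or is_stale:
            batch, self._buffer = self._buffer, []
            self._oldest_time = None
            return batch
        return None


def requests_per_sec(events_per_sec, batch_size):
    """Request rate if every request carries a full batch."""
    return events_per_sec / batch_size
```

For example, at the stated peak of 500k events/sec, full batches of 50 would reduce the load to 10,000 requests/sec, at the cost of each event waiting up to `max_wait_seconds` before it is sent. On mobile, fewer requests also means fewer radio wake-ups, but a larger unsent buffer is at risk if the app is killed before a flush.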

Deliverables

  • High-level architecture and major components
  • Data model / schemas
  • How you do enrichment joins (stream-stream vs stream-table vs batch)
  • How you handle deduplication, late events, and backfills
  • What you store for serving (OLAP/warehouse) and for near-real-time dashboards
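As one possible starting point for the enrichment deliverable, a stream-table join can be sketched as a per-event lookup against a slowly changing metadata table. Everything named here is an assumption for illustration: the `enrich` function, the dictionary-shaped event, and the `objective` field are hypothetical and not specified by the question. The sketch flags events whose campaign is unknown instead of dropping them, so they could be re-enriched later by a backfill.

```python
def enrich(event, campaigns):
    """Return a copy of the event with campaign metadata attached.

    `campaigns` maps campaign_id to a metadata dict (the "table" side of a
    stream-table join). The input event is not modified.
    """
    metadata = campaigns.get(event["campaign_id"])
    enriched = dict(event)
    enriched["campaign_objective"] = metadata["objective"] if metadata else None
    # Keep a marker so unmatched events can be found and re-joined later.
    enriched["enrichment_missing"] = metadata is None
    return enriched


# Hypothetical metadata table and events, for illustration only.
campaign_table = {"c1": {"objective": "installs"}}
known = enrich({"event_id": "a", "campaign_id": "c1"}, campaign_table)
unknown = enrich({"event_id": "b", "campaign_id": "c9"}, campaign_table)
```

Here `known` carries `campaign_objective` set to "installs", while `unknown` is kept with `enrichment_missing` set to True. Whether a real design uses this lookup style, a stream-stream join, or batch enrichment depends on how fast each data source changes.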
