# System Design: Ad Clickstream Ingestion and Analytics with Kafka, S3, and Presto
## Context
You are asked to design an end-to-end advertisement clickstream platform that ingests events from web and mobile clients, persists raw data durably, and supports interactive analytics. The stack must use Kafka for ingestion, S3 for raw storage, and Presto for queries. The system should scale to heavy traffic, tolerate component failures, and balance real-time and batch needs.
Assume the following to make the problem concrete:
- Scale: 5–10 billion events/day, ~1 KB average payload, peak of 150k events/sec (with bursts up to 2× for several minutes); see the sizing sketch after this list.
- Latency SLOs:
  - Real-time monitoring: fresh data available for exploratory queries within 2–5 minutes.
  - Batch analytics: hourly/daily rollups with strong correctness.
- Queries: session funnels, CTR, geo/device breakdowns, campaign attribution, and ad-hoc exploration.
- Availability: multi-AZ; data durability across region-level incidents is a plus.
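
To ground these numbers, here is a back-of-envelope sizing sketch. The ~10 MB/s-per-partition figure is a hypothetical planning rule of thumb, not a Kafka guarantee:

```python
# Derived from the assumptions above; all results are approximate.
EVENTS_PER_DAY = 10e9        # upper bound of 5-10 billion/day
AVG_PAYLOAD_BYTES = 1_000    # ~1 KB average event
PEAK_EPS = 150_000           # peak events/sec
BURST_FACTOR = 2             # short bursts at 2x peak

raw_tb_per_day = EVENTS_PER_DAY * AVG_PAYLOAD_BYTES / 1e12
peak_mb_s = PEAK_EPS * AVG_PAYLOAD_BYTES / 1e6
burst_mb_s = peak_mb_s * BURST_FACTOR

print(f"raw ingest: ~{raw_tb_per_day:.0f} TB/day before compression")  # ~10
print(f"peak: ~{peak_mb_s:.0f} MB/s, burst: ~{burst_mb_s:.0f} MB/s")   # 150/300

# Hypothetical planning figure: assume one partition absorbs ~10 MB/s of
# producer traffic with headroom, and size the topic for the burst.
PARTITION_MB_S = 10
print(f"topic partitions: >= {int(burst_mb_s / PARTITION_MB_S)}")      # >= 30
```

At ~10 TB/day raw, columnar compression and lifecycle policies on S3 matter as much as ingest throughput.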
## Requirements
Design the platform and cover:
- End-to-end architecture and data flow from producers to analytics (a producer sketch follows this list).
- Storage layout and file formats in S3 for efficient Presto queries (see the layout sketch below).
- How Presto is configured to query the lake (metastore/catalogs, partitioning, schema evolution).
- Scaling strategies for Kafka, streaming/batch jobs, S3 layout, and Presto.
- Failover and recovery mechanisms across components.
- Data quality, schema management, and deduplication (see the dedup sketch below).
- Trade-offs between real-time and batch processing (freshness, cost, correctness).
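
At the ingestion edge, one plausible producer setup is an idempotent, compressed Kafka producer keyed by session so a session's events land in one partition. A minimal sketch using the confluent-kafka client; the broker list, topic name, and event fields are assumptions:

```python
import json
from confluent_kafka import Producer  # pip install confluent-kafka

producer = Producer({
    "bootstrap.servers": "kafka-1:9092,kafka-2:9092",  # placeholder brokers
    "enable.idempotence": True,   # broker dedups producer retries
    "acks": "all",                # wait for in-sync replicas (durability)
    "compression.type": "lz4",    # ~1 KB JSON events compress well
    "linger.ms": 20,              # small batching delay for throughput
})

def delivery_report(err, msg):
    # In production, spool failed events locally or to a dead-letter path.
    if err is not None:
        print(f"delivery failed: {err}")

def publish_click(event: dict) -> None:
    # Keying by session_id keeps each session's events in one partition,
    # preserving per-session ordering for funnel analysis.
    producer.produce(
        "ad-clicks",                       # hypothetical topic name
        key=event["session_id"].encode(),
        value=json.dumps(event).encode(),
        on_delivery=delivery_report,
    )
    producer.poll(0)  # serve delivery callbacks without blocking
```

The idempotent/acks-all combination trades a little produce latency for no duplicates on retry, which simplifies everything downstream.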
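For the S3 layout, a common pattern is Hive-style, hour-partitioned Parquet with files in the ~128–512 MB range so Presto can prune partitions and avoid small-file overhead. A sketch of one possible table registered through the Hive metastore; the bucket name, schema, and column set are assumptions, and the client shown is presto-python-client:

```python
import prestodb  # pip install presto-python-client

# Illustrative object layout the ingestion jobs would write:
#   s3://ads-clickstream/raw/dt=2024-06-01/hr=13/part-00000.snappy.parquet

DDL = """
CREATE TABLE IF NOT EXISTS hive.ads.clicks_raw (
    event_id    varchar,
    session_id  varchar,
    campaign_id varchar,
    device      varchar,
    geo         varchar,
    event_time  timestamp,
    dt          varchar,  -- partition key: date, e.g. '2024-06-01'
    hr          varchar   -- partition key: hour of day, e.g. '13'
)
WITH (
    external_location = 's3://ads-clickstream/raw/',
    format = 'PARQUET',
    partitioned_by = ARRAY['dt', 'hr']
)
"""

conn = prestodb.dbapi.connect(
    host="presto-coordinator",  # placeholder coordinator host
    port=8080,
    user="etl",
    catalog="hive",
    schema="ads",
)
cur = conn.cursor()
cur.execute(DDL)
cur.fetchall()  # the client executes lazily; fetching drains the result

# Partitions written later by the pipeline can be registered with, e.g.,
#   CALL system.sync_partition_metadata('ads', 'clicks_raw', 'ADD')
# (available in recent Presto Hive connectors).
```

Appending nullable columns over time is the safest evolution path, and matching Parquet columns by name in the Hive connector (hive.parquet.use-column-names) helps keep it that way.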
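Producers retry, so duplicates are expected even with idempotent producers once consumers reprocess. A common split is best-effort streaming dedup by event_id within a short window, with exact dedup deferred to the batch layer. A minimal in-process sketch; the event_id field, TTL, and capacity are assumptions:

```python
import time
from collections import OrderedDict
from typing import Optional

class WindowedDeduper:
    """Drop events whose event_id was seen within the last ttl_s seconds.
    In a real deployment this state would live in the stream processor's
    keyed state store or an external store, not process memory."""

    def __init__(self, ttl_s: int = 600, max_entries: int = 10_000_000):
        self.ttl_s = ttl_s
        self.max_entries = max_entries
        self._seen: OrderedDict = OrderedDict()  # event_id -> last-seen time

    def is_duplicate(self, event_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Evict expired entries (and cap memory) from the oldest end.
        while self._seen:
            _, ts = next(iter(self._seen.items()))
            if now - ts > self.ttl_s or len(self._seen) > self.max_entries:
                self._seen.popitem(last=False)
            else:
                break
        if event_id in self._seen:
            return True
        self._seen[event_id] = now
        return False

if __name__ == "__main__":
    d = WindowedDeduper(ttl_s=300)
    print(d.is_duplicate("evt-1"))  # False: first sighting
    print(d.is_duplicate("evt-1"))  # True: retry within the window
```

Exact correctness then falls to the batch layer: the hourly compaction job can keep one row per event_id per partition (e.g., a row_number() window function) before publishing rollups.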