How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

What difficulty level is this interview question?

This is a medium difficulty System Design question, commonly asked during Technical Screen rounds at Rippling.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Rippling during technical interviews.

Design a User Behavior Tracking (Clickstream Analytics) System

Q: Design a User Behavior Tracking (Clickstream Analytics) System

This question asks for the design of an end-to-end user behavior tracking (clickstream analytics) system, covering event collection, transport, processing, storage, and querying. It evaluates system design skills around high-throughput data pipelines, including trade-offs between streaming and batch processing, storage models for aggregate versus per-user access patterns, and delivery guarantees under unreliable clients.

Design a user behavior tracking system for a web and mobile product. The system captures user interaction events, transports them reliably, processes them, stores them, and lets product/analytics teams answer questions such as:

"How many users completed checkout yesterday?"
"What is the funnel drop-off between the add to cart step and the payment step?"
"Show me the ordered sequence of actions a specific user took in the last hour."

Client applications (a JavaScript web SDK, plus iOS and Android SDKs) emit events such as page_view, button_click, add_to_cart, and purchase. Each event carries at least: an event name, a user id (or an anonymous device id for logged-out users), a timestamp, a session id, and a free-form JSON property bag (e.g. { "page": "/cart", "item_id": "abc", "price": 19.99 }).

Design the end-to-end system: how events are collected from clients, transported, validated/enriched, stored, and queried, plus how it scales, stays reliable, and is observed. Walk through both the write path (event ingestion) and the read path (powering dashboards and ad-hoc queries).

Constraints & Assumptions

Scale (assumed): ~50M daily active users, ~100 events/user/day ⇒ ~5B events/day ≈ ~58K events/sec average , with diurnal/peak bursts up to ~300K events/sec .
Event size: ~1 KB average ⇒ ~5 TB/day of raw event data.
Retention: raw events kept ~30–90 days; rolled-up aggregates kept for years.
Latency targets: event ingestion accepts in <100 ms p99 at the client edge; dashboards/aggregates can be near-real-time (fresh within seconds to a couple of minutes); ad-hoc analytical queries may take seconds to minutes.
Reliability: ingestion availability ~99.9%; at-least-once delivery is acceptable as long as duplicates can be removed for accurate counts; small, bounded event loss under extreme failure is tolerable but should be measured.
Consistency: analytics is eventually consistent — exact real-time accuracy is not required, but counts must converge to correct values.
Out of scope unless raised: experimentation/AB assignment logic, billing, and GDPR data-subject deletion mechanics (acknowledge them, don't fully design them).

Clarifying Questions to Ask

Freshness vs. cost: Do dashboards need sub-minute freshness (streaming aggregation) or is hourly/daily batch roll-up acceptable? This drives the entire processing layer.
Query surface: Are queries restricted to a fixed set of pre-defined metrics/funnels, or must we support arbitrary ad-hoc SQL-like exploration over raw events?
Identity: How are anonymous device ids stitched to user ids after login (identity resolution), and do we need to retroactively re-attribute pre-login events?
Delivery guarantees: Is occasional event loss acceptable, or must every event be durably captured? How accurate must counts be (exact vs. approximate, e.g. HyperLogLog)?
Schema governance: Is the event property schema free-form forever, or is there a registry/contract so analysts can trust field types?
Privacy/compliance: What PII may events contain, what's the retention/deletion policy, and do we need regional data residency?

What a Strong Answer Covers Premium

Follow-up Questions

How do you compute a funnel conversion rate (e.g. view → add_to_cart → purchase within 24 hours per user) efficiently over billions of events? Compare a streaming approach against a daily batch job.
A mobile client is offline for two hours, then flushes a backlog of events with old timestamps. How does your processing layer produce correct time-bucketed aggregates despite this late, out-of-order data?
A buggy app release suddenly doubles event volume and starts sending malformed payloads. Walk through how the system absorbs the load and isolates the bad data without corrupting downstream metrics or dropping good events.
How would you support arbitrary ad-hoc queries from analysts without letting an expensive query take down the cluster serving production dashboards?

Design a User Behavior Tracking (Clickstream Analytics) System

Quick Overview

Design a User Behavior Tracking (Clickstream Analytics) System

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers Premium

Follow-up Questions

Submit Your Answer to Earn 20XP

Design a User Behavior Tracking (Clickstream Analytics) System

Quick Overview

Design a User Behavior Tracking (Clickstream Analytics) System

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers Premium

Follow-up Questions

Submit Your Answer to Earn 20XP