Design an Analytics Warehouse for an E‑commerce Platform
Context
An e‑commerce platform exposes the following user-facing services:
- Item search
- Item detail page view
- User login
- Add to cart
- View cart
- Purchase/checkout
- View order
The analytics warehouse must support pre-authored queries including:
- In the current calendar month, how many unique users in Asia viewed a specified item A?
- Right now, what are the three orders with the highest total amount?
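The two queries are worth pinning down before choosing a schema. "Unique users" means each user is counted once, however many times they viewed the item. "Current calendar month" runs from the first instant of the month up to now. "Highest total amount" is a top-k ranking over orders. The Python sketch below evaluates both queries over in-memory records. The record types, the field names (`user_key`, `region`, `total_amount_cents`), UTC month boundaries and integer-cent amounts are all assumptions for illustration, not part of the problem statement.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ItemViewEvent:
    user_key: str         # stitched user identity (hypothetical field name)
    item_id: str
    region: str           # e.g. "Asia", assumed resolved from a geography dimension
    event_time: datetime  # timezone-aware, assumed UTC


@dataclass(frozen=True)
class Order:
    order_id: str
    total_amount_cents: int  # integer cents avoid floating-point rounding


def unique_viewers_this_month(events, item_id, region, now):
    """Count distinct users in `region` who viewed `item_id` between the
    start of now's calendar month and now (inclusive)."""
    month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return len({
        e.user_key
        for e in events
        if e.item_id == item_id
        and e.region == region
        and month_start <= e.event_time <= now
    })


def top_orders(orders, k=3):
    """Return the k orders with the largest total amount, highest first."""
    return heapq.nlargest(k, orders, key=lambda o: o.total_amount_cents)
```

Repeated views by the same user collapse into one set entry. `heapq.nlargest` avoids sorting every order when only three are needed. In a warehouse, the same logic becomes a `COUNT(DISTINCT ...)` over a month-bounded partition range and an `ORDER BY ... LIMIT 3` over the current order state.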
Requirements
Design an analytics warehouse. The design should specify:
- Event schema (fields, data types, primary/foreign keys).
- Session model and how a tracking/session ID is generated.
- User identity stitching from anonymous to logged-in sessions.
- How to record event objects and quantities (e.g., cart contents and order totals).
- Whether to store events as semi-structured JSON or as normalized columns, and the trade-offs.
- Partitioning and indexing strategy.
- Deduplication and idempotency.
- Handling late/out-of-order events.
- Dimensional modeling (users, items, geography) and slowly changing dimensions.
- Ingestion/processing pipeline (batch vs streaming) with scaling and cost considerations.
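Several of these requirements interact at the level of a single event record. The dedup key, the session ID, the anonymous and logged-in identities, and event time versus ingest time all have to appear on the same row. The Python sketch below shows one possible way these pieces fit together; it is not a prescribed answer. It assumes the following:

- a client-generated UUID `event_id` serves as the idempotency key;
- sessions get a random UUIDv4 tracking ID;
- an anonymous cookie/device ID is stitched to a `user_id` at login;
- tables are partitioned by event date;
- a two-day allowed lateness applies.

All names and thresholds (`Event`, `route`, `allowed_lateness`) are hypothetical.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_id: str                      # client-generated UUID; primary / dedup key
    event_type: str                    # e.g. "item_view", "add_to_cart", "purchase"
    session_id: str                    # tracking ID minted at session start
    anonymous_id: str                  # cookie/device ID, present before login
    user_id: Optional[str]             # set after login; FK to a user dimension
    event_time: datetime               # when it happened on the client (UTC)
    ingest_time: datetime              # when the pipeline received it (UTC)
    item_id: Optional[str] = None      # FK to an item dimension
    quantity: Optional[int] = None     # e.g. units added to cart
    amount_cents: Optional[int] = None # e.g. order total in integer cents


def new_session_id() -> str:
    """Random UUIDv4 session ID (an assumed scheme; inactivity-based rotation is not shown)."""
    return str(uuid.uuid4())


def dedupe(events):
    """Keep the first copy of each event_id so client retries are idempotent."""
    seen, out = set(), []
    for e in events:
        if e.event_id not in seen:
            seen.add(e.event_id)
            out.append(e)
    return out


def stitch(events):
    """Map each anonymous_id to the first user_id it was seen with, in event-time order."""
    mapping = {}
    for e in sorted(events, key=lambda e: e.event_time):
        if e.user_id is not None:
            mapping.setdefault(e.anonymous_id, e.user_id)
    return mapping


def resolve_user(e, mapping):
    """Stitched identity: the logged-in user_id if known, otherwise the anonymous_id."""
    if e.user_id is not None:
        return e.user_id
    return mapping.get(e.anonymous_id, e.anonymous_id)


def route(e, watermark, allowed_lateness=timedelta(days=2)):
    """Return (event-date partition, needs_backfill).

    Late events still land in the partition of their event_time. The flag
    signals that aggregates for an already-closed partition must be recomputed.
    """
    partition = e.event_time.date().isoformat()
    needs_backfill = e.event_time < watermark - allowed_lateness
    return partition, needs_backfill
```

Routing late events by event time, rather than by arrival time, keeps queries like "this calendar month" correct. The cost is occasional recomputation of closed partitions. Stitching after the fact lets views made before login be credited to the user who later logged in.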
Optionally, provide example queries your design would enable for the two questions above.