Scenario
Design an analytics dashboard for a ChatGPT-like AI chat product.
The dashboard is used by Product/Engineering/Support to monitor product health and usage.
Requirements
Core use cases
-
Near real-time monitoring of:
-
Usage: DAU/WAU/MAU, sessions, messages per user, retention cohorts
-
Performance: end-to-end latency (P50/P95/P99), streaming token latency, time-to-first-token
-
Reliability: error rate, timeouts, rate limits
-
Quality proxies: user feedback (thumbs up/down), conversation abandonment
-
Cost: tokens generated, $ cost by model/tenant
-
Slice metrics by dimensions: model, region, platform, tenant, app version.
-
Support
time ranges
(last 15m, 1h, 24h, custom) and
drill-down
.
Non-functional
-
Data freshness targets:
-
Operational metrics: ~1–5 minutes
-
Business/retention metrics: hourly/daily acceptable
-
High availability; dashboard should degrade gracefully.
-
Privacy/security: PII handling, access control, audit logs.
Deliverable
Propose an end-to-end system design: instrumentation, ingestion, storage, aggregation, query patterns, and the dashboard serving layer.