Design a metrics platform (like a simplified Prometheus/Datadog metrics product) that supports collecting, storing, querying, and visualizing metrics.
Scope
-
Ingest time-series metrics from many hosts/services.
-
Store metrics with tags/labels (dimensions).
-
Support queries for dashboards (instant queries and time-range queries).
-
No alerting/notification system is required.
Clarifications to cover
-
Data model (counters, gauges, histograms; labels/tags)
-
Ingestion model (push vs pull)
-
Storage (hot vs cold, retention, downsampling)
-
Query API and typical query patterns
-
Scale assumptions and bottlenecks (write QPS, cardinality)
-
Multi-tenancy, authn/authz, and isolation
-
Reliability and cost controls