Design a Metrics Storage Platform
Company: Okta
Role: Software Engineer
Category: System Design
Difficulty: Medium
Interview Round: Technical Screen
Design a distributed backend for storing and querying application metrics.
The system should collect time-series metrics from many services running across multiple machines and regions. Each metric sample contains:
- metric name
- labels or tags such as service, host, region, and environment
- timestamp
- numeric value
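As a minimal sketch of the sample shape listed above, the following Python models one data point and derives a canonical series identity from the metric name plus its sorted labels. The names `Sample`, `timestamp_ms`, and `series_key` are illustrative assumptions, not part of the question; a real design might also intern label strings or assign numeric series IDs.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Sample:
    """One metric data point (field names are hypothetical)."""
    name: str                 # metric name, e.g. "http_requests_total"
    labels: Dict[str, str]    # tags such as service, host, region, environment
    timestamp_ms: int         # assumed unit: milliseconds since the Unix epoch
    value: float              # numeric value


def series_key(name: str, labels: Dict[str, str]) -> Tuple:
    """Return a canonical identity for a series.

    Labels are sorted by key so that the same label set always maps
    to the same series, regardless of the order the client sent them in.
    """
    return (name,) + tuple(sorted(labels.items()))
```

Two samples that share a metric name and label set belong to the same series, so storage can group their (timestamp, value) pairs under one key.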
Requirements:
- high write throughput for continuous metric ingestion
- support counters, gauges, and histogram-like metrics
- allow users to query by metric name, label filters, and time range
- power dashboards and alerting with low latency for recent data
- keep recent data fast to query while retaining historical data cheaply
- scale horizontally as the number of services and series grows
- tolerate machine and zone failures
- support retention, compaction, and downsampling
- optionally support multi-tenant isolation and rate limits
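To make the downsampling requirement concrete, here is a rough sketch that rolls raw (timestamp, value) pairs for one series into fixed time windows and keeps min, max, sum, and count per window. The function name `downsample` and the choice of aggregates are assumptions for illustration; they suit gauges, whereas a counter would more likely keep the last value per window.

```python
from typing import Dict, Iterable, List, Tuple

Aggregate = Dict[str, float]


def downsample(
    samples: Iterable[Tuple[int, float]], window_ms: int
) -> List[Tuple[int, Aggregate]]:
    """Roll raw samples up into fixed windows of window_ms milliseconds.

    Each window is identified by its start timestamp. Keeping sum and
    count (rather than a precomputed average) lets coarser rollups be
    derived later by merging windows.
    """
    buckets: Dict[int, Aggregate] = {}
    for ts, value in samples:
        start = ts - (ts % window_ms)  # align timestamp to window start
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"min": value, "max": value, "sum": value, "count": 1}
        else:
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
            agg["sum"] += value
            agg["count"] += 1
    # Return windows in time order.
    return sorted(buckets.items())
```

For example, `downsample(raw_points, 60_000)` would produce one aggregate per minute, which older data could be stored as once the raw samples pass their retention window.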
Describe the API, storage model, sharding strategy, ingestion path, query path, failure handling, and major trade-offs.
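One possible starting point for the sharding discussion is to hash the tenant and series identity to a shard, as sketched below. The function `shard_for` and its parameters are hypothetical, and plain modulo hashing is only one option: it moves most series when the shard count changes, so consistent hashing or a fixed set of virtual shards is a common alternative worth weighing.

```python
import hashlib


def shard_for(tenant: str, series_id: str, num_shards: int) -> int:
    """Map a (tenant, series) pair to a shard index in [0, num_shards).

    A cryptographic hash is used only because it is stable across
    processes and machines, unlike Python's built-in hash() for strings.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # A NUL separator avoids collisions such as ("ab", "c") vs ("a", "bc").
    payload = f"{tenant}\x00{series_id}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Sharding by series keeps all points of one series together, which favors writes and single-series reads; queries that filter on labels across many series then have to fan out to multiple shards.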
Quick Answer: This question evaluates system-design and distributed-systems skills for time-series metrics storage, focusing on data modeling for counters, gauges, and histograms, ingestion and query pipelines, API design, sharding and storage formats, retention/compaction, availability, and multi-tenant considerations.