System Design: Weather API with 10-Minute Freshness SLA
Context
You are designing a service that exposes a public HTTP API returning the current temperature for a location. An external Weather Service provides an hourly dump covering approximately 2,000 U.S. locations via its API. The business requires that clients see the latest provider update within 10 minutes of the provider publishing it.
Assume the external provider updates data on an hourly cadence (e.g., top of the hour) and exposes either a single hourly dump endpoint or per-location endpoints. Your service must ingest, store, and serve this data with a strong freshness SLA, support caching, and be resilient.
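As a rough illustration of what the 10-minute SLA implies, the sketch below splits the budget between polling, ingestion, and caching, and checks whether a served response violates the SLA. The function names, the worst-case model (poll interval + ingest time + cache TTL), and the example durations are assumptions made for illustration; they are not part of the provider's API or a prescribed design.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(minutes=10)  # taken from the requirement above


def max_cache_ttl(poll_interval: timedelta, ingest_duration: timedelta) -> timedelta:
    """Largest cache TTL that still fits the SLA in the worst case.

    Assumed worst case: the provider publishes just after a poll, so the
    update waits one full poll interval, then the ingest time, then one
    full cache TTL before clients see it.
    """
    remaining = FRESHNESS_SLA - poll_interval - ingest_duration
    if remaining < timedelta(0):
        raise ValueError("polling plus ingestion already exceeds the SLA")
    return remaining


def violates_sla(served_published_at: datetime,
                 latest_published_at: datetime,
                 now: datetime) -> bool:
    """True if an older update is still served more than 10 minutes
    after the provider published a newer one."""
    if served_published_at >= latest_published_at:
        return False  # already serving the latest update
    return now - latest_published_at > FRESHNESS_SLA


if __name__ == "__main__":
    # Hypothetical numbers: poll every minute, ingestion takes two minutes.
    print(max_cache_ttl(timedelta(minutes=1), timedelta(minutes=2)))
```

Under these assumed numbers the cache TTL can be at most 7 minutes; shorter polling intervals or explicit cache invalidation on ingest would leave more headroom.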
Requirements
- API
  - Implement GET /weather?location={id|lat,lon} to return the current temperature.
  - Provide an API contract, versioning strategy, example responses, and error handling.
  - Define how lat,lon is mapped to a known location or nearest coverage.
- Freshness SLA (10 minutes)
  - Ensure data served is at most 10 minutes stale relative to the provider’s publish time.
  - Describe how to meet this with caches, TTLs, invalidation, and backfilling.
- Ingestion Pipeline
  - Describe how to ingest the provider’s hourly data (polling schedule, rate limits).
  - Include retries with backoff, idempotency, deduplication, and backfilling of missed updates.
- Data Model and Storage
  - Propose schemas for locations and weather observations.
  - Discuss storage choices, indexing, and retention of historical data.
- Scaling and Partitioning
  - Provide rough QPS and storage estimates, and explain how the system scales (compute, DB, caches).
  - Describe the partitioning approach for reads/writes and for growth beyond 2,000 locations.
- Monitoring, Alerting, and Reliability
  - Define key metrics, dashboards, and alert thresholds for freshness, errors, and latency.
  - Describe failure-handling strategies and latency budgets.
- Multi-Region Availability and Disaster Recovery
  - Discuss active-active or active-passive design, replication, failover, RPO/RTO, and testing.
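
For the lat,lon mapping requirement under API, one possible approach is a nearest-neighbor lookup over the known locations using great-circle distance. The sketch below is a minimal version under stated assumptions: the `locations` dictionary, the location ids, and the 50 km coverage cutoff (`max_km`) are hypothetical and chosen only for illustration. A linear scan is adequate for about 2,000 locations; a spatial index would be a natural swap-in for larger sets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_location(lat, lon, locations, max_km=50.0):
    """Return the id of the closest known location, or None when the
    closest one is farther than max_km (treated here as 'no coverage')."""
    best_id, best_km = None, float("inf")
    for loc_id, (loc_lat, loc_lon) in locations.items():
        d = haversine_km(lat, lon, loc_lat, loc_lon)
        if d < best_km:
            best_id, best_km = loc_id, d
    return best_id if best_km <= max_km else None


if __name__ == "__main__":
    # Hypothetical location table: id -> (lat, lon)
    known = {"nyc": (40.7128, -74.0060), "sfo": (37.7749, -122.4194)}
    print(nearest_location(40.73, -73.99, known))
    print(nearest_location(0.0, 0.0, known))
```

Returning None for out-of-coverage points lets the API layer decide between a 404-style error and a documented fallback; which one is appropriate is a contract decision left to the design.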
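
For the ingestion requirement on retries with backoff, idempotency, and deduplication, the following sketch shows one way the pieces could fit together. Everything named here is an assumption for illustration: `fetch` stands in for whatever call retrieves the provider's dump, the in-memory `store` dictionary stands in for a database table, and the record fields (`location_id`, `published_at`, `temp_c`) are a hypothetical schema.

```python
import random
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch() and retry on failure with exponential backoff and
    full jitter. Re-raises the last error once max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:  # a real client would catch only transient errors
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


def upsert_observations(store, observations):
    """Insert observations keyed by (location_id, published_at).

    Replaying the same dump is a no-op, which makes ingestion idempotent
    and deduplicates repeated deliveries. Returns the count of new rows.
    """
    inserted = 0
    for obs in observations:
        key = (obs["location_id"], obs["published_at"])
        if key not in store:
            store[key] = obs["temp_c"]
            inserted += 1
    return inserted
```

Because the write is keyed on the provider's publish time, the same routine can also serve backfilling: re-fetching an earlier hour only adds the rows that were missed.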