Scenario
Design a service that crawls the U.S. National Weather Service (or a similar public provider) and provides hourly weather data to internal consumers (and optionally an external API).
Requirements
Functional
-
Fetch weather data
hourly
for all supported locations (e.g., stations or grid points).
-
Normalize provider responses into an internal schema.
-
Store historical data for querying and analytics.
-
Expose data to clients via an API (or data export) with predictable freshness.
Non-functional
-
Respect provider constraints (rate limits, robots/terms).
-
High reliability with retries/backoff.
-
Idempotent ingestion (avoid duplicates).
-
Monitoring and alerting on missing/late data.
Assumptions (you may refine)
-
10,000 locations.
-
Each location updates once per hour.
-
Provider endpoints may be flaky and occasionally return partial data.
Deliverables
-
Architecture for scheduling and crawling
-
Storage and schema
-
Freshness guarantees and backfill strategy
-
Rate limiting, retries, and idempotency
-
Monitoring, alerting, and data quality checks