System Design: Historical FX-Rate Service
Design an internal service for engineers and analysts to fetch historical currency exchange rates for analytics, backfills, and financial reporting. The service should support roughly 10k read QPS, occasional corrections and backfills, high availability, low latency, and cost efficiency.
Constraints & Assumptions
- Historical rates are mostly immutable but can be corrected.
- Consumers include internal services, analysts, dashboards, and batch data pipelines.
- Precision, snapshot semantics, auditability, and versioning matter for financial use cases.
- State assumptions for currencies, pairs, granularity, retention, and latency targets.
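Writing these assumptions down as numbers makes the hot/cold storage split and the database fleet size concrete. The sketch below is a back-of-envelope estimate. Only the 10k read QPS comes from this prompt. The pair count, retention, row size, cache hit rate, and per-node capacity are hypothetical placeholders to replace with the answers to the clarifying questions.

```python
# Back-of-envelope sizing. Every default below is a HYPOTHETICAL placeholder
# except read_qps, which comes from the problem statement.
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years for simplicity


@dataclass(frozen=True)
class Assumptions:
    pairs: int = 1_000             # hypothetical number of tracked pairs
    retention_years: int = 10      # hypothetical retention window
    bytes_per_row: int = 40        # hypothetical stored size per rate row
    read_qps: int = 10_000         # from the problem statement
    cache_hit_pct: int = 95        # hypothetical cache hit-rate target
    qps_per_db_node: int = 250     # hypothetical per-node read capacity
    target_util_pct: int = 50      # run nodes at 50% to absorb failover


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def minute_rows(a: Assumptions) -> int:
    """One minute bar per pair for the whole retention window."""
    return a.pairs * a.retention_years * MINUTES_PER_YEAR


def daily_rows(a: Assumptions) -> int:
    """One daily rate per pair for the whole retention window."""
    return a.pairs * a.retention_years * 365


def storage_gb(rows: int, a: Assumptions) -> float:
    return rows * a.bytes_per_row / 1e9


def db_nodes(a: Assumptions) -> int:
    """Nodes for cache-miss traffic at the target utilization, plus one spare (N+1)."""
    miss_qps = a.read_qps * (100 - a.cache_hit_pct) // 100
    usable_per_node = a.qps_per_db_node * a.target_util_pct // 100
    return ceil_div(miss_qps, usable_per_node) + 1


if __name__ == "__main__":
    a = Assumptions()
    print(f"minute rows: {minute_rows(a):,} (~{storage_gb(minute_rows(a), a):.0f} GB)")
    print(f"daily rows:  {daily_rows(a):,} (~{storage_gb(daily_rows(a), a):.2f} GB)")
    print(f"DB nodes for cache misses: {db_nodes(a)}")
```

With these placeholder numbers, minute data is about 5.3 billion rows (roughly 210 GB), while daily data is only 3.65 million rows. That gap is a common argument for keeping daily data hot and pushing fine-grained history to cheaper storage.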
Clarifying Questions to Ask
- What granularity is required: daily, hourly, minute, tick, or all of these?
- Do callers need bid, ask, mid, OHLC, or conversion endpoints?
- What is the required P95 latency and availability SLO?
- How often do corrections happen, and do consumers need as-of historical snapshots?
Part 1 - APIs
Define read endpoints and optional ingestion endpoints.
What This Part Should Cover
- Endpoints for point-in-time rates, time series, conversion, OHLC or other aggregates, and internal ingestion.
- Versioning, pagination, auth, rate limits, idempotency, error handling, precision, and response contracts.
- REST versus gRPC trade-offs for internal callers.
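Two contract details that are easy to get wrong are precision and pagination. The sketch below shows one possible time-series read. Rates are serialized as decimal strings rather than binary floats, limits are validated, and pages are chained with an opaque cursor. The response field names, the cursor format, and the 1,000-row limit are hypothetical. It assumes rows are sorted and timestamps are ISO-8601 UTC strings in a single fixed format, so lexicographic order matches time order.

```python
# Hypothetical time-series contract: decimal-string rates plus an opaque cursor.
import base64
import json
from decimal import Decimal
from typing import Optional

MAX_LIMIT = 1_000  # hypothetical page-size cap


def encode_cursor(last_ts: str) -> str:
    raw = json.dumps({"after": last_ts}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> str:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def get_series(rows: list[tuple[str, Decimal]], pair: str, limit: int,
               cursor: Optional[str] = None) -> dict:
    """Return one page of (timestamp, rate) rows; rows must be sorted by ts."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    after = decode_cursor(cursor) if cursor else ""
    remaining = [r for r in rows if r[0] > after]
    page = remaining[:limit]
    return {
        "pair": pair,
        # str(Decimal) keeps every stored digit, including trailing zeros.
        "data": [{"ts": ts, "rate": str(rate)} for ts, rate in page],
        "next_cursor": encode_cursor(page[-1][0]) if len(remaining) > limit else None,
    }
```

Keeping the cursor opaque lets the server change its pagination internals (for example, adding a version or as-of field) without breaking callers.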
Part 2 - Data Model
Design the entities and fields.
What This Part Should Cover
-
Currency, currency pair, timestamp, granularity, price type, scaled integer rate, source, version, as-of time, quality flags, and provenance.
-
Indexes and keys for pair and time-range access.
-
Correction handling with immutable versions and audit history.
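The fields above fit together as an append-only record. The sketch below is one possible shape, with hypothetical field names. The rate is stored as an integer plus an implied decimal scale, the key leads with pair and time so range scans stay contiguous, and a correction creates a new version instead of updating the row. It rejects corrections that carry more precision than the column's scale rather than silently truncating them.

```python
# Hypothetical immutable, versioned rate record with a scaled-integer rate.
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class RateVersion:
    base: str              # ISO 4217 code, e.g. "EUR"
    quote: str             # ISO 4217 code, e.g. "USD"
    ts: datetime           # start of the interval the rate describes (UTC)
    granularity: str       # "1d", "1h", "1m", ...
    price_type: str        # "bid", "ask", "mid", ...
    rate_scaled: int       # rate * 10**scale, stored as an integer
    scale: int             # number of implied decimal places
    source: str            # provider identifier (provenance)
    version: int           # 1 for the original load, +1 per correction
    recorded_at: datetime  # when this version became visible (as-of axis)
    quality: str = "ok"    # e.g. "ok", "interpolated", "suspect"

    @property
    def key(self) -> tuple:
        """Pair and time first for range scans; version last."""
        return (self.base, self.quote, self.granularity, self.price_type,
                self.ts, self.version)

    def rate(self) -> Decimal:
        return Decimal(self.rate_scaled).scaleb(-self.scale)


def correct(prev: RateVersion, new_rate: Decimal, recorded_at: datetime,
            source: str) -> RateVersion:
    """Append-only correction: same logical key, next version number."""
    scaled = new_rate.scaleb(prev.scale)
    if scaled != scaled.to_integral_value():
        raise ValueError("rate has more precision than the column scale")
    return replace(prev, rate_scaled=int(scaled), version=prev.version + 1,
                   recorded_at=recorded_at, source=source)
```

Because old versions are never overwritten, the version history doubles as the audit trail.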
Part 3 - Storage and Caching
Choose hot storage, cold storage, partitioning, replication, retention, and caching strategy.
What This Part Should Cover
- Wide-column stores, time-series SQL databases, object storage with Parquet files, hot versus cold paths, partitioning by pair and time, and downsampling.
- In-process, distributed, and precomputed caches; cache keys, TTLs, invalidation, and correction fan-out.
- Cost-efficiency trade-offs.
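One way to make correction fan-out cheap is to put a per-pair generation number in every cache key. A correction bumps the generation, so all cached entries for that pair become unreachable at once without enumerating them, and the orphaned entries age out. The in-process sketch below uses a hypothetical class and key layout. In a distributed cache, the generation counter would need to live in a shared store, which this sketch does not show.

```python
# Hypothetical generation-stamped cache keys with TTL expiry.
import time
from typing import Any, Callable, Optional


class GenerationalCache:
    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self.generation: dict[str, int] = {}             # pair -> generation
        self.entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def _key(self, pair: str, granularity: str, price_type: str, day: str) -> str:
        gen = self.generation.get(pair, 0)
        return f"fx:{pair}:{granularity}:{price_type}:{day}:g{gen}"

    def get(self, pair: str, granularity: str, price_type: str, day: str) -> Optional[Any]:
        key = self._key(pair, granularity, price_type, day)
        hit = self.entries.get(key)
        if hit is None or hit[0] <= self.clock():
            self.entries.pop(key, None)  # drop expired entry, if any
            return None
        return hit[1]

    def put(self, pair: str, granularity: str, price_type: str, day: str, value: Any) -> None:
        key = self._key(pair, granularity, price_type, day)
        self.entries[key] = (self.clock() + self.ttl_s, value)

    def on_correction(self, pair: str) -> None:
        # Old-generation keys are never read again. In this sketch they linger
        # in the dict; a real LRU or expiring cache would evict them.
        self.generation[pair] = self.generation.get(pair, 0) + 1
```

The trade-off is that a correction to a single day also invalidates every other cached day for the same pair. That is usually acceptable when corrections are rare, and a finer-grained generation (per pair and month, say) narrows the blast radius.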
Part 4 - Consistency, Scalability, and Failure Handling
Explain freshness, snapshot semantics, scaling, multi-region design, and failure modes.
What This Part Should Cover
- Strong versus eventual consistency, as-of reads, correction propagation, and backfill behavior.
- Capacity estimates for 10k QPS, autoscaling, read replicas, multi-region failover, and provider outages.
- Degraded modes and client guidance.
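As-of reads treat the data as bitemporal. Each version has the business time it describes and the system time it became visible. The sketch below, with a hypothetical `Version` shape and `read_as_of` function, returns a rate as the service knew it at a given instant. Pinning one `as_of` value for an entire backfill job gives every task the same snapshot, even if corrections land mid-run.

```python
# Hypothetical bitemporal lookup: latest version visible at `as_of`.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Version:
    pair: str
    ts: datetime           # business time the rate describes
    rate: str              # decimal string to avoid float rounding
    version: int
    recorded_at: datetime  # system time the version became visible


def read_as_of(versions: list[Version], pair: str, ts: datetime,
               as_of: Optional[datetime] = None) -> Optional[Version]:
    """Latest version for (pair, ts) recorded at or before `as_of`.

    as_of=None means "latest known", which online callers usually want;
    backfills and reports should pass an explicit, fixed as_of.
    """
    candidates = [
        v for v in versions
        if v.pair == pair and v.ts == ts
        and (as_of is None or v.recorded_at <= as_of)
    ]
    return max(candidates, key=lambda v: v.version, default=None)
```

In storage, the same rule maps to an index on (pair, ts, recorded_at) and a "latest row not after as_of" query, rather than a linear scan.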
Part 5 - Monitoring and Operations
Define SLIs, SLOs, alerts, tracing, data quality checks, and runbooks.
What This Part Should Cover
- Latency, availability, error rate, cache hit rate, freshness, correction lag, ingestion lag, and data quality metrics.
- Reconciliation against providers, anomaly detection, missing data alerts, and audit reports.
- Incident runbooks for stale data, bad corrections, provider outages, and regional failures.
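Reconciliation and missing-data alerts can start as simple batch checks. The sketch below compares two providers' daily rates in basis points relative to their midpoint and lists expected days with no rate. The 5 bps tolerance and the function names are hypothetical. The weekday-only calendar is a deliberate simplification, since real FX calendars also need holidays and trading-session boundaries.

```python
# Hypothetical data-quality checks: provider reconciliation and gap detection.
from datetime import date, timedelta
from decimal import Decimal


def disagreement_bps(a: Decimal, b: Decimal) -> Decimal:
    """Absolute difference relative to the midpoint of the two quotes, in bps."""
    mid = (a + b) / 2
    return abs(a - b) / mid * 10_000


def reconcile(p1: dict[date, Decimal], p2: dict[date, Decimal],
              tol_bps: Decimal = Decimal("5")) -> list[date]:
    """Days where both providers have a rate but differ by more than tol_bps."""
    return sorted(d for d in p1.keys() & p2.keys()
                  if disagreement_bps(p1[d], p2[d]) > tol_bps)


def missing_days(series: dict[date, Decimal], start: date, end: date,
                 weekdays_only: bool = True) -> list[date]:
    """Expected days in [start, end] that have no rate."""
    out, d = [], start
    while d <= end:
        is_weekend = d.weekday() >= 5  # Saturday=5, Sunday=6
        if d not in series and not (weekdays_only and is_weekend):
            out.append(d)
        d += timedelta(days=1)
    return out
```

Days flagged by `reconcile` are natural candidates for a "suspect" quality flag and a runbook step, rather than an automatic overwrite.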
What a Strong Answer Covers
- Financial precision and audit semantics.
- A serving design that handles hot reads and rare corrections.
- Clear caching and invalidation strategy.
- Operational controls for correctness, freshness, and reliability.
Follow-up Questions
- How would you invalidate caches after a correction?
- What if two providers disagree?
- How would you support snapshot-consistent backfills?
- What is the hot key risk for USD/EUR?
- Which SLO matters most for analysts versus online services?
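For the hot-key question, one common mitigation is to replicate a hot entry under several suffixed keys so the copies hash to different distributed-cache shards, then have each reader pick a copy deterministically. The sketch below assumes a hypothetical hot-pair list, replica count, and key format. It uses `zlib.crc32` because Python's built-in `hash()` for strings is salted per process and would not route consistently.

```python
# Hypothetical hot-key replication across cache shards.
import zlib

HOT_PAIRS = {"USD/EUR"}  # hypothetical; in practice driven by traffic metrics
REPLICAS = 8             # hypothetical copy count


def read_key(base_key: str, pair: str, caller_id: str) -> str:
    """Cold pairs use the base key; hot pairs spread callers over replicas."""
    if pair not in HOT_PAIRS:
        return base_key
    replica = zlib.crc32(caller_id.encode()) % REPLICAS  # stable across processes
    return f"{base_key}#r{replica}"


def write_keys(base_key: str, pair: str) -> list[str]:
    """Every key a writer must populate (or invalidate) for this entry."""
    if pair not in HOT_PAIRS:
        return [base_key]
    return [f"{base_key}#r{i}" for i in range(REPLICAS)]
```

The cost is that writes and corrections must touch every copy, so this pairs well with a small in-process cache in front of the distributed cache for the hottest keys.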