System Design: Real-Time Multi-Exchange Stock Data Ingestion and Live Web Page
Context
You are asked to design a production system that continuously ingests stock market data from multiple exchanges and renders an always up-to-date web page for users. Data includes trades and quotes (top-of-book), and the system must support historical queries and real-time streaming updates.
Assume: multiple heterogeneous data sources (direct exchange feeds and/or vendor APIs), variable market hours, exchange-specific schemas, intermittent network failures, and strict rate limits.
Requirements
Design the system and cover the following areas:
- Data sources
  - Polling vs. streaming criteria and trade-offs
  - Connectors to multiple exchanges/vendors
- Schema normalization
  - Canonical symbol mapping and data model for trades/quotes/book updates
- Deduplication and idempotency
  - Event identity across venues/providers
- Time handling and ordering
  - Time zones, trading sessions, and out-of-order events
- Caching and storage
  - Hot cache, time-series storage, long-term archive
- API design
  - Historical REST APIs and real-time streaming (WebSocket/SSE)
- Client updates
  - Subscription, snapshot + delta model, resumability
- Consistency and latency targets
  - SLAs/SLOs, read semantics
- Fault tolerance and backfill
  - Retries, DLQs, replays, historical gap filling
- Rate limiting
  - Upstream (exchanges) and downstream (clients)
- Scaling and partitioning
  - Per-symbol/venue partitioning and horizontal scale
- Observability
  - Metrics, logs, tracing, data quality checks
- Security and compliance
  - Auth, transport security, secrets, market data entitlements
- Cost considerations
  - Storage/compute trade-offs, data retention, egress
State assumptions where needed and justify design choices.
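As an illustration of the level of detail a design might give for schema normalization, event identity, and idempotent ingestion, here is a minimal Python sketch. Everything specific in it is an assumption made for illustration and not part of the problem statement: the provider names (`vendor_a`, `vendor_b`), their payload fields, the `SYMBOL_MAP` table, and the choice of `(venue, symbol, venue_trade_id)` as the deduplication key. A real design would need to justify that key per venue and replace the in-memory store with a durable sink.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class Trade:
    """Canonical trade event (hypothetical schema for illustration)."""
    symbol: str            # canonical symbol, e.g. "AAPL"
    venue: str             # venue code as reported by the source
    venue_trade_id: str    # trade identifier assigned by the venue
    price: Decimal         # Decimal avoids float rounding on prices
    size: int
    event_time: datetime   # exchange timestamp, normalized to UTC

    @property
    def dedup_key(self) -> tuple:
        # Assumption: this triple identifies a trade regardless of which
        # provider delivered it.
        return (self.venue, self.symbol, self.venue_trade_id)


# Hypothetical mapping from provider-specific tickers to canonical symbols.
SYMBOL_MAP = {("vendor_a", "AAPL.O"): "AAPL", ("vendor_b", "AAPL"): "AAPL"}


def normalize_vendor_a(raw: dict) -> Trade:
    # vendor_a (hypothetical) sends epoch milliseconds and string prices.
    return Trade(
        symbol=SYMBOL_MAP[("vendor_a", raw["ticker"])],
        venue=raw["exch"],
        venue_trade_id=str(raw["id"]),
        price=Decimal(raw["px"]),
        size=int(raw["qty"]),
        event_time=datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc),
    )


def normalize_vendor_b(raw: dict) -> Trade:
    # vendor_b (hypothetical) sends ISO 8601 timestamps with an explicit offset.
    return Trade(
        symbol=SYMBOL_MAP[("vendor_b", raw["symbol"])],
        venue=raw["venue"],
        venue_trade_id=str(raw["trade_id"]),
        price=Decimal(str(raw["price"])),
        size=int(raw["size"]),
        event_time=datetime.fromisoformat(raw["time"]).astimezone(timezone.utc),
    )


class TradeStore:
    """In-memory stand-in for an idempotent sink."""

    def __init__(self) -> None:
        self._by_key = {}

    def ingest(self, trade: Trade) -> bool:
        """Store the trade once; return False for a duplicate delivery."""
        if trade.dedup_key in self._by_key:
            return False
        self._by_key[trade.dedup_key] = trade
        return True

    def __len__(self) -> int:
        return len(self._by_key)


# The same trade arriving from two providers is stored only once.
store = TradeStore()
a = normalize_vendor_a({"ticker": "AAPL.O", "exch": "XNAS", "id": 981,
                        "px": "189.25", "qty": 100, "ts_ms": 1700000000000})
b = normalize_vendor_b({"symbol": "AAPL", "venue": "XNAS", "trade_id": "981",
                        "price": 189.25, "size": 100,
                        "time": "2023-11-14T22:13:20+00:00"})
first_accepted = store.ingest(a)
second_accepted = store.ingest(b)
```

Because the key is derived from venue-assigned identity and not from provider metadata or arrival time, replays and backfills can pass through the same `ingest` path without creating duplicates. Venues that do not publish stable trade identifiers would need a different key, which is one of the assumptions a full answer should state.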