Crypto Trading And Order Routing Systems
Asked of: Software Engineer
What's being tested
These prompts test whether you can design a low-latency, reliable trading system that accepts user orders, routes them to one or more venues, tracks lifecycle state, and exposes real-time updates without losing or duplicating money-moving actions. Coinbase cares because trading infrastructure must handle bursty demand, third-party exchange failures, inconsistent market data, and strict correctness expectations while still meeting tight p99 latency targets. The interviewer is probing for practical distributed-systems judgment: idempotency, state-machine modeling, reconciliation, backpressure, rate limiting, and clear API boundaries between order intake, routing, execution, and streaming updates.
Core knowledge
- Order lifecycle modeling is central. Represent orders as a finite state machine: NEW -> ACCEPTED -> ROUTED -> PARTIALLY_FILLED -> FILLED, with terminal states like CANCELED, REJECTED, and EXPIRED. Validate allowed transitions server-side so duplicate fills or late cancels cannot corrupt state.
- Idempotency keys prevent duplicate trades when clients retry after timeouts. A POST /orders request should include client_order_id; store (user_id, client_order_id) with a unique constraint in Postgres or another transactional store, and return the original result on repeat submission.
- Event sourcing works well for auditability. Append immutable events such as OrderAccepted, OrderRouted, ExecutionReportReceived, and BalanceReserved; derive current state from events or maintain a materialized orders table. This makes replay, debugging, and regulatory audit trails much easier.
- Balance reservation must be atomic with order acceptance. For a buy, reserve quote currency; for a sell, reserve base asset. Use a transactional ledger pattern: available = total - reserved, and reject orders when available < required_amount + fees.
- Third-party exchange integration is asynchronous and unreliable. External venues may return HTTP 200 before final execution, drop WebSocket messages, impose per-symbol rate limits, or send execution reports out of order. Treat venue APIs as eventually consistent and reconcile against their authoritative order endpoints.
- Reconciliation jobs repair missed events. Periodically compare internal state against exchange state using GET /orders/{id} or venue batch endpoints. Reconciliation should be idempotent: if an internal order is ROUTED but the venue reports FILLED, emit the missing fill event rather than mutating state blindly.
- Outbox pattern avoids dual-write bugs. If you write an order to Postgres and publish to Kafka, wrap the order insert and the outbox_events insert in one transaction; a relay later publishes to Kafka. This prevents “DB commit succeeded but publish failed” inconsistencies.
- Market data ingestion needs normalization and ordering. Different exchanges encode symbols, timestamps, sequence numbers, bids, asks, and trade ticks differently. Normalize to canonical fields like venue, product_id, event_time, receive_time, sequence, bid_px, ask_px, and deduplicate using (venue, product_id, sequence).
- Streaming architecture usually separates hot-path ingestion from fanout. Use exchange WebSocket clients to ingest ticks, publish normalized events to Kafka or NATS, maintain latest-book snapshots in Redis or memory, and push client updates over WebSocket/SSE. Avoid one exchange connection per user.
- Latency budgets should be explicit. If target order acknowledgment is p99 < 200ms, budget roughly: API auth 20ms, validation/reservation 50ms, persistence 30ms, enqueue/routing 50ms, overhead 50ms. For execution, distinguish user acknowledgment latency from venue fill latency.
- Rate limiting and backpressure protect both your system and external venues. Apply per-user limits at the API gateway, per-symbol limits in order routing, and venue-specific token buckets. When queues exceed thresholds, degrade predictably: reject new orders, pause low-priority market data, or serve stale-but-labeled prices.
- Consistency choices depend on operation type. Order acceptance and balance reservation need strong consistency; market price display can tolerate eventual consistency and dropped intermediate ticks if the latest value is correct. A common rule: never be eventually consistent about user funds, but allow eventual consistency for read-only price feeds.
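The idempotency and balance-reservation points above can be sketched together, since both hinge on a single transaction. This is a minimal illustration, not a production ledger: it uses an in-memory SQLite database in place of Postgres, and the table layout, the accept_order helper, and the choice to store rejected orders (so a retry returns the original rejection) are all assumptions made for the example. Amounts are integers in the smallest currency unit, and required is assumed to already include fees.

```python
import sqlite3

SCHEMA = """
CREATE TABLE balances (
    user_id  TEXT NOT NULL,
    asset    TEXT NOT NULL,
    total    INTEGER NOT NULL,
    reserved INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, asset)
);
CREATE TABLE orders (
    order_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id         TEXT NOT NULL,
    client_order_id TEXT NOT NULL,
    asset           TEXT NOT NULL,
    amount          INTEGER NOT NULL,
    state           TEXT NOT NULL,
    UNIQUE (user_id, client_order_id)
);
"""


def open_db():
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # BEGIN/COMMIT/ROLLBACK statements below control transactions explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def accept_order(conn, user_id, client_order_id, asset, required):
    """Reserve funds and record the order in one transaction.

    Returns (order_id, state). A repeated (user_id, client_order_id)
    returns the stored result without reserving again.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        existing = conn.execute(
            "SELECT order_id, state FROM orders "
            "WHERE user_id = ? AND client_order_id = ?",
            (user_id, client_order_id),
        ).fetchone()
        if existing is not None:
            conn.execute("COMMIT")
            return existing

        # Conditional update: succeeds only if available = total - reserved
        # covers the required amount.
        cur = conn.execute(
            "UPDATE balances SET reserved = reserved + ? "
            "WHERE user_id = ? AND asset = ? AND total - reserved >= ?",
            (required, user_id, asset, required),
        )
        state = "ACCEPTED" if cur.rowcount == 1 else "REJECTED"
        cur = conn.execute(
            "INSERT INTO orders (user_id, client_order_id, asset, amount, state) "
            "VALUES (?, ?, ?, ?, ?)",
            (user_id, client_order_id, asset, required, state),
        )
        conn.execute("COMMIT")
        return (cur.lastrowid, state)
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because the reservation and the order insert commit together, a crash between them cannot leave funds reserved for an order that does not exist, and a client retry with the same client_order_id cannot reserve twice.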
Worked example
For the prompt “Design cryptocurrency trading with third-party exchanges,” start by clarifying scope: “Are we only routing user market/limit orders, or also maintaining custody and balances? What latency and availability targets matter? How many venues and symbols? Do we need best-execution routing or just reliable routing?” Then declare assumptions: support BTC-USD-style spot trading, thousands of orders per second, external exchanges connected over REST/WebSocket, and correctness over ultra-low latency.
Organize the answer around four pillars: order intake, routing/execution, state management, and reconciliation/observability. In order intake, propose POST /orders, DELETE /orders/{id}, and GET /orders/{id}, all backed by idempotency keys and balance reservation. In routing, explain a venue adapter interface like placeOrder(), cancelOrder(), and getOrderStatus(), with per-venue rate limits and retry policies. In state management, describe an order state machine persisted through immutable events, with transitions triggered by user requests and exchange execution reports. In reconciliation, add a periodic scanner that compares internal orders against venue status and emits missing events to close gaps.
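The state-management pillar can be made concrete with a small fold over immutable events. The sketch below is one possible shape, under stated assumptions: the transition table is illustrative (a real venue mix may need more edges), and OrderEvent, OrderView, and replay are hypothetical names introduced for the example. Each event carries a unique event_id, assumed here to be something like the venue's execution id.

```python
from dataclasses import dataclass

# Assumed transition table; terminal states have no outgoing edges.
ALLOWED = {
    "NEW": {"ACCEPTED", "REJECTED"},
    "ACCEPTED": {"ROUTED", "CANCELED", "REJECTED", "EXPIRED"},
    "ROUTED": {"PARTIALLY_FILLED", "FILLED", "CANCELED", "REJECTED", "EXPIRED"},
    "PARTIALLY_FILLED": {"PARTIALLY_FILLED", "FILLED", "CANCELED", "EXPIRED"},
    "FILLED": set(),
    "CANCELED": set(),
    "REJECTED": set(),
    "EXPIRED": set(),
}


@dataclass(frozen=True)
class OrderEvent:
    event_id: str      # unique per event, e.g. a venue execution id
    target_state: str  # state this event moves the order into
    fill_qty: int = 0  # quantity filled by this event, if any


@dataclass
class OrderView:
    state: str = "NEW"
    filled_qty: int = 0


def replay(events):
    """Fold events into the current view.

    Duplicate event ids and transitions not in ALLOWED are skipped and
    reported, so they can be logged or investigated rather than applied.
    """
    view = OrderView()
    seen = set()
    skipped = []
    for ev in events:
        if ev.event_id in seen or ev.target_state not in ALLOWED[view.state]:
            skipped.append(ev.event_id)
            continue
        seen.add(ev.event_id)
        view.state = ev.target_state
        view.filled_qty += ev.fill_qty
    return view, skipped
```

Replaying an accepted, routed, partially filled, then filled order with a duplicated fill and a late cancel leaves the view at FILLED with the correct quantity; the duplicate and the cancel end up in skipped instead of corrupting state.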
A strong tradeoff to flag is synchronous versus asynchronous order placement. Synchronously waiting for the exchange before acknowledging the user gives clearer semantics but couples your API latency to third-party failures; acknowledging after internal acceptance gives better availability but requires clear user-visible states like ACCEPTED versus ROUTED. Close by saying that, with more time, you would go deeper on ledger accounting, disaster recovery, and how to safely deploy venue adapter changes using shadow traffic and replay tests.
A second angle
For the prompt “Design real-time stock price viewer,” the same building blocks apply, but the correctness bar shifts from money movement to freshness, ordering, and fanout scalability. You would spend less time on balance reservation and order state machines, and more time on market data ingestion, deduplication, aggregation, and client delivery over WebSocket. The main state is not “order lifecycle” but “latest price/book per symbol,” often cached in memory or Redis for fast reads. The key tradeoff is whether to deliver every tick or coalesce updates, for example sending at most 10 updates per second per symbol to reduce client and network load. Reconciliation still matters, but it means resyncing snapshots after sequence gaps rather than repairing order fills.
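Coalescing is easy to sketch: keep only the latest unsent price per symbol and release it no more often than a minimum interval. The Coalescer class below is a hypothetical illustration; the 0.1 second default corresponds to the "at most 10 updates per second per symbol" example, and the caller is assumed to pass in the current time so the logic stays deterministic.

```python
class Coalescer:
    """Hold the latest price per symbol; emit at most one update per interval."""

    def __init__(self, min_interval_s=0.1):
        self.min_interval_s = min_interval_s
        self._pending = {}    # symbol -> latest price not yet sent
        self._last_sent = {}  # symbol -> time of the last emission

    def on_tick(self, symbol, price):
        # Newer ticks overwrite older unsent ones: intermediate values are
        # dropped on purpose, only the latest matters for display.
        self._pending[symbol] = price

    def flush(self, now):
        """Return {symbol: price} for symbols whose interval has elapsed."""
        out = {}
        for symbol in list(self._pending):
            last = self._last_sent.get(symbol)
            if last is None or now - last >= self.min_interval_s:
                out[symbol] = self._pending.pop(symbol)
                self._last_sent[symbol] = now
        return out
```

A fanout loop would call flush on a timer and push the returned map to subscribed clients. Symbols that are still inside their interval stay pending and go out on a later flush with whatever the latest price is by then.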
Common pitfalls
Pitfall: Treating third-party exchange APIs as reliable local function calls.
A tempting but weak answer is “call the exchange, store the result, and return it to the user.” That ignores timeouts, duplicate submissions, partial fills, late execution reports, and exchange-side outages. A better answer explicitly separates internal acceptance from external execution and uses idempotency, retries with bounded backoff, and reconciliation.
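As a rough sketch of "retries with bounded backoff," the helper below retries a venue call a fixed number of times with capped exponential delays and jitter, always reusing the same client_order_id so the venue (assumed to deduplicate on it) does not create a second order. TransientVenueError, backoff_delays, and place_with_retry are hypothetical names; the delay parameters are placeholders, not recommendations.

```python
import random
import time


class TransientVenueError(Exception):
    """Hypothetical error for timeouts and other retryable venue failures."""


def backoff_delays(max_attempts, base_s=0.1, cap_s=2.0, rng=random.random):
    """Yield one delay per retry: exponential growth, capped, with full jitter."""
    for attempt in range(max_attempts - 1):
        yield rng() * min(cap_s, base_s * (2 ** attempt))


def place_with_retry(place_order, client_order_id, max_attempts=4,
                     sleep=time.sleep, **backoff):
    """Call place_order(client_order_id) until success or attempts run out.

    Every attempt passes the same client_order_id. When attempts are
    exhausted the last error is re-raised, leaving the order's true status
    to be settled by reconciliation.
    """
    delays = backoff_delays(max_attempts, **backoff)
    while True:
        try:
            return place_order(client_order_id)
        except TransientVenueError:
            delay = next(delays, None)
            if delay is None:
                raise
            sleep(delay)
```

The bound matters as much as the backoff: after the last attempt the caller does not know whether the venue accepted the order, so the order should stay in a non-terminal internal state until a status check resolves it.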
Pitfall: Mixing all consistency requirements into one architecture.
Candidates often overbuild market data with strong transactional guarantees or underbuild order handling with eventually consistent writes. The sharper framing is to classify flows: funds and order state require strong consistency and auditability, while price streaming can favor low latency, deduplication, and eventual convergence.
Pitfall: Staying too high-level and never naming failure modes.
Saying “use microservices, Kafka, and Redis” is not enough. Interviewers want to hear how the design behaves when Kafka is delayed, a venue sends duplicate fills, a WebSocket drops sequence numbers, or a user retries POST /orders after a client timeout. Name the failure, then name the mechanism that contains it.
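To make "name the failure, then name the mechanism" concrete for dropped or duplicated feed messages, here is a minimal sequence tracker. It assumes each (venue, product_id) stream carries a sequence number that increases by exactly one per message, which is not true of every venue; SequenceTracker and its method names are hypothetical.

```python
class SequenceTracker:
    """Track the last applied sequence number per (venue, product_id) stream."""

    def __init__(self):
        self._last = {}

    def check(self, venue, product_id, sequence):
        """Classify an incoming message as 'apply', 'duplicate', or 'gap'."""
        key = (venue, product_id)
        last = self._last.get(key)
        if last is not None and sequence <= last:
            return "duplicate"  # already applied; safe to drop
        if last is not None and sequence > last + 1:
            # Messages were missed. Do not advance: the caller should
            # fetch a fresh snapshot and call resync().
            return "gap"
        self._last[key] = sequence
        return "apply"

    def resync(self, venue, product_id, snapshot_sequence):
        """Reset the stream position after loading a snapshot."""
        self._last[(venue, product_id)] = snapshot_sequence
```

In an interview answer, the point is the pairing: duplicates are contained by dropping anything at or below the last applied sequence, and gaps are contained by refusing to apply later messages until a snapshot resync restores a known-good position.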
Connections
Interviewers can pivot from here into ledger design, matching engines, WebSocket fanout, distributed transactions, or event-driven architecture. They may also ask for deeper treatment of Kafka partitioning, exactly-once versus at-least-once delivery, API rate limiting, or how to debug a production incident where displayed balances diverge from executed fills.
Further reading
- Designing Data-Intensive Applications: Martin Kleppmann’s book is the best practical foundation for replication, logs, consistency, and stream processing tradeoffs.
- Stripe Idempotent Requests: Clear real-world pattern for safely retrying externally visible API requests.
- Martin Fowler: Event Sourcing. Concise explanation of modeling state changes as immutable events, useful for order lifecycle and audit systems.
Practice questions
- Design Crypto Order Routing (Coinbase · Software Engineer · Onsite · hard)
- Implement a crypto order management system (Coinbase · Software Engineer · Onsite · hard)
- Design a crypto trading web frontend (Coinbase · Software Engineer · Onsite · hard)
- Design crypto trading order control API (Coinbase · Software Engineer · Technical Screen · medium)
- Design real-time stock price viewer (Coinbase · Software Engineer · Onsite · hard)
- Design crypto trading system (Coinbase · Software Engineer · Onsite · hard)
- Design cryptocurrency trading platform (Coinbase · Software Engineer · Onsite · hard)
- Design cryptocurrency trading with third-party exchanges (Coinbase · Software Engineer · Onsite · hard)
- Design a crypto trading platform (Coinbase · Software Engineer · Onsite · hard)
- Design real-time exchange data sync system (Coinbase · Software Engineer · Onsite · hard)
Related concepts
- Donation And Payment Platforms (System Design)
- Payment Processing And Ledger Systems (System Design)
- API Integration And External Service Design (System Design)
- Auctions, Ticketing, And Real-Time Messaging (System Design)
- Limit Order Book Price-Time Matching (Coding & Algorithms)
- Scalable Distributed System Architecture (System Design)