
Design a low-latency trading system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates expertise in designing low-latency, high-determinism electronic trading systems, covering order matching, market data ingestion and publishing, partitioning/sharding, persistence for auditability, idempotency, pre-trade risk controls, and operational reliability.

  • Company: Citadel
  • Role: Software Engineer
  • Category: System Design
  • Difficulty: hard
  • Interview Round: Technical Screen

Design an electronic trading platform that supports equities trading with limit and market orders. Specify how you would:

  (1) maintain per-symbol order books and implement a price–time priority matching engine;
  (2) ingest and normalize market data from multiple venues and publish snapshots plus incremental updates with bounded skew;
  (3) expose authenticated APIs for order entry (REST/gRPC) and a low-latency streaming protocol for market data;
  (4) perform pre-trade risk checks (credit/position/price bands), ensure idempotent submissions, and provide exactly-once execution reports;
  (5) achieve P99 end-to-end acknowledgement under 5 ms within one region at 200k orders/second peak, and scale horizontally for bursts;
  (6) persist state with an append-only log or event sourcing, support deterministic replay/backtesting, and meet strict auditability;
  (7) handle partial fills, cancels, mass cancels, auctions, market halts, and symbol pauses;
  (8) design partitioning/sharding of order books, concurrency control inside the matcher, recovery and disaster recovery (RPO ≈ 0, RTO < 1 minute);
  (9) address fault tolerance, time synchronization and clock skew (e.g., PTP), fairness across partitions, backpressure, and flow control;
  (10) define monitoring, SLOs, capacity planning, and strategies to test latency and correctness under failures.


Aug 9, 2025, 12:00 AM

System Design: Low-Latency Electronic Trading Platform (Equities)

You are designing a single-region electronic trading platform (exchange/ATS-like) that supports market and limit orders for equities. Clients are co-located in the same region, and the system must deliver high determinism, strict auditability, and robust failure handling.

Work through the parts below. Each part is a distinct design area; a strong answer treats them as one coherent system rather than ten isolated features — call out where a decision in one part constrains another (for example, how ordering in Part 1 relates to replay in Part 6, or how the durability choice in Part 6 spends the latency budget in Part 5).

Constraints & Assumptions

Anchor your design to these figures (state any you'd renegotiate with the interviewer):

  • Throughput: 200,000 orders/second steady-state peak; bursts may exceed peak.
  • Latency SLO: P99 end-to-end acknowledgement < 5 ms within one region.
  • Recovery: RPO ≈ 0, RTO < 1 minute.
  • Instrument: equities; tick size $0.01; orders are day-only (extend order types as needed).
  • Clients: authenticated institutional participants, co-located in the same region.
  • Topology: one production region with multiple availability zones; an optional warm-standby region for disaster recovery.
  • Priorities: determinism, auditability, and correctness under failure are non-negotiable; latency is the lever that may degrade gracefully under overload.

Clarifying Questions to Ask

Before designing, scope the problem with the interviewer:

  • Is this the venue/matching engine itself (system of record for the book) or a buy-side client connecting to an exchange? (These are very different systems.)
  • What order types and time-in-force must be supported beyond plain market/limit — IOC, FOK, Post-Only, stop orders? Are amends/replaces in scope?
  • Are clients truly co-located (cross-connect, kernel-bypass on the hot path), or must we also serve WAN participants?
  • For RPO ≈ 0, must durability be synchronous across availability zones, or is AZ-local-synchronous with async cross-AZ replication acceptable? (This single choice has a large effect on the latency budget.)
  • What regulatory regime governs price bands and audit (e.g. LULD-style limit-up/limit-down, clearly-erroneous rules, reconstruction requirements)?
  • Is the 5 ms P99 measured gateway-in to ack-out, and does it include the durability commit?

What a Strong Answer Covers

Part 1 — Order Books & Price–Time Priority Matching

  • Maintain a separate order book per symbol.
  • Implement a price–time priority matching engine (FIFO within each price level).
  • Specify the data structures, the matching algorithm, and how you handle the tricky cases (e.g. execution price, market orders into a thin book, amends, self-trades).
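
The price–time priority rule above can be sketched as follows. This is a minimal illustration, not a production matcher: the `Order` and `OrderBook` names are hypothetical, prices are assumed to be integer ticks (1 tick = $0.01), only limit orders are handled, and the best price is found with a linear scan where a real engine would likely use a sorted or tick-indexed structure.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    side: str    # "buy" or "sell"
    price: int   # limit price in integer ticks (assumption: 1 tick = $0.01)
    qty: int


class OrderBook:
    """One book per symbol; each price level is a FIFO queue of resting orders."""

    def __init__(self):
        self.bids = {}  # price -> deque[Order]
        self.asks = {}  # price -> deque[Order]

    def submit(self, order):
        """Match a limit order against the opposite side, rest any remainder, return fills."""
        fills = []
        is_buy = order.side == "buy"
        own, opposite = (self.bids, self.asks) if is_buy else (self.asks, self.bids)
        while order.qty > 0 and opposite:
            best = min(opposite) if is_buy else max(opposite)
            crosses = order.price >= best if is_buy else order.price <= best
            if not crosses:
                break
            level = opposite[best]
            resting = level[0]  # oldest order at the best price goes first
            traded = min(order.qty, resting.qty)
            # The trade executes at the resting order's price.
            fills.append((resting.order_id, order.order_id, best, traded))
            order.qty -= traded
            resting.qty -= traded
            if resting.qty == 0:
                level.popleft()
            if not level:
                del opposite[best]
        if order.qty > 0:
            own.setdefault(order.price, deque()).append(order)
        return fills
```

An incoming buy that crosses several levels takes the lowest ask first, then consumes each level in arrival order, which is the behavior an interviewer will expect you to state explicitly before discussing amends or self-trade prevention.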

Part 2 — Market Data: Ingest, Normalize & Publish

  • Ingest and normalize market data from multiple external venues into a canonical schema.
  • Publish your platform's own market data — snapshots and incremental updates — with bounded skew across symbols/shards.
  • Define how a consumer joins the stream and recovers from a gap.
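
One common way to answer the join-and-recover question is a per-symbol sequence number shared by snapshots and incremental updates. The sketch below assumes that scheme; the `BookFeedConsumer` class, its method names, and the `"applied"`/`"stale"`/`"gap"` return values are hypothetical, chosen only to show the gap-detection logic.

```python
class BookFeedConsumer:
    """Rebuilds one symbol's price levels from a snapshot plus incremental updates."""

    def __init__(self):
        self.levels = {}      # (side, price) -> quantity
        self.last_seq = None  # None until a snapshot has been applied

    def apply_snapshot(self, seq, levels):
        """Replace local state with a snapshot taken as of sequence number seq."""
        self.levels = dict(levels)
        self.last_seq = seq

    def apply_incremental(self, seq, side, price, qty):
        """Return 'applied', 'stale', or 'gap'; on 'gap' the caller must re-snapshot."""
        if self.last_seq is None:
            return "gap"      # cannot apply updates before joining via a snapshot
        if seq <= self.last_seq:
            return "stale"    # duplicate, or already reflected in the snapshot
        if seq != self.last_seq + 1:
            return "gap"      # at least one update was missed
        if qty == 0:
            self.levels.pop((side, price), None)  # quantity 0 removes the level
        else:
            self.levels[(side, price)] = qty
        self.last_seq = seq
        return "applied"
```

A consumer joining mid-session would buffer incrementals, request a snapshot, then discard buffered updates that come back as `"stale"`.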

Part 3 — Client Interfaces

  • Expose authenticated APIs for order entry (REST and/or gRPC).
  • Provide a low-latency streaming protocol for market data.
  • Justify which transport belongs on the hot path vs. the control plane.

Part 4 — Risk, Idempotency & Execution Reports

  • Perform pre-trade risk checks: credit/exposure, position, and price bands.
  • Ensure idempotent order submissions.
  • Provide exactly-once execution reports to clients.
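
The checks and the idempotency requirement above can be combined in one gateway step, sketched below under simplifying assumptions: the `OrderGateway` class, its limits, and the reject codes are hypothetical; the idempotency key is assumed to be `(client_id, client_order_id)`; and state is held in memory, whereas a real system would need it to survive failover.

```python
class OrderGateway:
    """Pre-trade checks plus idempotent submission keyed by (client_id, client_order_id)."""

    def __init__(self, credit_limit, max_position, band_pct):
        self.credit_limit = credit_limit  # max notional per order (ticks * shares)
        self.max_position = max_position  # max absolute position in shares
        self.band_pct = band_pct          # allowed deviation from the reference price
        self.positions = {}               # client_id -> worst-case signed position
        self.seen = {}                    # idempotency key -> first response

    def submit(self, client_id, client_order_id, side, price, qty, ref_price):
        key = (client_id, client_order_id)
        if key in self.seen:
            # A retry returns the original answer and has no further side effects.
            return self.seen[key]
        signed = qty if side == "buy" else -qty
        projected = self.positions.get(client_id, 0) + signed
        if abs(price - ref_price) > ref_price * self.band_pct:
            result = "REJECT_PRICE_BAND"
        elif price * qty > self.credit_limit:
            result = "REJECT_CREDIT"
        elif abs(projected) > self.max_position:
            result = "REJECT_POSITION"
        else:
            self.positions[client_id] = projected  # reserve exposure as if fully filled
            result = "ACCEPTED"
        self.seen[key] = result
        return result
```

Recording the response before returning it is what makes a client retry after a timeout safe: the second call cannot reserve exposure twice.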

Part 5 — Performance & Scale

  • Achieve P99 end-to-end acknowledgement < 5 ms within one region at 200k orders/second peak.
  • Scale horizontally to handle bursts beyond the steady-state peak.
  • Present a per-stage latency budget and name the mechanism behind each line.
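
Two pieces of arithmetic help when presenting the budget: how much service time each shard can spend per order at peak, and how much of the 5 ms remains after the per-stage allocations. The helper below is a rough sketch; the function names are hypothetical, load is assumed to spread evenly across shards, and the stage values in the example are placeholders, not measurements.

```python
PEAK_ORDERS_PER_SEC = 200_000  # steady-state peak from the stated constraints
ACK_SLO_US = 5_000             # P99 acknowledgement SLO of 5 ms, in microseconds


def per_order_budget_us(shards, target_utilization):
    """Mean service time (microseconds) a single-threaded shard can spend per order."""
    per_shard_rate = PEAK_ORDERS_PER_SEC / shards  # assumes evenly spread load
    return target_utilization * 1_000_000 / per_shard_rate


def budget_headroom_us(stage_budget_us):
    """SLO minus the sum of per-stage allocations; negative means over budget."""
    return ACK_SLO_US - sum(stage_budget_us.values())


# Placeholder allocations for illustration only; replace with measured figures.
example_stages_us = {
    "gateway": 200,
    "risk": 100,
    "sequencer": 100,
    "matcher": 50,
    "durable_commit": 1_500,
    "ack": 200,
}
```

With one shard at full utilization the per-order budget is 5 µs; with 8 shards held to 50% utilization it is 20 µs. The unallocated headroom is what absorbs queueing during bursts.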

Part 6 — Persistence, Replay/Backtest & Auditability

  • Persist state via an append-only log or event sourcing.
  • Support deterministic replay/backtesting and strict audit trails.
  • Be precise about which part of the pipeline is bit-identical on replay and which is not.
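
The replay property can be made concrete with a small event-sourcing sketch: state is a pure function of the ordered log, so replaying the same log yields the same state and the same digest. The `EventLog` class, the event shapes, and the `state_digest` helper are hypothetical and in-memory only; durability and replication are out of scope here.

```python
import hashlib
import json


class EventLog:
    """Append-only log: records are only added, each with the next sequence number."""

    def __init__(self):
        self._events = []

    def append(self, event):
        record = {"seq": len(self._events) + 1, **event}
        self._events.append(record)
        return record["seq"]

    def __iter__(self):
        return iter(self._events)


def replay(log):
    """Fold the log into open quantity per order; depends only on the log contents."""
    open_qty = {}
    for e in log:
        if e["type"] == "accepted":
            open_qty[e["order_id"]] = e["qty"]
        elif e["type"] == "fill":
            open_qty[e["order_id"]] -= e["qty"]
            if open_qty[e["order_id"]] == 0:
                del open_qty[e["order_id"]]
        elif e["type"] == "cancelled":
            open_qty.pop(e["order_id"], None)
    return open_qty


def state_digest(state):
    """Stable hash of the state, for comparing a replay or replica against the primary."""
    canonical = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()
```

Anything that reads a wall clock, a random source, or thread scheduling inside `replay` would break this guarantee, which is why such inputs are usually captured in the log as events.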

Part 7 — Trading Scenarios & Order Lifecycle

  • Correctly handle partial fills, cancels, mass cancels, auctions, market halts, and symbol pauses.
  • Sketch the order state machine and how each event flows through the system deterministically.
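
One possible shape for the order state machine is a transition table, sketched below. The state and event names are assumptions for illustration, and the table covers only the per-order lifecycle; halts, pauses, and auctions would be symbol-level states that gate which events the matcher accepts.

```python
# Allowed (state, event) -> next state. Terminal states have no outgoing entries.
TRANSITIONS = {
    ("NEW", "accept"): "OPEN",
    ("NEW", "reject"): "REJECTED",
    ("OPEN", "partial_fill"): "PARTIALLY_FILLED",
    ("OPEN", "fill"): "FILLED",
    ("OPEN", "cancel"): "CANCELLED",
    ("PARTIALLY_FILLED", "partial_fill"): "PARTIALLY_FILLED",
    ("PARTIALLY_FILLED", "fill"): "FILLED",
    ("PARTIALLY_FILLED", "cancel"): "CANCELLED",
}


def next_state(state, event):
    """Return the next order state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}") from None
```

Under this model a mass cancel is a batch of `cancel` events applied in log order, so a cancel that arrives after a `fill` is rejected deterministically rather than racing with it.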

Part 8 — Partitioning, Concurrency, Recovery & DR

  • Design the partitioning/sharding of order books.
  • Define concurrency control inside the matcher.
  • Support recovery and disaster recovery (RPO ≈ 0, RTO < 1 minute).
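
A simple starting point for sharding is a stable hash of the symbol, with an explicit override table for hot symbols. The sketch below assumes that approach; `shard_for` and the override mapping are hypothetical names. It uses `zlib.crc32` rather than Python's built-in `hash()`, because string hashing with `hash()` is randomized per process and would route differently after a restart.

```python
import zlib


def shard_for(symbol, num_shards, overrides=None):
    """Map a symbol to a shard index; overrides pin hot symbols to dedicated shards."""
    if overrides and symbol in overrides:
        return overrides[symbol]
    # crc32 gives the same value in every process, so routing is stable across restarts.
    return zlib.crc32(symbol.encode("ascii")) % num_shards
```

Because each symbol maps to exactly one shard, a single-threaded matcher per shard needs no locks on the book. Note that changing `num_shards` remaps most symbols, so resharding would have to be a planned operation.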

Part 9 — Fault Tolerance, Time Sync, Fairness & Flow Control

  • Address fault tolerance, time synchronization and clock skew (e.g. PTP), fairness across partitions, backpressure, and flow control.
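
For flow control at the gateway, a per-session token bucket is one widely used option. The sketch below is illustrative: the `TokenBucket` class and its parameters are hypothetical, and the caller is assumed to pass in a monotonic timestamp in seconds so that the logic stays deterministic and testable.

```python
class TokenBucket:
    """Per-session rate limiter: sustained rate_per_sec with bursts up to `burst` orders."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)  # start full so an idle session can burst
        self.last = 0.0

    def allow(self, now):
        """Return True and consume a token if the order may proceed at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller rejects or delays the order instead of queueing unboundedly
```

Rejecting at the edge keeps queues inside the matcher bounded, which fits the stated priority that latency may degrade gracefully under overload while correctness does not.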

Part 10 — Operability: Monitoring, SLOs, Capacity & Testing

  • Define monitoring, SLOs, capacity planning, and strategies to test latency and correctness under failure.
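
To make the latency SLO testable, a load or fault-injection test can compute P99 from recorded acknowledgement latencies and compare it with the 5 ms target. The sketch below uses the nearest-rank percentile definition; the function names are hypothetical, and a production monitor would more likely use a streaming histogram than a full sort.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_ack_slo(latencies_ms, slo_ms=5.0):
    """True if the P99 acknowledgement latency is under the SLO."""
    return percentile(latencies_ms, 99) < slo_ms
```

Running the same check while killing a matcher or a log replica shows whether the SLO holds under failure, not only in steady state.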

Follow-up Questions

Expect the interviewer to push deeper once the core design is on the board:

  • The durability commit dominates your latency budget. Walk me through exactly what changes if RPO ≈ 0 must be synchronous across availability zones versus AZ-local-synchronous with async replication, and which would you actually ship for a regulated exchange?
  • Symbol volume is heavily skewed (a few tickers dominate). How does your sharding scheme avoid a hot shard, and what do you do when one symbol alone saturates a core?
  • A matcher process crashes mid-burst at 1.5× peak. Trace the recovery: what's the client-visible behavior, and how do you guarantee zero lost or duplicated executions?
  • Can you give every co-located participant identical latency to the matcher? If not, what can you actually promise about fairness, and where does path asymmetry leak in?
