Design an in-memory order matching engine
Company: Da Vinci Trading
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Take-home Project
You are asked to design an in-memory **order book / order matching system** for a single electronic trading venue. The implementation language is **C++**, and the system must meet strict runtime and latency requirements (microseconds to low milliseconds per operation) under high throughput.
Work through the design step by step. For each part below, justify your choices and explain *why* your approach can meet the latency and throughput bar in C++.
### Constraints & Assumptions
Anchor your design to the following (state any additional assumptions explicitly):
- **Single venue, single machine.** The whole order book fits in RAM on one host; no distributed consensus is required for the core matcher.
- **Throughput target:** tens to hundreds of thousands of orders/second.
- **Latency target:** microsecond-scale, low-jitter per-order processing (low milliseconds is the outer bound).
- **Priority model:** strict **price–time priority** (best price first; ties broken by arrival order).
- **At minimum** LIMIT orders and cancel-by-`order_id` must be supported; other order types are an extension you should describe.
- This mirrors a real take-home from a low-latency trading firm (Optiver/Da Vinci style), C++ only, roughly a 3-hour budget — so prefer concrete, implementable structures over hand-waving.
### Clarifying Questions to Ask
A strong candidate scopes the problem before designing. Consider asking the interviewer:
1. **One symbol or many?** Is a single instance responsible for one instrument, or must one engine multiplex many symbols?
2. **Price representation:** are prices on a fixed **tick grid** (and is the tick range bounded around the current price), or arbitrary/sparse?
3. **Order-type surface:** which types are in scope now (LIMIT only?) vs. which must be *extensible* (MARKET, IOC, FOK, GTC, PostOnly)?
4. **Trade-price convention:** does a fill execute at the **resting (maker)** price or the aggressor's price? (Venue policy, and it is correctness-critical.)
5. **Durability requirement:** is this purely in-memory (rebuild from the feed on restart), or must we recover exact book + trade state after a crash?
6. **Self-trading:** is **self-match prevention** required, and which policy (cancel-resting / cancel-incoming / cancel-both)?
### What a Strong Answer Covers
```premium-lock What a Strong Answer Covers
```
### Part 1 — Core functionality
The system receives a stream of **orders**. Each order has at least:
- `order_id` (unique identifier)
- `side` (BUY or SELL)
- `price`
- `quantity`
- `timestamp` or sequence number (arrival time)
- `type` (e.g., LIMIT, MARKET; assume at minimum LIMIT must be supported)
Maintain an **order book** with two sides:
- **Bid side:** buy limit orders, sorted by descending price, then by time (price–time priority).
- **Ask side:** sell limit orders, sorted by ascending price, then by time (price–time priority).
On the arrival of a **new order**, the system must:
- Attempt to **match** it against existing orders on the opposite side according to price–time priority.
- Generate **trades** on a match — specify the fields a trade carries (e.g., `trade_id`, price, quantity, the involved order ids, timestamp).
- Handle both **partial fills** and **complete fills**.
- If the order is not fully filled and is still valid (e.g., not IOC), insert the remaining quantity into the book.
```hint Trade price
Decide up front whether a fill prints at the **resting (maker)** price or the aggressor's price — it changes correctness, not just style. Make the convention explicit in your `Trade` struct.
```
```hint Partial fills
When two orders partially match, the larger one stays in the book with reduced quantity. Ask yourself whether it should **keep its original time priority** or go to the back of the queue.
```
### Part 2 — Order types and constraints
At minimum, support:
- **Limit orders.**
- **Cancel** existing orders by `order_id`.
Then describe briefly how you would extend to support additional types (e.g., **MARKET, IOC, FOK, GTC**, and optionally PostOnly) and any constraints each implies.
```hint Framing the extension
Look for the common core: most types run the *same* matching loop and differ only in a **pre-check** (may it cross?) and the **post-match disposition** of leftover quantity (rest it, discard it, or reject).
```
```hint The tricky one
FOK is the type that doesn't fit the streaming pattern — it needs an all-or-nothing guarantee. Think about how you'd check feasibility *before* mutating the book, and why being single-threaded makes that check easy to keep atomic.
```
### Part 3 — Performance and complexity constraints
The system must support **very high throughput** (tens to hundreds of thousands of orders/second) with **low latency** per operation. State your **target time complexity** for each of:
- Insertion of a limit order.
- Matching an incoming order (scanning and removing top-of-book orders on the opposite side).
- Cancel by `order_id`.
You may assume the order book fits in memory on a single machine.
```hint Amortized view
A single large order can sweep many levels, so don't stop at the per-call worst case — argue the **amortized** cost: each resting order is created once and removed (filled or cancelled) at most once.
```
### Part 4 — Data structures and design
Propose **specific C++-oriented data structures** to:
- Maintain price levels on each side (bids/asks) in sorted order.
- Maintain **FIFO** order of orders **within** a price level.
- Support **O(1)** or near-O(1) cancellation by `order_id`.
Discuss how you would organize these structures (e.g., maps from price to queues, auxiliary hash maps for order lookup, custom allocators, object pools, etc.).
```hint Within a level
Cancel-by-`order_id` is O(1) only if, once you've found the order, you can remove it without searching its level or touching a separately allocated node. What would the order need to carry for the unlink to be pure pointer surgery?
```
```hint Across price levels
Weigh an ordered map (best bid/ask is cheap, but every level change pays a log-factor and allocates) against what you could do if the tick range were **bounded** — direct indexing buys O(1), but then ask yourself how you'd jump to the *next non-empty* level without scanning.
```
```hint Jitter, not just Big-O
At microsecond scale the worst-case Big-O often isn't the tail-latency villain — **what you do on the hot path that the asymptotics hide** is. Ask which routine operations touch the general-purpose heap, and how you'd arrange for a steady-state add or cancel to allocate nothing.
```
### Part 5 — Concurrency and threading
Assume the system may receive orders from multiple network threads. Describe how you would make the core matching logic **thread-safe** and low-contention **while preserving strict price–time ordering**. Consider options such as:
- A single-threaded matching engine fed by message queues.
- Sharding per symbol.
- Lock-free queues or carefully scoped locking.
```hint Where parallelism belongs
There's a tension to resolve: parallel matching could add throughput, but it fights determinism and strict price–time ordering, and locks land on the hot path. Decide what you're willing to give up for latency — then ask whether the *unit* you parallelize across has to be the same as the unit you match within.
```
```hint Queue mechanics
If one producer feeds one engine, an **SPSC ring buffer** is fastest. Watch for **false sharing** — keep producer/consumer cursors on separate cache lines.
```
### Part 6 — Failure handling and robustness
- What happens on process restart? Do you need to recover the order book state? If so, how might you snapshot or log state efficiently?
- How do you ensure each order is **processed exactly once** and that **trades are not duplicated** when recovering?
```hint Recovery primitives
You can't fsync-to-disk on every order without wrecking the latency bar, yet a crash must not lose the book. What property does your single-threaded engine already have — given a fixed input order — that lets you persist *less* than the full state continuously and still reconstruct it exactly? Reconcile the durability cost against the hot path.
```
```hint Exactly-once
Recovery will re-feed inputs that were already applied, and it must not double-emit trades. Think about what minimal marker on the input stream lets you tell "already processed" from "new," and a separate marker on the output stream so you don't re-publish fills a downstream consumer already saw.
```
### Part 7 — API and output
Sketch a simple API for the engine (C++ interfaces or function signatures) to:
- Submit a new order.
- Cancel an order.
- Query the current top-of-book (best bid/ask) and possibly full depth.
Show what the engine **outputs** when matches occur (trade events, order-status updates, etc.).
```hint Query consistency
Reads must not race a half-applied match. Either **service queries on the engine thread** between messages, or publish a **double-buffered, seq-stamped snapshot** that off-thread readers load lock-free (accepting a few messages of staleness).
```
### Part 8 — Complexity analysis
Analyze the **time and space complexity** of your design:
- Average and worst-case complexities for **add**, **match**, **cancel**, and **query best bid/ask**.
- Discuss the **trade-offs** you made between simplicity and performance.
```hint Make the trade-off explicit
Present *two* designs (e.g. `std::map` vs. array+bitmap) side by side in a table, and pick one **for this latency bar** — naming what each costs (log-factor & allocation jitter vs. bounded memory and a fixed tick window).
```
---
### Follow-up Questions
Be ready for deeper probes after your main design:
1. **Scale-up:** how does the design change if a single symbol's order rate is 10–100× higher, or if one engine must handle thousands of symbols? What breaks first?
2. **Self-match prevention:** walk through your STP policy and prove the matching loop still **terminates** when the aggressor and the head maker share an account.
3. **FOK under self-liquidity:** if the incoming account already has resting orders in the crossable range, why might a naive feasibility check (summing level totals) let a FOK pass and then under-fill — and how do you fix it?
4. **Numeric correctness:** why ban floating-point prices on the matching path, and how do you represent price exactly instead?
Explain your design choices step-by-step and justify why your approach can meet strict runtime and latency requirements in C++.
Quick Answer: This question evaluates expertise in high-performance trading system design, covering in-memory order books, latency-sensitive order matching, efficient C++ data structures, correctness under price-time priority, and concurrency and complexity analysis.