Sort trade executions into a canonical order
Company: Jane Street
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Onsite
Quick Answer: This question evaluates a candidate's ability to enforce deterministic ordering and data normalization of time-series trade execution records, testing competencies in multi-key sorting, tie-breaking logic, and performance with large inputs.
Part 1: Sort trade executions into canonical order
Constraints
- 0 <= len(trades) <= 200000
- `trade_id` values are unique
- `qty` > 0 for every trade
- `side` is either `"B"` or `"S"`
- Timestamps and prices fit in standard integer ranges
Examples
Input: [{'trade_id': 'T3', 'symbol': 'MSFT', 'ts': 1002, 'price': 330, 'qty': 5, 'side': 'S'}, {'trade_id': 'T1', 'symbol': 'AAPL', 'ts': 1000, 'price': 150, 'qty': 10, 'side': 'B'}, {'trade_id': 'T2', 'symbol': 'AAPL', 'ts': 1001, 'price': 151, 'qty': 3, 'side': 'S'}]
Expected Output: [{'trade_id': 'T1', 'symbol': 'AAPL', 'ts': 1000, 'price': 150, 'qty': 10, 'side': 'B'}, {'trade_id': 'T2', 'symbol': 'AAPL', 'ts': 1001, 'price': 151, 'qty': 3, 'side': 'S'}, {'trade_id': 'T3', 'symbol': 'MSFT', 'ts': 1002, 'price': 330, 'qty': 5, 'side': 'S'}]
Explanation: The records are primarily ordered by timestamp: 1000, then 1001, then 1002.
Input: [{'trade_id': 'T2', 'symbol': 'MSFT', 'ts': 1000, 'price': 330, 'qty': 2, 'side': 'B'}, {'trade_id': 'T1', 'symbol': 'AAPL', 'ts': 1000, 'price': 150, 'qty': 1, 'side': 'S'}, {'trade_id': 'T3', 'symbol': 'AAPL', 'ts': 1000, 'price': 151, 'qty': 1, 'side': 'B'}]
Expected Output: [{'trade_id': 'T1', 'symbol': 'AAPL', 'ts': 1000, 'price': 150, 'qty': 1, 'side': 'S'}, {'trade_id': 'T3', 'symbol': 'AAPL', 'ts': 1000, 'price': 151, 'qty': 1, 'side': 'B'}, {'trade_id': 'T2', 'symbol': 'MSFT', 'ts': 1000, 'price': 330, 'qty': 2, 'side': 'B'}]
Explanation: All timestamps are equal, so sort by symbol. Within `AAPL`, sort by `trade_id`.
Input: []
Expected Output: []
Explanation: An empty input should return an empty list.
Input: [{'trade_id': 'X1', 'symbol': 'TSLA', 'ts': 9999, 'price': 250, 'qty': 7, 'side': 'B'}]
Expected Output: [{'trade_id': 'X1', 'symbol': 'TSLA', 'ts': 9999, 'price': 250, 'qty': 7, 'side': 'B'}]
Explanation: A single record is already in canonical order.
Solution
def solution(trades):
return sorted(trades, key=lambda t: (t['ts'], t['symbol'], t['trade_id']))
Time complexity: O(n log n). Space complexity: O(n).
Hints
- Python's sorting supports tuple keys, which is useful for multi-level ordering.
- Only `ts`, `symbol`, and `trade_id` affect the order; the other fields should be carried along unchanged.
Part 2: Handle non-unique trade_id values (duplicates/replays)
Constraints
- 0 <= len(trades) <= 200000
- `qty` > 0 for every trade
- `side` is either `"B"` or `"S"`
- Duplicate `trade_id` values should be treated as replays
- If multiple records share a `trade_id`, keep the first one from the input and ignore the rest
Examples
Input: [{'trade_id': 'T2', 'symbol': 'AAPL', 'ts': 1001, 'price': 151, 'qty': 3, 'side': 'S'}, {'trade_id': 'T1', 'symbol': 'AAPL', 'ts': 1000, 'price': 150, 'qty': 10, 'side': 'B'}, {'trade_id': 'T2', 'symbol': 'AAPL', 'ts': 1001, 'price': 151, 'qty': 3, 'side': 'S'}, {'trade_id': 'T3', 'symbol': 'MSFT', 'ts': 1002, 'price': 330, 'qty': 5, 'side': 'S'}]
Expected Output: [{'trade_id': 'T1', 'symbol': 'AAPL', 'ts': 1000, 'price': 150, 'qty': 10, 'side': 'B'}, {'trade_id': 'T2', 'symbol': 'AAPL', 'ts': 1001, 'price': 151, 'qty': 3, 'side': 'S'}, {'trade_id': 'T3', 'symbol': 'MSFT', 'ts': 1002, 'price': 330, 'qty': 5, 'side': 'S'}]
Explanation: The second `T2` is a replay, so it is ignored. Then the remaining trades are sorted canonically.
Input: [{'trade_id': 'T1', 'symbol': 'MSFT', 'ts': 1005, 'price': 330, 'qty': 4, 'side': 'B'}, {'trade_id': 'T2', 'symbol': 'AAPL', 'ts': 1000, 'price': 150, 'qty': 1, 'side': 'S'}, {'trade_id': 'T1', 'symbol': 'AAPL', 'ts': 999, 'price': 100, 'qty': 9, 'side': 'S'}, {'trade_id': 'T3', 'symbol': 'GOOG', 'ts': 1001, 'price': 2800, 'qty': 2, 'side': 'B'}]
Expected Output: [{'trade_id': 'T2', 'symbol': 'AAPL', 'ts': 1000, 'price': 150, 'qty': 1, 'side': 'S'}, {'trade_id': 'T3', 'symbol': 'GOOG', 'ts': 1001, 'price': 2800, 'qty': 2, 'side': 'B'}, {'trade_id': 'T1', 'symbol': 'MSFT', 'ts': 1005, 'price': 330, 'qty': 4, 'side': 'B'}]
Explanation: Even though the later `T1` has different contents, it is still treated as a replay. Keep the first `T1` only.
Input: [{'trade_id': 'R1', 'symbol': 'IBM', 'ts': 7, 'price': 10, 'qty': 1, 'side': 'B'}, {'trade_id': 'R1', 'symbol': 'IBM', 'ts': 7, 'price': 10, 'qty': 1, 'side': 'B'}, {'trade_id': 'R1', 'symbol': 'IBM', 'ts': 7, 'price': 10, 'qty': 1, 'side': 'B'}]
Expected Output: [{'trade_id': 'R1', 'symbol': 'IBM', 'ts': 7, 'price': 10, 'qty': 1, 'side': 'B'}]
Explanation: All later occurrences are duplicates of the first one.
Input: []
Expected Output: []
Explanation: An empty input should return an empty list.
Solution
def solution(trades):
seen = set()
unique_trades = []
for trade in trades:
trade_id = trade['trade_id']
if trade_id not in seen:
seen.add(trade_id)
unique_trades.append(trade)
return sorted(unique_trades, key=lambda t: (t['ts'], t['symbol'], t['trade_id']))
Time complexity: O(n log n). Space complexity: O(n).
Hints
- Scan the input once and remember which `trade_id` values you have already seen.
- After filtering replays, you can reuse the same sorting rule as in Part 1.