Coinbase Software Engineer Interview Prep Guide
Everything Coinbase actually asks Software Engineer candidates — concept walkthroughs, worked examples, and the real interview questions, drawn from candidate reports. Free to read.

Technical Screen
Coding & Algorithms
- In-Memory Stateful API Design: covered in depth under Take-home Project below.
- Cursor-Based Pagination: covered in depth under Onsite below.
- TTL, Expiration, And Snapshot Semantics: covered in depth under Take-home Project below.
System Design
- Finite-State Machines For Order Lifecycles: covered in depth under Onsite below.
- Idempotent API Design: covered in depth under Take-home Project below.
- Banking Ledgers And Cashback Operations: covered in depth under Take-home Project below.
- Crypto Trading And Order Routing Systems: covered in depth under Onsite below.
Behavioral & Leadership
- Technical Leadership, Communication, And Mission Fit — covered in depth under Onsite below.
Onsite
Coding & Algorithms
- In-Memory Stateful API Design — covered in depth under Take-home Project below.

What's being tested
Cursor-based pagination tests whether you can return stable slices from an ordered result set without relying on fragile OFFSET semantics. Interviewers probe deterministic ordering, cursor/token design, bidirectional navigation, filtering, duplicate avoidance, and complexity tradeoffs for both in-memory and database-backed queries.
Patterns & templates
- Stable sort key: order by a business key plus a unique tie-breaker, e.g. ORDER BY created_at DESC, id DESC, to avoid skipped or duplicated rows.
- Forward cursor predicate: for descending order, fetch rows after (ts, id) using (created_at, id) < (:ts, :id); invert the comparison for ascending order.
- Backward pagination: reverse both the inequality and the sort direction, fetch limit + 1, then reverse the results before returning so UI order stays consistent.
- Opaque cursor token: encode {created_at, id, filters, direction} with base64 or signed JSON; never expose mutable array indexes as durable cursors.
- Limit-plus-one: request limit + 1 rows to compute has_next_page or has_previous_page without an extra count query.
- Filter composition: apply conjunctive filters before pagination; the index should match the WHERE filters first, then the ORDER BY columns, e.g. (user_id, status, created_at, id).
- Complexity target: database seek pagination should be O(limit) after the index seek; in-memory versions are often an O(n log n) sort plus an O(limit) slice.
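The forward-pagination patterns above can be sketched in memory as follows. This is a minimal illustration, not a reference implementation: the names `ROWS`, `encode_cursor`, `decode_cursor`, and `list_page` are hypothetical, the cursor is unsigned base64 JSON, and a database-backed version would push the predicate into an indexed query instead of filtering a list.

```python
import base64
import json
from typing import Optional

# Hypothetical sample rows; some share created_at to exercise the id tie-breaker.
ROWS = [
    {"id": 1, "created_at": 100},
    {"id": 2, "created_at": 100},
    {"id": 3, "created_at": 105},
    {"id": 4, "created_at": 110},
    {"id": 5, "created_at": 110},
]


def encode_cursor(row: dict) -> str:
    """Pack the sort key of the last returned row into an opaque token."""
    payload = json.dumps({"created_at": row["created_at"], "id": row["id"]})
    return base64.urlsafe_b64encode(payload.encode()).decode()


def decode_cursor(token: str) -> tuple:
    data = json.loads(base64.urlsafe_b64decode(token.encode()))
    return (data["created_at"], data["id"])


def list_page(rows: list, limit: int, after: Optional[str] = None) -> dict:
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Stable order: created_at DESC, then id DESC as the unique tie-breaker.
    ordered = sorted(rows, key=lambda r: (r["created_at"], r["id"]), reverse=True)
    if after is not None:
        boundary = decode_cursor(after)
        # Forward predicate for descending order: (created_at, id) < boundary.
        ordered = [r for r in ordered if (r["created_at"], r["id"]) < boundary]
    window = ordered[: limit + 1]  # limit-plus-one: one extra row signals more data
    has_next = len(window) > limit
    items = window[:limit]
    next_cursor = encode_cursor(items[-1]) if has_next else None
    return {"items": items, "has_next_page": has_next, "next_cursor": next_cursor}
```

Calling `list_page(ROWS, 2)` and then passing each `next_cursor` back as `after` walks the rows in a stable order, even though several rows share a timestamp.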
Common pitfalls
Pitfall: Using OFFSET and LIMIT as the primary solution; inserts or deletes between requests can shift rows and cause duplicates or missing items.
Pitfall: Sorting only by created_at; equal timestamps make ordering nondeterministic unless you add a unique tie-breaker like id.
Pitfall: Treating backward pagination as “same query with previous cursor”; you must flip comparison direction, fetch extra, and restore display order.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions
- Top-K Queries And Streaming Aggregation: covered in depth under Take-home Project below.
- TTL, Expiration, And Snapshot Semantics: covered in depth under Take-home Project below.
System Design

What's being tested
Coinbase is probing whether you can model an order lifecycle as a precise, enforceable finite-state machine rather than a loose set of status strings. For a trading system, incorrect transitions can mean duplicate fills, stuck funds, misleading balances, or orders that appear canceled while still executable. The interviewer is also testing distributed-systems judgment: how you handle idempotency, retries, out-of-order exchange callbacks, reconciliation, and API semantics when state changes cross service and third-party boundaries. A strong Software Engineer answer makes the state model explicit, then shows how storage, messaging, APIs, and recovery logic preserve it under failure.
Core knowledge
- Finite-state machines define legal states, events, guards, and side effects. For orders, avoid “anything can update status” logic; centralize transition validation in one domain layer or service.
- A typical order state model includes CREATED, VALIDATED, SUBMITTED, ACKED, PARTIALLY_FILLED, FILLED, CANCEL_REQUESTED, CANCELED, REJECTED, EXPIRED, and sometimes UNKNOWN. Terminal states such as FILLED, CANCELED, and REJECTED should reject further mutating transitions except safe reconciliation annotations.
- Transition guards encode business invariants: a buy order must have reserved quote currency before submission; a sell order must reserve base currency; cumulative filled quantity cannot exceed original quantity; and remaining_qty = order_qty - filled_qty must never go negative.
- Idempotency keys are mandatory for APIs like POST /orders, POST /orders/{id}/cancel, and exchange submission. Store idempotency_key, request hash, response, and order id so client retries return the same result instead of creating duplicate orders, following the common Stripe-style pattern.
- Optimistic concurrency control prevents lost updates. Store version or updated_at on the order row and update with WHERE order_id = ? AND version = ?. If two events race, only one wins; the loser reloads state and re-applies transition validation.
- Event sourcing can fit order lifecycles well: persist immutable OrderCreated, OrderSubmitted, FillReceived, CancelRequested, OrderCanceled events, then derive current state. It improves auditability, but adds complexity around snapshots, replay, schema evolution, and exactly-once illusions.
- Relational storage such as Postgres is often the source of truth for orders because you need transactions, constraints, indexes, and auditability. Use indexes like (user_id, created_at DESC), (status, updated_at), and unique constraints on client_order_id or idempotency_key.
- Asynchronous integration is unavoidable with third-party exchanges. Your internal state may be CANCEL_REQUESTED while the exchange later reports FILLED; the FSM must allow this if the fill happened before cancellation took effect. “Cancel requested” is not the same as “canceled.”
- Out-of-order events should be handled with exchange sequence numbers, timestamps, monotonic versions, or reconciliation fetches. Never blindly apply an older ACKED event after a newer FILLED event; either discard stale events or route them through deterministic transition logic.
- Reconciliation repairs gaps between internal state and exchange state. Periodically compare open orders, fills, and balances from the exchange API against local records; for mismatches, emit corrective events rather than directly patching rows, preserving audit history.
- In-memory indexes are acceptable for coding-style variants or small control-plane services: order_id -> Order, user_id -> Set[order_id], and state -> Set[order_id] give O(1) lookup and transition updates. For millions of orders or durability requirements, move state to persistent storage and rebuild caches from logs.
- Observability should track stuck states and invalid transitions: orders_in_cancel_requested_age_p99, transition_failure_count, reconciliation_mismatch_count, duplicate_event_count, and order_submission_latency_p99. Alerts should target user-impacting states, not just process health.
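To make the transition rules and fill guards above concrete, here is a minimal sketch of a centralized transition function. The `ALLOWED` table is an assumed subset of the states listed above, `Order`, `transition`, and `apply_fill` are hypothetical names, and quantities are integers in the smallest unit of the asset. A real service would load and persist the order inside a transaction and use `version` for the optimistic-concurrency check.

```python
from dataclasses import dataclass

# Assumed, illustrative transition table; terminal states allow nothing further.
ALLOWED = {
    "CREATED": {"SUBMITTED", "REJECTED"},
    "SUBMITTED": {"ACKED", "REJECTED"},
    "ACKED": {"PARTIALLY_FILLED", "FILLED", "CANCEL_REQUESTED", "EXPIRED"},
    "PARTIALLY_FILLED": {"PARTIALLY_FILLED", "FILLED", "CANCEL_REQUESTED"},
    # A fill can still win the race after a cancel has been requested.
    "CANCEL_REQUESTED": {"FILLED", "CANCELED"},
    "FILLED": set(),
    "CANCELED": set(),
    "REJECTED": set(),
    "EXPIRED": set(),
}


class InvalidTransition(Exception):
    pass


@dataclass
class Order:
    order_id: str
    order_qty: int
    filled_qty: int = 0
    state: str = "CREATED"
    version: int = 0  # bumped on every accepted change


def transition(order: Order, new_state: str) -> None:
    """Single place where state changes are validated and applied."""
    if new_state not in ALLOWED[order.state]:
        raise InvalidTransition(f"{order.state} -> {new_state}")
    order.state = new_state
    order.version += 1


def apply_fill(order: Order, qty: int) -> None:
    """Guard: cumulative fills may never exceed the original quantity."""
    if qty <= 0 or order.filled_qty + qty > order.order_qty:
        raise InvalidTransition("fill would exceed order quantity")
    done = order.filled_qty + qty == order.order_qty
    target = "FILLED" if done else "PARTIALLY_FILLED"
    if order.state == "CANCEL_REQUESTED" and not done:
        # Partial fill while a cancel is pending: record it, keep the state.
        order.filled_qty += qty
        order.version += 1
        return
    transition(order, target)  # raises before any quantity is mutated
    order.filled_qty += qty
```

In this sketch a late fill after a cancel request moves the order to FILLED, and a later attempt to mark it CANCELED raises `InvalidTransition` instead of overwriting a terminal state.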
Worked example
For “Design cryptocurrency trading with third-party exchanges,” start by clarifying whether Coinbase is the exchange, a broker routing orders to external venues, or both; whether orders are market and limit only; and what guarantees clients expect from POST /order and POST /cancel. State an assumption like: “I’ll design an internal order service that routes to multiple exchanges, with our database as the source of truth and external exchanges treated as eventually consistent.” Organize the answer around four pillars: the order FSM, the API contract, asynchronous exchange adapters, and reconciliation.
In the FSM pillar, define states such as CREATED, RESERVED, SUBMITTED, ACKED, PARTIALLY_FILLED, FILLED, CANCEL_REQUESTED, CANCELED, and REJECTED, then call out illegal transitions like FILLED -> CANCELED. In the API pillar, explain client_order_id and idempotency keys so retries do not double-submit. In the adapter pillar, describe per-exchange connectors that normalize exchange-specific statuses into internal events, with rate limiting and retry policies around external APIs. In the reconciliation pillar, explain that exchange callbacks can be missed or arrive out of order, so periodic polling compares external open orders and fills to internal state.
A concrete tradeoff to flag: using an event log plus derived order table gives excellent auditability and replay, but a simpler transactional orders table with an order_events audit table may be enough for a first version. Close by saying that, with more time, you would discuss multi-region failover, ledger integration for balance reservations, and operational dashboards for stuck orders.
A second angle
For “Design order stream with state transitions,” the same FSM idea is tested under tighter implementation constraints. Instead of a multi-service exchange-routing design, the focus is likely on in-memory data structures, transition validation, and efficient lookup by order_id and user_id. A strong answer would define an Order object, a transition map like Map<State, Set<State>>, and indexes such as ordersById and ordersByUser. The key difference is that durability, third-party retries, and reconciliation are secondary; correctness, complexity, and testability are primary. You should still mention that the same transition function could later be backed by Postgres or an event log if the problem moves from interview coding exercise to production system.
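A sketch of the in-memory variant described above, focused on keeping the indexes consistent with the canonical record. The three-state `TRANSITIONS` map and the `OrderStore` class are simplifying assumptions for illustration; a real prompt would define its own states and method names.

```python
from collections import defaultdict

# Deliberately tiny, assumed transition map (the Map<State, Set<State>> idea).
TRANSITIONS = {"OPEN": {"FILLED", "CANCELED"}, "FILLED": set(), "CANCELED": set()}


class OrderStore:
    def __init__(self):
        self.orders_by_id = {}                    # order_id -> {"user_id", "state"}
        self.orders_by_user = defaultdict(set)    # user_id -> {order_id}
        self.orders_by_state = defaultdict(set)   # state -> {order_id}

    def create(self, order_id: str, user_id: str) -> None:
        if order_id in self.orders_by_id:
            raise ValueError("duplicate order_id")
        self.orders_by_id[order_id] = {"user_id": user_id, "state": "OPEN"}
        self.orders_by_user[user_id].add(order_id)
        self.orders_by_state["OPEN"].add(order_id)

    def transition(self, order_id: str, new_state: str) -> bool:
        """Return False for illegal or repeated transitions; never half-update."""
        order = self.orders_by_id[order_id]
        old_state = order["state"]
        if new_state not in TRANSITIONS[old_state]:
            return False
        # Update the state index and the canonical record together.
        self.orders_by_state[old_state].discard(order_id)
        self.orders_by_state[new_state].add(order_id)
        order["state"] = new_state
        return True

    def list_by_user(self, user_id: str, state=None) -> list:
        ids = self.orders_by_user.get(user_id, set())
        if state is not None:
            ids = ids & self.orders_by_state.get(state, set())
        return sorted(ids)  # sorted for deterministic output
```

Lookup by `order_id` and each transition are O(1) on average; `list_by_user` costs a set intersection plus a sort of the result, which is worth stating explicitly in the interview.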
Common pitfalls
Pitfall: Treating order status as a free-form field instead of a state machine.
A weak answer says, “We’ll update the status to filled or canceled when events arrive,” without defining legal transitions or terminal states. A better answer names states, allowed transitions, guards, and what happens when an event arrives that is stale, duplicate, or invalid.
Pitfall: Assuming cancellation is synchronous and final.
In real trading systems, cancel usually means “request cancellation,” not “guarantee no more fills.” The safer design models CANCEL_REQUESTED separately from CANCELED and allows a final FILL event to win if the exchange executed the order before processing the cancel.
Pitfall: Jumping directly into Kafka, sharding, or microservices before defining invariants.
Scaling details matter, but the interviewer first needs to hear the core correctness properties: no duplicate orders, no overfills, no illegal transitions, auditable state changes, and deterministic retry behavior. Once those are clear, infrastructure choices become easier to justify.
Connections
Interviewers often pivot from order FSMs into idempotent API design, event-driven architecture, distributed transactions versus sagas, and ledger/balance reservation systems. They may also ask how you would test the lifecycle with property-based tests, replay tests, or chaos scenarios involving duplicate and out-of-order exchange events.
Further reading
- Designing Data-Intensive Applications: Martin Kleppmann’s chapters on transactions, replication, and stream processing are directly relevant to durable lifecycle systems.
- Enterprise Integration Patterns: useful vocabulary for message routing, retries, idempotent receivers, and asynchronous system boundaries.
- Stripe API Idempotency: concise real-world reference for designing retry-safe write APIs.
Practice questions
- Idempotent API Design: covered in depth under Take-home Project below.
- Concurrency Control: covered in depth under Take-home Project below.
- Banking Ledgers And Cashback Operations: covered in depth under Take-home Project below.
What's being tested
These prompts test whether you can design a low-latency, reliable trading system that accepts user orders, routes them to one or more venues, tracks lifecycle state, and exposes real-time updates without losing or duplicating money-moving actions. Coinbase cares because trading infrastructure must handle bursty demand, third-party exchange failures, inconsistent market data, and strict correctness expectations while still meeting tight p99 latency targets. The interviewer is probing for practical distributed-systems judgment: idempotency, state-machine modeling, reconciliation, backpressure, rate limiting, and clear API boundaries between order intake, routing, execution, and streaming updates.
Core knowledge
- Order lifecycle modeling is central. Represent orders as a finite state machine: NEW -> ACCEPTED -> ROUTED -> PARTIALLY_FILLED -> FILLED, with terminal states like CANCELED, REJECTED, and EXPIRED. Validate allowed transitions server-side so duplicate fills or late cancels cannot corrupt state.
- Idempotency keys prevent duplicate trades when clients retry after timeouts. A POST /orders request should include client_order_id; store (user_id, client_order_id) with a unique constraint in Postgres or another transactional store, and return the original result on repeat submission.
- Event sourcing works well for auditability. Append immutable events such as OrderAccepted, OrderRouted, ExecutionReportReceived, and BalanceReserved; derive current state from events or maintain a materialized orders table. This makes replay, debugging, and regulatory audit trails much easier.
- Balance reservation must be atomic with order acceptance. For a buy, reserve quote currency; for a sell, reserve base asset. Use a transactional ledger pattern: available = total - reserved, and reject orders when available < required_amount + fees.
- Third-party exchange integration is asynchronous and unreliable. External venues may return HTTP 200 before final execution, drop WebSocket messages, impose per-symbol rate limits, or send execution reports out of order. Treat venue APIs as eventually consistent and reconcile against their authoritative order endpoints.
- Reconciliation jobs repair missed events. Periodically compare internal state against exchange state using GET /orders/{id} or venue batch endpoints. Reconciliation should be idempotent: if an internal order is ROUTED but the venue reports FILLED, emit the missing fill event rather than mutating state blindly.
- Outbox pattern avoids dual-write bugs. If you write an order to Postgres and publish to Kafka, wrap the order insert and the outbox_events insert in one transaction; a relay later publishes to Kafka. This prevents “DB commit succeeded but publish failed” inconsistencies.
- Market data ingestion needs normalization and ordering. Different exchanges encode symbols, timestamps, sequence numbers, bids, asks, and trade ticks differently. Normalize to canonical fields like venue, product_id, event_time, receive_time, sequence, bid_px, ask_px, and deduplicate using (venue, product_id, sequence).
- Streaming architecture usually separates hot-path ingestion from fanout. Use exchange WebSocket clients to ingest ticks, publish normalized events to Kafka or NATS, maintain latest-book snapshots in Redis or memory, and push client updates over WebSocket/SSE. Avoid one exchange connection per user.
- Latency budgets should be explicit. If target order acknowledgment is p99 < 200ms, budget roughly: API auth 20ms, validation/reservation 50ms, persistence 30ms, enqueue/routing 50ms, overhead 50ms. For execution, distinguish user acknowledgment latency from venue fill latency.
- Rate limiting and backpressure protect both your system and external venues. Apply per-user limits at the API gateway, per-symbol limits in order routing, and venue-specific token buckets. When queues exceed thresholds, degrade predictably: reject new orders, pause low-priority market data, or serve stale-but-labeled prices.
- Consistency choices depend on operation type. Order acceptance and balance reservation need strong consistency; market price display can tolerate eventual consistency and dropped intermediate ticks if the latest value is correct. A common rule: never be eventually consistent about user funds, but allow eventual consistency for read-only price feeds.
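The idempotency and outbox bullets above can be combined in one small sketch. It uses the standard-library `sqlite3` module as a stand-in for Postgres and a plain callback as a stand-in for the Kafka publisher; the table layout and the function names `open_db`, `place_order`, and `relay_once` are assumptions made for illustration.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id TEXT NOT NULL,
            client_order_id TEXT NOT NULL,
            payload TEXT NOT NULL,
            UNIQUE (user_id, client_order_id)
        );
        CREATE TABLE outbox_events (
            event_id INTEGER PRIMARY KEY AUTOINCREMENT,
            order_id INTEGER NOT NULL,
            event_type TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def place_order(conn, user_id: str, client_order_id: str, payload: dict) -> int:
    """Insert the order and its outbox event atomically; retries return the same id."""
    try:
        with conn:  # one transaction: both inserts commit, or neither does
            cur = conn.execute(
                "INSERT INTO orders (user_id, client_order_id, payload) VALUES (?, ?, ?)",
                (user_id, client_order_id, json.dumps(payload)),
            )
            order_id = cur.lastrowid
            conn.execute(
                "INSERT INTO outbox_events (order_id, event_type) VALUES (?, 'OrderAccepted')",
                (order_id,),
            )
            return order_id
    except sqlite3.IntegrityError:
        # Duplicate (user_id, client_order_id): return the original order.
        row = conn.execute(
            "SELECT order_id FROM orders WHERE user_id = ? AND client_order_id = ?",
            (user_id, client_order_id),
        ).fetchone()
        return row[0]


def relay_once(conn, publish) -> int:
    """Publish unpublished events in order, marking each one after it is sent."""
    rows = conn.execute(
        "SELECT event_id, order_id, event_type FROM outbox_events "
        "WHERE published = 0 ORDER BY event_id"
    ).fetchall()
    for event_id, order_id, event_type in rows:
        publish({"order_id": order_id, "type": event_type})
        with conn:
            conn.execute(
                "UPDATE outbox_events SET published = 1 WHERE event_id = ?", (event_id,)
            )
    return len(rows)
```

Because the relay publishes before it marks the row, a crash between the two steps re-sends the event on the next pass. That is at-least-once delivery, so downstream consumers still need to deduplicate.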
Worked example
For “Design cryptocurrency trading with third-party exchanges,” start by clarifying scope: “Are we only routing user market/limit orders, or also maintaining custody and balances? What latency and availability targets matter? How many venues and symbols? Do we need best-execution routing or just reliable routing?” Then declare assumptions: support BTC-USD-style spot trading, thousands of orders per second, external exchanges connected over REST/WebSocket, and correctness over ultra-low latency.
Organize the answer around four pillars: order intake, routing/execution, state management, and reconciliation/observability. In order intake, propose POST /orders, DELETE /orders/{id}, and GET /orders/{id}, all backed by idempotency keys and balance reservation. In routing, explain a venue adapter interface like placeOrder(), cancelOrder(), and getOrderStatus(), with per-venue rate limits and retry policies. In state management, describe an order state machine persisted through immutable events, with transitions triggered by user requests and exchange execution reports. In reconciliation, add a periodic scanner that compares internal orders against venue status and emits missing events to close gaps.
A strong tradeoff to flag is synchronous versus asynchronous order placement. Synchronously waiting for the exchange before acknowledging the user gives clearer semantics but couples your API latency to third-party failures; acknowledging after internal acceptance gives better availability but requires clear user-visible states like ACCEPTED versus ROUTED. Close by saying that, with more time, you would go deeper on ledger accounting, disaster recovery, and how to safely deploy venue adapter changes using shadow traffic and replay tests.
A second angle
For “Design real-time stock price viewer,” the same building blocks apply, but the correctness bar shifts from money movement to freshness, ordering, and fanout scalability. You would spend less time on balance reservation and order state machines, and more time on market data ingestion, deduplication, aggregation, and client delivery over WebSocket. The main state is not “order lifecycle” but “latest price/book per symbol,” often cached in memory or Redis for fast reads. The key tradeoff is whether to deliver every tick or coalesce updates, for example sending at most 10 updates per second per symbol to reduce client and network load. Reconciliation still matters, but it means resyncing snapshots after sequence gaps rather than repairing order fills.
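The coalescing tradeoff can be sketched as a small conflation buffer: keep only the newest price per symbol and emit at most one update per interval. `Coalescer`, its method names, and the integer-millisecond clock are assumptions for illustration; a real service would call `flush` from a timer and push the result to subscribed clients.

```python
class Coalescer:
    """Keep the latest price per symbol; emit at most one update per interval."""

    def __init__(self, min_interval_ms: int = 100):  # 100 ms is about 10 updates/second
        self.min_interval_ms = min_interval_ms
        self.latest = {}      # symbol -> (sequence, price)
        self.last_sent = {}   # symbol -> time of last emitted update, in ms
        self.dirty = set()    # symbols with an update not yet delivered

    def on_tick(self, symbol: str, sequence: int, price: float) -> None:
        prev = self.latest.get(symbol)
        if prev is not None and sequence <= prev[0]:
            return  # duplicate or out-of-order tick: drop it
        self.latest[symbol] = (sequence, price)
        self.dirty.add(symbol)

    def flush(self, now_ms: int) -> list:
        """Return (symbol, price) pairs that are due; others stay pending."""
        out = []
        for symbol in sorted(self.dirty):
            last = self.last_sent.get(symbol)
            if last is None or now_ms - last >= self.min_interval_ms:
                out.append((symbol, self.latest[symbol][1]))
                self.last_sent[symbol] = now_ms
        for symbol, _ in out:
            self.dirty.discard(symbol)
        return out
```

Intermediate ticks are dropped on purpose. Clients always converge on the newest value, which matches the "latest value is correct" bar for price display.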
Common pitfalls
Pitfall: Treating third-party exchange APIs as reliable local function calls.
A tempting but weak answer is “call the exchange, store the result, and return it to the user.” That ignores timeouts, duplicate submissions, partial fills, late execution reports, and exchange-side outages. A better answer explicitly separates internal acceptance from external execution and uses idempotency, retries with bounded backoff, and reconciliation.
Pitfall: Mixing all consistency requirements into one architecture.
Candidates often overbuild market data with strong transactional guarantees or underbuild order handling with eventually consistent writes. The sharper framing is to classify flows: funds and order state require strong consistency and auditability, while price streaming can favor low latency, deduplication, and eventual convergence.
Pitfall: Staying too high-level and never naming failure modes.
Saying “use microservices, Kafka, and Redis” is not enough. Interviewers want to hear how the design behaves when Kafka is delayed, a venue sends duplicate fills, a WebSocket drops sequence numbers, or a user retries POST /orders after a client timeout. Name the failure, then name the mechanism that contains it.
Connections
Interviewers can pivot from here into ledger design, matching engines, WebSocket fanout, distributed transactions, or event-driven architecture. They may also ask for deeper treatment of Kafka partitioning, exactly-once versus at-least-once delivery, API rate limiting, or how to debug a production incident where displayed balances diverge from executed fills.
Further reading
- Designing Data-Intensive Applications: Martin Kleppmann’s book is the best practical foundation for replication, logs, consistency, and stream processing tradeoffs.
- Stripe Idempotent Requests: clear real-world pattern for safely retrying externally visible API requests.
- Martin Fowler: Event Sourcing. Concise explanation of modeling state changes as immutable events, useful for order lifecycle and audit systems.
Behavioral & Leadership
What's being tested
Coinbase is probing for technical leadership: whether you can own ambiguous engineering work, make sound tradeoffs, communicate clearly, and deliver reliable systems with measurable impact. For a Software Engineer, this is not about vague “influence”; it is about how you debugged production risk, chose between architectures, coordinated dependencies, handled changing requirements, and improved customer or developer outcomes. Coinbase also cares about mission fit because engineers work on high-trust financial systems where correctness, security, compliance, and operational discipline matter as much as velocity. Strong answers show judgment under constraints: what you optimized for, what you deliberately did not do, and how you brought others along.
Core knowledge
- STAR-L is the safest structure: Situation, Task, Action, Result, Learning. Keep the setup under 20% of the answer, spend most time on your technical actions, then close with quantified impact and what you would improve next time.
- Ownership means you drove the outcome across code, design, rollout, monitoring, and follow-up. A strong SWE story includes artifacts like an RFC, design doc, migration plan, p99 latency dashboard, incident review, test strategy, or staged rollout plan.
- Impact measurement should use engineering metrics, not just adjectives. Examples: reduced p99 latency from 900ms to 250ms, cut deployment rollback rate from 8% to 2%, improved API availability from 99.9% to 99.99%, or removed 40% of manual on-call toil.
- Reliability tradeoffs are central in financial systems. If you mention availability, connect it to error budgets: monthly downtime budget = (1 - availability) × 30 × 24 × 60 minutes for a 30-day month, so 99.9% allows about 43 minutes/month while 99.99% allows about 4.3 minutes/month.
- Decision-making should compare at least two viable approaches. For example: synchronous validation versus asynchronous reconciliation, monolith change versus new service, Postgres transaction versus event-driven workflow, cache-first read path versus source-of-truth read path.
- Risk management is more impressive than heroics. Mention feature flags, canary deploys, dark launches, shadow traffic, idempotency keys, backfills with checkpoints, rollback plans, audit logs, and alerts on error_rate, latency, saturation, and business-critical invariants.
- Communication should be audience-aware. For engineers, discuss schemas, APIs, failure modes, and migration sequencing; for managers, summarize scope, risk, timeline, and decisions needed; for customer-support or compliance partners, explain behavior changes and operational runbooks.
- Cross-team collaboration often fails at interfaces. Strong answers name the boundary: API contract, event schema, authentication model, service ownership, SLA/SLO, data consistency expectation, or release dependency. Explain how you made assumptions explicit and prevented silent misalignment.
- Change handling requires a re-planning mechanism, not just flexibility. Good examples include cutting scope to preserve safety, reordering milestones around a compliance deadline, freezing API contracts while iterating internals, or renegotiating delivery after discovering hidden technical debt.
- UX quality is still in scope for frontend or full-stack engineers when tied to implementation details. Discuss accessibility, loading states, error states, optimistic updates, form validation, browser performance, Core Web Vitals like LCP and CLS, and instrumentation for drop-offs or failed actions.
- Mission alignment at Coinbase should be concrete: building trusted crypto infrastructure, protecting customer assets, increasing economic freedom through reliable access, and respecting regulatory/compliance constraints. Avoid ideological monologues; connect mission to engineering behaviors like correctness, security, and operational excellence.
- Professional transparency matters because financial services companies operate in high-trust environments. If asked about employment gaps, background checks, or a past performance issue, be factual, concise, non-defensive, and consistent with records; emphasize what changed and how you operate now.
Worked example
For “Discuss your proudest project,” frame the answer in the first 30 seconds by naming the project, your role, the technical stakes, and why it mattered: “I led the backend work to migrate our payments retry system from a cron-based batch process to an idempotent event-driven workflow; the goal was to reduce duplicate charges and improve recovery time.” Clarify scope without over-explaining: team size, duration, services touched, and whether you were tech lead, primary implementer, or cross-team coordinator.
Organize the answer around four pillars: the problem, the technical design, the leadership moments, and the measurable result. In the design pillar, compare the old and new architecture: batch retries made failures hard to isolate, while an event-driven flow with idempotency keys, durable state in Postgres, and alerts on reconciliation mismatches made retries safer. In the leadership pillar, describe how you wrote the RFC, got agreement from payments, infra, and support teams, broke the migration into phases, and created a rollback plan.
Flag one explicit tradeoff: you may have chosen a simpler Postgres-backed state machine over a new distributed workflow engine because the team needed auditability and operational familiarity more than maximum throughput. Then quantify the result: duplicate payment incidents fell by 70%, manual reconciliation time dropped from five hours/week to one hour/week, and p95 retry completion improved from 30 minutes to three minutes.
Close with learning: “If I had more time, I would have invested earlier in replay tooling and synthetic tests for provider-specific failures.” That ending shows maturity because you are proud without pretending the project was perfect.
A second angle
For “Describe cross-team collaboration on past projects,” the same leadership skills apply, but the interviewer is listening less for technical novelty and more for coordination mechanics. Start with the dependency graph: which teams owned which services, what contract connected them, and where the ambiguity was. A strong answer might describe aligning on an API contract, writing a migration checklist, setting weekly risk reviews, and defining who owned rollback decisions during launch.
The key constraint is that you cannot simply say, “I communicated frequently.” Show the artifacts: shared design doc, decision log, launch tracker, Slack incident channel, dashboards, and explicit sign-off criteria. The tradeoff might be slowing initial development to lock down interface contracts, which prevented rework when multiple teams integrated against the same endpoint.
Common pitfalls
Pitfall: Giving a “proudest project” answer that is really a feature demo.
A tempting answer is to describe what the product did and why users liked it, but that can make your role sound passive. Instead, emphasize the engineering challenge: constraints, alternatives considered, failure modes, your decisions, and the measurable outcome.
Pitfall: Claiming leadership by saying “I led meetings” or “coordinated stakeholders.”
That sounds managerial but not technical. A stronger answer ties coordination to technical artifacts: you resolved an API ownership dispute, clarified data consistency guarantees, sequenced a zero-downtime migration, or created a runbook that reduced on-call ambiguity.
Pitfall: Hiding or over-explaining sensitive background topics.
For questions about gaps, exits, or background checks, rambling creates risk and defensiveness. Give a direct factual answer, avoid blaming prior employers, state what documentation will show, and pivot to what you learned or how your current working habits address the concern.
Connections
Interviewers may pivot from this area into system design, especially if your story mentions migrations, distributed workflows, or reliability. They may also ask follow-ups on incident response, API design, testing strategy, frontend quality, or tradeoff analysis to verify that your behavioral story has real technical depth.
Further reading
- Staff Engineer: Leadership beyond the management track. Useful for understanding technical influence, scope, and operating through ambiguity without becoming a manager.
- Google SRE Book: strong reference for reliability language: SLOs, error budgets, incident response, and operational discipline.
- Stripe API Idempotency: practical example of a reliability pattern that often appears in high-trust financial engineering systems.
Take-home Project
Coding & Algorithms

What's being tested
These problems test stateful in-memory API design: choosing data structures, defining method semantics, and preserving correctness across updates, expiration, snapshots, and state transitions. Interviewers are probing whether you can turn ambiguous product-like operations into deterministic code with clear invariants, edge-case handling, and stated complexity.
Patterns & templates
- Primary map + secondary indexes: store canonical objects by id, then maintain user_id, prefix, status, or priority indexes; update all indexes atomically.
- Lazy TTL expiration: implement isExpired(now, item) and purge on read/write; cheaper than background cleanup, but scans must filter expired entries.
- Snapshot semantics: backup(timestamp) should serialize only live records; restore(timestamp) usually restores the latest backup at or before target time.
- State machine validation: define allowed transitions like OPEN -> FILLED/CANCELED; make invalid or repeated transitions explicit no-ops or errors.
- Idempotent operations: repeated cancel(order_id) or assign(task_id) should not corrupt counters, balances, quotas, or indexes.
- Sorted retrieval: use heapq, bisect, TreeMap-style structures, or sorted lists for priority/top-k/prefix scans; state complexity clearly.
- Command/API parsing layer: separate parsing from core methods like create_account, transfer, set_with_ttl, list_tasks; this keeps tests targeted and readable.
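The lazy-expiration and snapshot patterns above can be sketched together. `TTLStore` and its method signatures are hypothetical. The sketch assumes a restored record keeps its original absolute expiry; some prompts instead re-base the remaining TTL at restore time, so confirm the rule before coding.

```python
import bisect


class TTLStore:
    def __init__(self):
        self.data = {}           # key -> (value, expire_at or None)
        self.backup_times = []   # sorted timestamps of stored backups
        self.backups = {}        # backup timestamp -> snapshot dict

    @staticmethod
    def _is_expired(now, expire_at) -> bool:
        return expire_at is not None and expire_at <= now

    def set(self, now, key, value, ttl=None) -> None:
        # Store an absolute expiry, not the remaining TTL.
        self.data[key] = (value, None if ttl is None else now + ttl)

    def get(self, now, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        if self._is_expired(now, entry[1]):
            del self.data[key]  # lazy purge on read
            return None
        return entry[0]

    def scan(self, now, prefix: str = "") -> list:
        # Scans must filter expired entries even if they were never purged.
        return sorted(
            k for k, (_, exp) in self.data.items()
            if k.startswith(prefix) and not self._is_expired(now, exp)
        )

    def backup(self, now) -> int:
        """Snapshot only live records; return how many were saved."""
        snap = {k: e for k, e in self.data.items() if not self._is_expired(now, e[1])}
        if now not in self.backups:
            bisect.insort(self.backup_times, now)
        self.backups[now] = snap
        return len(snap)

    def restore(self, target_time) -> bool:
        """Restore the latest backup taken at or before target_time."""
        i = bisect.bisect_right(self.backup_times, target_time)
        if i == 0:
            return False  # no backup exists that early
        self.data = dict(self.backups[self.backup_times[i - 1]])
        return True
```

`get` and `set` are O(1) on average, `scan` and `backup` are O(n), and `restore` adds an O(log b) binary search over b backups before copying the snapshot.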
Common pitfalls
Pitfall: Updating the primary dictionary but forgetting to update secondary indexes, causing stale results in list_by_user, prefix scans, or status queries.
Pitfall: Treating TTL as deletion only at write time; reads, backups, restores, and scans must all respect expiration at the provided timestamp.
Pitfall: Hand-waving complexity with “it’s in memory”; interviewers expect O(1) lookup, O(log n) indexed updates, or O(n) scans to be stated precisely.

What's being tested
This tests streaming aggregation and Top-K query design: maintaining counts, sums, balances, or revenues while processing events incrementally. Interviewers are probing whether you can choose between full sort, heap, ordered set, hash map aggregation, and time-window indexing while preserving correctness under ties, updates, and edge cases.
Patterns & templates
- Hash map aggregation: use `dict[key] += value` for counts, revenue, balances, or per-restaurant totals; `O(n)` build and `O(m)` space for `n` events and `m` distinct keys.
- Top-K via min-heap: maintain a heap of size `k`; `O(n log k)` time, better than full `O(n log n)` sorting when `k << n`.
- Top-K via sorting: aggregate first, then `sorted(items, key=lambda x: (-x.metric, x.tie_breaker))[:k]`; simple, deterministic, acceptable for moderate `m`.
- Ordered ranking with ties: define the comparator explicitly (higher metric first, then lexicographic ID, timestamp, or insertion order) to avoid nondeterministic output.
- Time-window aggregation: filter by `start <= ts < end`, or maintain a deque or prefix sums for repeated range queries; watch inclusive/exclusive boundaries.
- Streaming updates: for changing scores, use lazy heap entries `(score, id, version)` and discard stale records on pop; this avoids expensive heap deletion.
- SQL Top-K template: `GROUP BY entity`, compute `SUM(...)`, then `ORDER BY metric DESC, entity ASC LIMIT k`; use `ROW_NUMBER()` for per-group Top-K.
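The aggregate-then-rank pattern above can be sketched in a few lines. This is an illustrative example that assumes events arrive as `(entity, amount)` pairs; the function name `top_k_by_total` is hypothetical. It uses the standard-library `heapq.nsmallest` with a composite key so that ties are broken by entity name.

```python
import heapq
from collections import defaultdict


def top_k_by_total(events, k):
    """Return the k entities with the largest totals, ties broken by name."""
    totals = defaultdict(int)
    for entity, amount in events:
        totals[entity] += amount  # aggregate per entity before ranking
    # Smallest (-total, name) means highest total first, then lexicographic name.
    return heapq.nsmallest(k, totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

Ranking the aggregated `totals` rather than the raw events is what makes the result a ranking of entities and not of individual orders.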
Common pitfalls
Pitfall: Computing Top-K directly on raw events instead of aggregating by entity first; this ranks orders, not restaurants/users/accounts.
Pitfall: Ignoring tie-breaking; Coinbase-style coding prompts often expect deterministic output even when metrics are equal.
Pitfall: Using full recomputation for every query when repeated streaming queries require incremental maps, heaps, prefix sums, or window indexes.
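One way to avoid full recomputation is the lazy heap with versioned entries. The sketch below is a minimal illustration under assumptions: the class name `StreamingLeader` and its methods are hypothetical, and it tracks only the current leader rather than a full Top-K.

```python
import heapq


class StreamingLeader:
    """Tracks the current top scorer under score updates using lazy heap entries."""

    def __init__(self):
        self.score = {}    # entity -> current score
        self.version = {}  # entity -> version of the latest update
        self.heap = []     # entries: (-score, entity, version)

    def add(self, entity, delta):
        self.score[entity] = self.score.get(entity, 0) + delta
        self.version[entity] = self.version.get(entity, 0) + 1
        # Push a fresh entry; older entries for this entity become stale.
        heapq.heappush(self.heap,
                       (-self.score[entity], entity, self.version[entity]))

    def top(self):
        while self.heap:
            neg_score, entity, ver = self.heap[0]
            if self.version[entity] == ver:
                return entity, -neg_score
            heapq.heappop(self.heap)  # discard stale record lazily
        return None
```

Each update costs `O(log n)` and stale entries are paid for once, when they reach the top of the heap. Equal scores resolve to the lexicographically smaller entity because of the tuple ordering.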

What's being tested
This tests time-aware in-memory state management: storing records whose visibility depends on currentTime, ttl, and snapshot/restore rules. Interviewers are probing whether you can design clean data structures, implement deterministic expiration, and preserve correctness across scans, quotas, backups, restores, and versioned mutations.
Patterns & templates
- Lazy expiration: check `expireAt <= now` inside `get`, `scan`, `list`, and `backup`; avoid eager cleanup unless required.
- Absolute expiry timestamps: store `expireAt = timestamp + ttl`, not remaining TTL; use `None` or `INF` for non-expiring records.
- Snapshot semantics: `backup(now)` should persist only live records, often as remaining TTL: `remaining = expireAt - now`.
- Restore semantics: rebuild expiry relative to restore time with `newExpireAt = restoreTime + savedRemainingTtl`; preserve non-expiring values unchanged.
- Nested maps: a common shape is `db[key][field] = {value, expireAt}`; operations are usually `O(1)` point lookup and `O(k log k)` sorted scans.
- Deterministic scans: prefix scans require filtering live fields first, then lexicographic sort; do not rely on hash-map iteration order.
- Quota plus TTL accounting: storage usage must exclude expired files/tasks; call `purgeExpired(user, now)` before capacity checks or priority lists.
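The expiry, snapshot, and restore rules above fit together as in the following sketch. It is one possible reading of the patterns, not a canonical solution: the class `TtlStore`, its method signatures, and the choice to restore from the latest backup at or before `target` are assumptions for illustration.

```python
class TtlStore:
    """Hypothetical nested-map store with lazy expiration and backup/restore."""

    def __init__(self):
        self.db = {}       # key -> field -> (value, expire_at or None)
        self.backups = []  # (timestamp, snapshot), appended in time order

    def set(self, now, key, field, value, ttl=None):
        expire_at = None if ttl is None else now + ttl  # absolute deadline
        self.db.setdefault(key, {})[field] = (value, expire_at)

    @staticmethod
    def _live(expire_at, now):
        return expire_at is None or expire_at > now     # expired when expire_at <= now

    def get(self, now, key, field):
        entry = self.db.get(key, {}).get(field)
        if entry is None or not self._live(entry[1], now):
            return None
        return entry[0]

    def scan_prefix(self, now, key, prefix):
        fields = self.db.get(key, {})
        # Filter live fields first, then sort lexicographically by field name.
        live = [(f, v) for f, (v, e) in fields.items()
                if f.startswith(prefix) and self._live(e, now)]
        return sorted(live, key=lambda fv: fv[0])

    def backup(self, now):
        snap = {}
        for key, fields in self.db.items():
            # Persist only live records, stored as remaining TTL.
            kept = {f: (v, None if e is None else e - now)
                    for f, (v, e) in fields.items() if self._live(e, now)}
            if kept:
                snap[key] = kept
        self.backups.append((now, snap))

    def restore(self, now, target):
        candidates = [s for t, s in self.backups if t <= target]
        if not candidates:
            return False
        snap = candidates[-1]  # latest backup at or before target
        # Rebuild deadlines relative to the restore time.
        self.db = {k: {f: (v, None if r is None else now + r)
                       for f, (v, r) in fs.items()}
                   for k, fs in snap.items()}
        return True
```

Storing remaining TTL in the snapshot and re-anchoring it at restore time is what keeps a record's lifetime from silently shrinking or growing across a backup/restore cycle.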
Common pitfalls
Pitfall: Storing TTL as a relative duration instead of converting it to an absolute deadline causes incorrect behavior after multiple reads, backups, or restores.
Pitfall: Backing up expired records because they still exist in the hash map violates snapshot semantics; logical liveness matters more than physical presence.
Pitfall: Sorting before filtering can leak expired fields into prefix scans or produce wrong ordering when deleted/reinserted records share names.
System Design

What's being tested
Interviewers are probing whether you can design financial APIs that remain correct when clients retry, networks time out, workers crash, or requests race concurrently. At Coinbase, a Software Engineer must assume that duplicate requests will happen and that “charge twice” or “credit twice” is an unacceptable failure mode. Strong answers show how idempotency, transactional integrity, concurrency control, and auditability work together, especially around ledgers, account balances, transfers, account opening, and delayed rewards like cashback. The interviewer is not looking for a slogan like “use an idempotency key”; they want the storage model, request lifecycle, failure behavior, and tradeoffs.
Core knowledge
- Idempotency means applying the same logical operation multiple times has the same durable effect as applying it once. For write APIs, this usually requires a client-supplied `Idempotency-Key`, because two identical-looking requests may be different business intents.
- Idempotency keys should be scoped by caller and endpoint, such as `(user_id, endpoint, idempotency_key)`, not globally. A practical table uses `key`, `request_hash`, `status`, `response_body`, `resource_id`, `created_at`, and `expires_at`, with a unique constraint on the scoped key.
- Request fingerprinting prevents accidental key reuse. Store a normalized `request_hash = SHA256(method + path + canonical_json_body)`; if the same key arrives with a different hash, return `409 Conflict` or `422 Unprocessable Entity` instead of silently replaying a stale response.
- Atomic write boundaries matter more than the key itself. For a money movement API, insert the idempotency record, ledger entries, and resulting transfer state in one `Postgres` transaction. If the key is committed but the ledger write is not, retries can become stuck or incorrectly suppressed.
- Double-entry ledgering is the usual Coinbase-grade model for balances. Every transfer creates balanced debits and credits, where total debits equal total credits within a transaction. Available balance should be derived from immutable ledger entries or maintained as a transactional projection, not updated as an isolated mutable number.
- Uniqueness constraints are your first line of defense. Use `UNIQUE(account_id, idempotency_key)` for API dedupe and `UNIQUE(transfer_id, account_id, entry_type)` for ledger entries. Application checks alone are unsafe under concurrent retries.
- Concurrency control must be explicit. Use row-level locks like `SELECT ... FOR UPDATE`, optimistic version checks like `WHERE version = ?`, or serializable transactions. For balance checks, the invariant is usually `available_balance >= debit_amount`, enforced inside the same transaction as the ledger write.
- Replay semantics should be deterministic. If the original request succeeded, return the same `201` or `200` response body. If it failed validation before side effects, it can safely fail again. If it is still processing, return `202 Accepted`, `409 Conflict`, or a documented “in progress” response.
- TTL policies are a product and correctness tradeoff, but the engineering default for financial writes is longer retention. Many APIs keep idempotency records for 24 hours to 7 days; ledger systems may retain operation IDs indefinitely because storage is cheap relative to financial risk.
- Asynchronous workflows need idempotent consumers too. If an API creates a `transfer_id` and publishes to `Kafka` or a worker queue, every downstream side effect should dedupe on a stable business key like `transfer_id`, not on delivery attempt. Assume at-least-once delivery.
- External payment rails complicate idempotency. Your internal API may be idempotent, but ACH, card, or bank partners may use their own reference IDs. Persist the external `rail_reference_id` and reconcile by status transitions such as `PENDING`, `SETTLED`, `FAILED`, and `REVERSED`.
- Observability should distinguish safe retries from bugs. Track metrics like idempotency hit rate, key conflict rate, duplicate ledger-entry rejection count, transaction retry count, and write-path `p99` latency. A spike in conflicts may indicate a client incorrectly reusing keys.
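The scoped key, request fingerprint, and single-transaction write described above can be sketched as follows. This is an illustration under stated assumptions: SQLite (via the standard-library `sqlite3` module) stands in for `Postgres`, the table layout is simplified, and `make_db`, `request_hash`, and `post_transfer` are hypothetical names.

```python
import hashlib
import json
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE idempotency_keys (
            user_id TEXT NOT NULL, endpoint TEXT NOT NULL, idem_key TEXT NOT NULL,
            request_hash TEXT NOT NULL, status INTEGER NOT NULL,
            response_body TEXT NOT NULL,
            UNIQUE (user_id, endpoint, idem_key)
        );
        CREATE TABLE transfers (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
    """)
    return db


def request_hash(method, path, body):
    # Canonical JSON: sorted keys and fixed separators, so key order is irrelevant.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((method + path + canonical).encode()).hexdigest()


def post_transfer(db, user_id, idem_key, body):
    endpoint = "/transfers"
    h = request_hash("POST", endpoint, body)
    with db:  # commits on success, rolls back if an exception escapes
        row = db.execute(
            "SELECT request_hash, status, response_body FROM idempotency_keys "
            "WHERE user_id = ? AND endpoint = ? AND idem_key = ?",
            (user_id, endpoint, idem_key)).fetchone()
        if row is not None:
            if row[0] != h:
                return 409, {"error": "idempotency key reused with different payload"}
            return row[1], json.loads(row[2])  # replay the stored response
        cur = db.execute("INSERT INTO transfers (payload) VALUES (?)",
                         (json.dumps(body),))
        response = {"transfer_id": cur.lastrowid}
        # Side effect and idempotency record are written in the same transaction.
        db.execute("INSERT INTO idempotency_keys VALUES (?, ?, ?, ?, ?, ?)",
                   (user_id, endpoint, idem_key, h, 201, json.dumps(response)))
        return 201, response
```

In a concurrent deployment the `UNIQUE` constraint is the backstop: if two requests with the same key race past the lookup, one insert fails and that request should roll back and replay the winner's stored response.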
Worked example
For “Design a bank account ledger,” a strong candidate should start by clarifying the operations: deposits, withdrawals, internal transfers, reversals, and balance reads; whether balances must be strongly consistent; and whether external rails are involved. Then state a core assumption: all money movement must be represented as immutable double-entry ledger entries, and every write API must be idempotent because clients and workers will retry.
The answer can be organized around four pillars. First, define APIs such as POST /transfers requiring Idempotency-Key, with request hashing and replay behavior. Second, define storage: accounts, transfers, ledger_entries, and idempotency_keys in Postgres, with unique constraints around operation identity. Third, describe the transaction flow: validate request, reserve or lock the account row, insert or read the idempotency record, create balanced ledger entries, update transfer state, and commit atomically. Fourth, cover failure handling: if the client times out after commit, the retry returns the stored response; if the transaction aborts, no ledger entries exist and the retry can execute normally.
One tradeoff to flag explicitly is whether to compute balances by summing ledger entries or maintain a cached balance projection. Summing is maximally auditable but too slow for high-volume accounts; a projection is fast but must be updated in the same transaction and periodically reconciled against the immutable ledger. A good close would be: “If I had more time, I’d discuss reconciliation jobs, reversal entries instead of deletes, and how to test crash points between idempotency write, ledger write, and response persistence.”
A second angle
For “Design account opening workflow,” the same concept applies, but the side effects are not just money movement. The workflow may create a user profile, store PII, call KYC/AML vendors, open an internal account, and send notifications. Here, idempotency should be modeled around a stable `application_id` or client-supplied key so retries do not create duplicate accounts or duplicate vendor checks.
Unlike a ledger transfer, the workflow may be long-running and partially complete, so a state machine matters: STARTED, PII_COLLECTED, KYC_PENDING, APPROVED, ACCOUNT_OPENED, REJECTED. Each transition should be idempotent and guarded by valid previous states. The candidate should emphasize that a retry should resume or return the current application state, not restart the workflow from scratch.
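A guarded, idempotent state machine for this workflow might look like the sketch below. The state names come from the text; the specific transition table, the `Applications` class, and its methods are assumptions made for illustration.

```python
# Assumed transition table; the text lists the states but not the exact edges.
ALLOWED = {
    "STARTED": {"PII_COLLECTED"},
    "PII_COLLECTED": {"KYC_PENDING"},
    "KYC_PENDING": {"APPROVED", "REJECTED"},
    "APPROVED": {"ACCOUNT_OPENED"},
    "ACCOUNT_OPENED": set(),
    "REJECTED": set(),
}


class Applications:
    def __init__(self):
        self.state = {}  # application_id -> current state

    def start(self, application_id):
        # A retry returns the existing state instead of restarting the workflow.
        return self.state.setdefault(application_id, "STARTED")

    def advance(self, application_id, target):
        current = self.state[application_id]
        if current == target:
            return current  # duplicate delivery: idempotent no-op
        if target not in ALLOWED[current]:
            raise ValueError(f"invalid transition {current} -> {target}")
        self.state[application_id] = target
        return target
```

Keying everything on `application_id` means a retried "start" call after a timeout observes the progress already made rather than creating a second application.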
Common pitfalls
Pitfall: Saying “make `POST` idempotent by using `PUT`” without explaining operation identity.
HTTP method semantics are not enough for financial correctness. PUT /accounts/{id} can still race, and POST /transfers can be perfectly safe if it requires an idempotency key, stores the response, and enforces unique ledger operation IDs.
Pitfall: Treating `Redis` as the only idempotency store.
A cache-only design can lose keys during eviction or failover, allowing duplicate money movement. Redis can help with short-lived locks or fast-path checks, but the durable source of truth for financial idempotency should usually live in the same transactional database as the side effect, such as Postgres.
Pitfall: Ignoring “same key, different payload.”
This is a common interviewer follow-up. The wrong answer is to return the original response no matter what; the better answer is to store a canonical request hash and reject mismatches, because key reuse may represent a client bug or a malicious attempt to mask a different operation.
Connections
Interviewers often pivot from idempotent API design into transaction isolation, distributed locks, outbox pattern, saga orchestration, or ledger reconciliation. They may also ask how you would test the design with concurrent requests, injected timeouts, process crashes, and duplicate message delivery.
Further reading
- Stripe API Idempotent Requests: canonical practical pattern for idempotency keys, replayed responses, and parameter mismatch handling.
- Designing Data-Intensive Applications (Martin Kleppmann): strong background on transactions, consistency, fault tolerance, and distributed system failure modes.
- Outbox Pattern (Chris Richardson): useful for connecting database commits to reliable asynchronous processing without pretending distributed writes are atomic.

What's being tested
Interviewers are probing whether you can design financially correct backend systems under concurrent requests, retries, delayed execution, and partial failures. For Coinbase, this matters because account balances, transfers, scheduled payments, cashbacks, and ledger entries must preserve invariants even when many clients, workers, and services act at the same time. A strong Software Engineer answer should distinguish business correctness from database mechanics: locks, transactions, idempotency, and audit logs are tools for enforcing domain invariants like “money is neither created nor lost.” Expect the interviewer to test whether you can choose an appropriate consistency model, explain tradeoffs, and avoid hand-wavy “just use a transaction” answers.
Core knowledge
- Concurrency control is about preserving invariants when operations overlap. In banking-style systems, core invariants include non-negative available balance, exactly one effect per client request, immutable audit history, and double-entry balance preservation: total debits must equal total credits.
- ACID transactions are the default starting point for account mutations. Use a database like `Postgres` with `BEGIN`, row updates, constraints, and `COMMIT` to make multi-step operations atomic. The key interview move is explaining what rows are protected and at what isolation level.
- Pessimistic locking prevents conflicts by locking records before mutation, commonly with `SELECT ... FOR UPDATE` in `Postgres`. For a transfer, lock both account rows in deterministic order, such as ascending `account_id`, to avoid deadlocks when two opposite-direction transfers run concurrently.
- Optimistic concurrency control allows concurrent reads and detects conflicts at write time using a `version` column or compare-and-swap condition: `UPDATE accounts SET balance = balance - 100, version = version + 1 WHERE id = ? AND version = ?`. If zero rows update, retry with backoff.
- Isolation levels matter. `READ COMMITTED` can be acceptable with row locks and conditional updates, but can allow anomalies in more complex predicates. `SERIALIZABLE` gives the simplest correctness story but may reduce throughput due to transaction aborts; design retries for serialization failures.
- Idempotency keys protect against client retries, timeouts, and duplicate job execution. Store `(client_id, idempotency_key) -> request_hash, status, response` with a unique constraint. If the same key arrives with a different payload, return a conflict rather than executing a second mutation.
- Double-entry ledger design is stronger than updating balances directly. Represent each financial event as immutable ledger entries: debit one account, credit another, with a shared `transaction_id`. Balances become derived state, cached projections, or materialized summaries. This improves auditability and reconciliation.
- Available balance vs ledger balance is an important distinction. A card authorization, pending withdrawal, or scheduled payment may place a hold before final settlement. Avoid treating every pending operation as final money movement; model states like `PENDING`, `POSTED`, `FAILED`, and `REVERSED`.
- Atomic transfer usually requires checking funds and writing both sides in one transaction. The invariant is: `from.available_balance >= amount` before debit, and after commit, `from.balance -= amount`, `to.balance += amount`, plus immutable audit entries. Never debit in one transaction and credit later without a compensating design.
- At-least-once execution is common for schedulers and queues. A scheduled payment worker may run twice after a crash or visibility timeout. Make the payment execution idempotent with a stable `payment_id` or `execution_id`, not by assuming the scheduler fires exactly once.
- Deadlocks and hot accounts are realistic edge cases. Deterministic lock ordering reduces deadlocks; short transactions reduce lock hold time. For very hot entities, consider single-writer partitioning, per-account command queues, or ledger append with asynchronous projection rather than repeatedly updating one balance row.
- Auditability requires immutable records, not just logs. Application logs in `Datadog` or `CloudWatch` are not the source of truth. Store durable transaction records with who, what, when, amount, currency, idempotency key, previous state, resulting state, and links to external rail references.
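Deterministic lock ordering and the check-then-debit invariant can be shown with an in-memory analogue. The sketch below uses `threading.Lock` objects in place of database row locks; the `Bank` class and its `transfer` method are hypothetical, and amounts are assumed to be integers in minor units.

```python
import threading


class Bank:
    """In-memory stand-in for row-locked accounts (one lock per account)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.locks = {acct: threading.Lock() for acct in balances}

    def transfer(self, src, dst, amount):
        if src == dst or amount <= 0:
            return False
        # Always lock in ascending account order so opposite-direction
        # transfers cannot deadlock by each holding one lock.
        first, second = sorted((src, dst))
        with self.locks[first], self.locks[second]:
            if self.balances[src] < amount:
                return False             # funds check inside the critical section
            self.balances[src] -= amount
            self.balances[dst] += amount
            return True
```

The funds check and both balance updates happen while both locks are held, which mirrors doing the check and the writes inside a single database transaction.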
Worked example
For “Design a bank account ledger,” a strong candidate should first clarify scope: “Are we supporting internal transfers only, or external payment rails too? Do balances need to be strongly consistent on reads? What throughput and latency should I assume? Are multi-currency accounts in scope?” Then declare a practical assumption: use a relational database such as `Postgres` for the core ledger because correctness and transactional constraints matter more than extreme write scale at first.
The answer skeleton should have four pillars. First, define the data model: accounts, ledger_transactions, ledger_entries, and optional balance_snapshots or materialized account balances. Second, define the write path: validate request, enforce idempotency, open a transaction, append balanced debit/credit entries, update balance projection, commit. Third, define concurrency control: lock affected accounts in deterministic order or use conditional versioned updates; retry on deadlocks and serialization failures. Fourth, define operational correctness: immutable audit trail, reconciliation jobs, monitoring for invariant violations, and recovery from partial external failures.
A specific tradeoff to flag is whether balances are computed from the ledger on every read or stored as a projection. Computing from the ledger is simplest and maximally auditable but becomes expensive as entries grow; maintaining a current_balance projection is faster but must be updated transactionally with ledger entries. A strong close would be: “If I had more time, I’d go deeper on external settlement states, multi-currency rounding, and backfill/reconciliation strategy, but the core design keeps every balance-changing event atomic, idempotent, and auditable.”
A second angle
For “Design a scheduled payments service,” the same concurrency ideas appear through delayed execution and retries rather than immediate user transfers. The scheduler should durably store payment intents with run_at, state, and an idempotency key, then workers claim due rows using a safe pattern such as SELECT ... FOR UPDATE SKIP LOCKED or a queue with visibility timeouts. The critical point is that claiming a job is not the same as executing the payment; the actual debit/credit operation must still be idempotent and transactionally protected. Here the design emphasis shifts from row-level account contention to job orchestration, duplicate execution, and state transitions like SCHEDULED -> PROCESSING -> SUCCEEDED or FAILED.
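The distinction between claiming a job and executing the payment can be sketched in memory. This is an illustration only: `ScheduledPayments` and its methods are hypothetical, and a real system would use durable rows and a pattern such as `SELECT ... FOR UPDATE SKIP LOCKED` instead of a Python dict.

```python
class ScheduledPayments:
    def __init__(self):
        self.intents = {}      # payment_id -> {"run_at", "state", "amount"}
        self.executed = set()  # payment_ids whose money movement already happened
        self.ledger = []       # append-only record of executed payments

    def schedule(self, payment_id, run_at, amount):
        # setdefault makes a retried schedule call a no-op.
        self.intents.setdefault(
            payment_id, {"run_at": run_at, "state": "SCHEDULED", "amount": amount})

    def claim_due(self, now):
        due = sorted(pid for pid, p in self.intents.items()
                     if p["state"] == "SCHEDULED" and p["run_at"] <= now)
        for pid in due:
            self.intents[pid]["state"] = "PROCESSING"  # claimed, not yet paid
        return due

    def execute(self, payment_id):
        # Dedupe on the stable payment_id so a second run after a crash or
        # redelivery does not move money twice.
        if payment_id not in self.executed:
            self.executed.add(payment_id)
            self.ledger.append((payment_id, self.intents[payment_id]["amount"]))
        self.intents[payment_id]["state"] = "SUCCEEDED"
```

Even if two workers end up calling `execute` for the same payment, the ledger records it once, which is the property the scheduler itself cannot guarantee.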
Common pitfalls
Pitfall: Saying “use a distributed lock” as the main correctness mechanism.
A lock service like `Redis` or `ZooKeeper` may help coordinate workers, but it should not be the source of financial truth. A better answer grounds correctness in database transactions, unique constraints, immutable ledger rows, and idempotent mutation APIs.
Pitfall: Treating idempotency as equivalent to concurrency control.
Idempotency prevents the same logical request from being applied twice; it does not prevent two different valid requests from overdrawing the same account. You still need row locks, conditional updates, serializable transactions, or another conflict-resolution mechanism for concurrent independent operations.
Pitfall: Over-indexing on high-level architecture and skipping invariants.
A tempting answer is to draw services, queues, and caches without saying exactly how money movement remains balanced. Interviewers will respond better if you state invariants early, then map each component to an enforcement mechanism: unique constraint for duplicate requests, transaction for atomicity, ledger entries for auditability, and reconciliation for external discrepancies.
Connections
Interviewers may pivot from this topic into idempotent API design, double-entry accounting, distributed transactions, event-driven architecture, or database isolation levels. They may also ask how your design changes when introducing `Kafka`, external payment processors, cached balances, or cross-region availability requirements.
Further reading
- Designing Data-Intensive Applications: Martin Kleppmann’s chapters on transactions, isolation, replication, and distributed systems are directly applicable to financial backend design.
- Stripe API Idempotent Requests: practical reference for the idempotency-key pattern used in payment-style APIs.
- PostgreSQL Transaction Isolation: concrete behavior of `READ COMMITTED`, `REPEATABLE READ`, and `SERIALIZABLE` in a real relational database.

What's being tested
This tests whether you can design a transactionally correct financial system where money movement is durable, auditable, idempotent, and safe under concurrency. Coinbase cares because balances, rewards, transfers, and settlement flows must remain correct even when APIs retry, workers crash, external rails disagree, or users issue simultaneous operations. The interviewer is probing for concrete backend engineering judgment: data modeling, database transactions, concurrency control, delayed operations like cashback, reconciliation, observability, and tradeoffs between strong consistency and scale.
Core knowledge
- Double-entry ledger accounting is the canonical model for auditable money movement. Every transaction creates at least two immutable ledger entries: one debit and one credit. The invariant is that total debits equal total credits per transaction, often enforced with database constraints and transactional writes.
- Balances should usually be derived from an immutable ledger, not treated as the source of truth. A cached balance table is acceptable for reads if it is updated atomically with ledger entries or rebuilt from the ledger. The ledger answers “what happened”; the balance table answers “what is current.”
- Idempotency keys are mandatory for APIs like `POST /transfers` or `POST /cashback`. Store a client-provided `idempotency_key` with the request hash and final response. If the same key is retried, return the original result; if the payload differs, reject with a conflict.
- Database transactions are the simplest correctness boundary. In `Postgres`, a transfer should insert ledger entries, update balance projections, and persist the transfer record in one `SERIALIZABLE` or carefully designed `READ COMMITTED` transaction with row locks. Never debit in one transaction and credit later.
- Concurrency control prevents double-spend. Common approaches are `SELECT ... FOR UPDATE` on account balance rows, optimistic version checks like `UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ? AND version = ?`, or serializable isolation. The key is making “check funds” and “deduct funds” atomic.
- Available balance and settled balance are different. A card authorization may place a hold, reducing available funds before settlement. Model states like `pending`, `posted`, `reversed`, and `failed`; avoid deleting rows when lifecycle transitions occur.
- External payment rails require reconciliation because your ledger and a bank/card network can diverge. Store external IDs, statuses, timestamps, and raw references. Periodic jobs compare internal expected state against external reports and create compensating entries rather than mutating history.
- Cashback is a delayed, conditional credit. Record the triggering spend transaction, eligibility decision, reward amount, scheduled payout time, and status. The payout worker must be idempotent so a retry cannot issue duplicate rewards; use a unique constraint on `(source_transaction_id, reward_type)`.
- Delayed scheduling can be implemented with a durable table plus polling workers for modest scale, or with `Kafka`, `SQS`, or a workflow engine like `Temporal` for higher scale and retries. The correctness source should be durable storage, not only an in-memory timer.
- Leaderboards for top spenders or transfer volume should not query the full ledger synchronously. Maintain an aggregation table keyed by account and time window, or use a stream consumer. For millions of accounts, use approximate or precomputed rankings; exact global rankings over raw entries become expensive.
- Auditability means immutable append-only records, stable identifiers, actor metadata, timestamps, reason codes, and trace IDs. Corrections should be new entries, not updates to old amounts. This supports debugging, compliance review, and deterministic replay.
- Operational safety includes alerts on ledger imbalance, negative balances, duplicate idempotency keys with different payloads, worker retry storms, reconciliation mismatches, and `p99` transaction latency. A good design includes runbooks for stuck pending transfers and safe backfills.
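The zero-sum ledger invariant and the idempotent cashback payout can be illustrated with a small in-memory model. This is a sketch under assumptions: amounts are signed integers in minor units (negative for the paying side), the `Ledger` class and its methods are hypothetical, and the `rewards_pool` funding account is invented for the example.

```python
class Ledger:
    def __init__(self):
        self.entries = []     # append-only: (txn_id, account, signed_amount)
        self.txn_ids = set()  # stands in for a unique constraint on txn_id
        self.rewards = {}     # (source_txn_id, reward_type) -> reward record

    def post(self, txn_id, lines):
        """Append a balanced transaction; lines is a list of (account, signed_amount)."""
        if txn_id in self.txn_ids:
            return False      # duplicate: already applied
        if len(lines) < 2 or sum(amount for _, amount in lines) != 0:
            raise ValueError("unbalanced transaction")
        self.txn_ids.add(txn_id)
        self.entries.extend((txn_id, acct, amount) for acct, amount in lines)
        return True

    def balance(self, account):
        # Balance is derived from the immutable entries.
        return sum(amount for _, acct, amount in self.entries if acct == account)

    def schedule_cashback(self, source_txn_id, account, amount, pay_at):
        # Keyed like UNIQUE(source_transaction_id, reward_type): one reward per spend.
        self.rewards.setdefault(
            (source_txn_id, "cashback"),
            {"account": account, "amount": amount, "pay_at": pay_at,
             "status": "PENDING"})

    def pay_due_cashback(self, now, funding_account="rewards_pool"):
        for (source, kind), reward in sorted(self.rewards.items()):
            if reward["status"] == "PENDING" and reward["pay_at"] <= now:
                # A deterministic txn_id makes the payout itself retry-safe.
                self.post(f"{kind}:{source}",
                          [(funding_account, -reward["amount"]),
                           (reward["account"], reward["amount"])])
                reward["status"] = "PAID"
```

Deriving the payout's transaction ID from the source transaction means that even if the status update were lost after a crash, a retried payout would be rejected as a duplicate by `post`.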
Worked example
For “Design account system with cashback”, a strong candidate should first clarify scope: are accounts single-currency, is cashback immediate or delayed, can transfers fail, and do we need exact real-time leaderboard rankings? Reasonable assumptions might be: one currency, internal transfers only, cashback paid 24 hours after eligible outgoing transfer, and a leaderboard by total outgoing transfer amount.
The answer can be organized around four pillars. First, define APIs like createAccount, transfer, getBalance, getCashbackStatus, and getTopSpenders. Second, model the data: accounts, ledger_transactions, ledger_entries, cashback_rewards, and spend_aggregates. Third, explain the transfer transaction: lock debit account, verify available balance, insert balanced debit/credit ledger entries, update balance projections, create cashback schedule, and update spend aggregate atomically where possible. Fourth, cover async processing: a worker scans due cashback_rewards, inserts a cashback credit transaction idempotently, marks the reward paid, and retries safely after crashes.
A specific tradeoff to flag is whether cashback creation happens synchronously inside the transfer transaction or asynchronously from a transfer event. Synchronous creation is simpler and strongly consistent; asynchronous creation scales better but requires durable event processing and reconciliation to avoid missed rewards. For an interview, start with synchronous metadata creation plus async payout because it is easier to reason about and still supports delayed execution.
Close by saying that, with more time, you would add multi-currency support, reversal handling, fraud holds, external settlement reconciliation, and backfill tooling to recompute balances and leaderboards from the immutable ledger.
A second angle
For “Design a bank account ledger”, the emphasis shifts away from cashback scheduling and toward the ledger as the system of record. The strongest framing is to separate the journal transaction from individual ledger entries, enforce the zero-sum invariant, and make all corrections append-only. You would spend more time on transaction isolation, schema constraints, replayability, and reconciliation with external systems. The same principles still apply: idempotent writes, atomic debits and credits, immutable history, and derived balance projections. The difference is that rewards and rankings are optional consumers of the ledger, not the center of the design.
Common pitfalls
Pitfall: Treating `accounts.balance` as the only source of truth.
A tempting answer is to update two account rows directly: subtract from Alice, add to Bob, and call it done. That misses auditability, replay, correction, and reconciliation. A better answer uses immutable ledger entries as the source of truth and treats balances as projections optimized for reads.
Pitfall: Saying “use a queue” without explaining correctness.
A queue can help with cashback payouts or aggregation, but it does not automatically guarantee exactly-once money movement. Interviewers want to hear how duplicate messages are handled: idempotency keys, unique constraints, transaction records, status transitions, and retry-safe workers.
Pitfall: Over-indexing on scale before correctness.
Sharding, caching, and streaming are useful, but a financial system that scales incorrect balances is a failed design. Start with a single-region transactional database and clear invariants, then discuss how to partition by account, archive historical entries, or move read-heavy analytics to derived stores.
Connections
The interviewer may pivot into distributed transactions, event-driven architecture, database isolation levels, or reconciliation workflows. They may also ask for a coding-style in-memory version, where the same ideas become data structures, ordering rules, scheduled events, and deterministic edge-case handling.
Further reading
- Designing Data-Intensive Applications: excellent grounding in transactions, isolation, logs, replication, and derived data systems.
- Stripe API Idempotent Requests: practical reference for idempotency-key behavior in payment-style APIs.
- Martin Fowler, Accounting Patterns: useful conceptual model for entries, accounts, and immutable financial events.