Design scheduled payments and cancellation
Company: Coinbase
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Take-home Project
##### Question
Design a **scheduled payments service** for Coinbase that lets users create, list, update, execute, and cancel future-dated payments — both one-time and recurring. Because this is a crypto exchange, a "payment" is a transfer of funds between accounts (or to an external crypto address / fiat off-ramp) recorded in a ledger, not a card charge. The system must execute payments at the correct moment across time zones and DST, guarantee correctness (idempotency, exactly-once ledger effects), and stay durable through retries, partial failures, race conditions, and downstream outages.
Address the following:
1. **APIs.** Specify REST and gRPC endpoints for create / get / list / update / cancel, including idempotency keys, authentication/authorization, and request validation.
2. **Data model & storage schema.** Tables for schedules, payment instances/attempts, ledger references, idempotency records, and audit logs; choose appropriate indexes and a partitioning strategy.
3. **Scheduler architecture.** Propose a durable job-orchestration design (e.g., DB time-indexed buckets, transactional outbox, distributed delayed queue / Redis ZSET) that finds due payments and dispatches them reliably.
4. **Recurrence, time zones & DST.** Support one-time and recurring schedules (e.g., RFC 5545 RRULEs or presets). Store IANA time zones, compute next-run in UTC, and define behavior for non-existent (spring-forward) and ambiguous (fall-back) local times, plus end-of-month rules.
5. **Idempotency & exactly-once semantics.** Guarantee that a client retry, a duplicate scheduler pickup, an at-least-once queue redelivery, or a timed-out processor call can never produce a double debit or a duplicate ledger entry.
6. **Concurrency & race conditions.** Define the state machine and concurrency control (row locks / `SELECT ... FOR UPDATE SKIP LOCKED`, leases, optimistic versioning). In particular, resolve the race between **cancellation and execution**.
7. **Retries, backoff & failure classification.** Distinguish transient vs. permanent errors, apply exponential backoff with jitter, and handle poison messages with a dead-letter queue.
8. **Outages & backfill.** Handle third-party processor and downstream outages (circuit breaking, optional fallback), and backfill jobs missed while the service itself was down, without causing a thundering herd.
9. **Notifications & webhooks.** Notify users and emit signed, retried, idempotent webhooks for lifecycle events.
10. **Scaling, partitioning, monitoring & alerting.** Cover horizontal scaling, hot-partition smoothing, key metrics/SLOs (e.g., due→executed latency), and alerts.
11. **Security & compliance.** Authz model, secrets/KMS, audit immutability, and compliance considerations appropriate to a financial/crypto platform.
Quick Answer: This Coinbase software-engineering take-home asks you to design a scheduled payments service that creates, lists, updates, executes, and cancels future-dated one-time and recurring transfers with exactly-once ledger correctness. It tests distributed-systems architecture, idempotent REST/gRPC API design, data modeling, concurrency control, DST handling, retries/backoff, outage backfill, and observability. The model answer pins correctness on a single-writer state machine plus an idempotent ledger.
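For item 1, a common pattern is to scope each idempotency key to the caller and store a fingerprint of the request body next to the first response. A retry then replays the stored response without re-running the handler, and a reused key with a different body is rejected. This is a minimal sketch assuming an in-memory store; `IdempotencyStore` and `IdempotencyConflict` are hypothetical names, and a real service would persist these records (with a TTL) in the same transaction that creates the schedule.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable


class IdempotencyConflict(Exception):
    """Same key reused with a different body; surface as a client error (e.g. HTTP 409)."""


@dataclass(frozen=True)
class StoredResponse:
    request_hash: str
    status: int
    body: dict


class IdempotencyStore:
    """Hypothetical in-memory store keyed by (user_id, idempotency_key)."""

    def __init__(self) -> None:
        self._records: dict[tuple[str, str], StoredResponse] = {}

    @staticmethod
    def fingerprint(request: dict) -> str:
        # Canonical JSON so that field order does not change the hash.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def execute(self, user_id: str, key: str, request: dict,
                handler: Callable[[dict], tuple[int, dict]]) -> tuple[int, dict]:
        scope = (user_id, key)  # keys are per caller, never global
        digest = self.fingerprint(request)
        existing = self._records.get(scope)
        if existing is not None:
            if existing.request_hash != digest:
                raise IdempotencyConflict(key)
            return existing.status, existing.body  # replay; handler is not re-run
        status, body = handler(request)
        self._records[scope] = StoredResponse(digest, status, body)
        return status, body
```

The dict lookup here does not protect against two concurrent requests with the same key; a persistent version would first insert an "in progress" row under a unique constraint so only one request can run the handler.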
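For items 2 and 5, one way to make duplicate execution harmless is to let the database enforce uniqueness: one instance row per (schedule, occurrence time), and at most one ledger entry per instance, written in the same transaction as the status change. The sketch below uses SQLite only so it runs with the standard library; the table and column names are assumptions, and a production design would more likely use Postgres with time-based partitioning of the instance table, which SQLite does not offer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE schedules (
    id            TEXT PRIMARY KEY,
    user_id       TEXT NOT NULL,
    rrule         TEXT,                -- NULL for one-time payments
    tz            TEXT NOT NULL,       -- IANA zone name
    amount_minor  INTEGER NOT NULL,    -- integer base units, never floats
    currency      TEXT NOT NULL,
    status        TEXT NOT NULL,
    version       INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE payment_instances (
    id             TEXT PRIMARY KEY,
    schedule_id    TEXT NOT NULL REFERENCES schedules(id),
    scheduled_for  TEXT NOT NULL,      -- UTC ISO-8601 instant
    status         TEXT NOT NULL,
    version        INTEGER NOT NULL DEFAULT 0,
    UNIQUE (schedule_id, scheduled_for) -- one row per occurrence
);
-- Scheduler scan touches only rows that are still waiting to run.
CREATE INDEX idx_due ON payment_instances (scheduled_for) WHERE status = 'SCHEDULED';
CREATE TABLE ledger_entries (
    id            INTEGER PRIMARY KEY,
    instance_id   TEXT NOT NULL UNIQUE REFERENCES payment_instances(id),
    amount_minor  INTEGER NOT NULL
);
"""


def post_once(conn: sqlite3.Connection, instance_id: str, amount_minor: int) -> bool:
    """Write the ledger entry and mark the instance SUCCEEDED in one transaction.

    Returns False when a ledger entry already exists (a duplicate delivery);
    the UNIQUE constraint on instance_id is the final backstop.
    """
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute(
                "INSERT INTO ledger_entries (instance_id, amount_minor) VALUES (?, ?)",
                (instance_id, amount_minor))
            conn.execute(
                "UPDATE payment_instances SET status = 'SUCCEEDED', version = version + 1 "
                "WHERE id = ?", (instance_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

If the ledger lives in a separate service, the same idea applies by passing the instance id as the ledger's own idempotency key.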
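For items 3 and 6, the cancel-versus-execute race can be reduced to one compare-and-set on the instance row. The scheduler's claim and the user's cancel both try to move the row out of `SCHEDULED`; whichever commit lands first wins, and the other sees zero affected rows. The state names and the `transition` helper are assumptions for illustration. In Postgres, a worker would typically select due rows with `SELECT ... FOR UPDATE SKIP LOCKED` and then apply the same guarded update; SQLite is used here only to keep the sketch runnable.

```python
import sqlite3

# Hypothetical state machine; any transition not listed here is rejected.
TRANSITIONS = {
    "SCHEDULED": {"EXECUTING", "CANCELLED"},
    "EXECUTING": {"SUCCEEDED", "RETRY_WAIT", "FAILED"},
    "RETRY_WAIT": {"EXECUTING", "CANCELLED"},
}


def transition(conn: sqlite3.Connection, instance_id: str, expected_status: str,
               expected_version: int, new_status: str) -> bool:
    """Compare-and-set: succeeds only if the row is unchanged since it was read."""
    if new_status not in TRANSITIONS.get(expected_status, set()):
        raise ValueError(f"illegal transition {expected_status} -> {new_status}")
    with conn:
        cur = conn.execute(
            "UPDATE payment_instances SET status = ?, version = version + 1 "
            "WHERE id = ? AND status = ? AND version = ?",
            (new_status, instance_id, expected_status, expected_version))
    return cur.rowcount == 1  # 0 means another writer got there first
```

Under this policy a cancel that loses the race gets an explicit "already executing" result rather than a silent success, and `EXECUTING -> CANCELLED` is illegal because funds may already be in flight. Whether `RETRY_WAIT` stays cancellable is a product decision, assumed yes here.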
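For item 4, Python's `zoneinfo` module together with the PEP 495 `fold` attribute is enough to detect and resolve both DST edge cases. This sketch assumes one particular policy: a spring-forward gap time moves forward by the length of the gap (02:30 becomes 03:30), and a fall-back overlap uses the earlier occurrence unless the schedule opts into the later one. `resolve_local_time` is a hypothetical helper, and it requires the IANA tz database to be available to `zoneinfo` (system tzdata or the `tzdata` package).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def resolve_local_time(wall: datetime, tz: ZoneInfo,
                       ambiguous: str = "earliest") -> tuple[datetime, str]:
    """Convert a naive local wall time to a UTC instant under an explicit DST policy."""
    early = wall.replace(tzinfo=tz, fold=0)
    late = wall.replace(tzinfo=tz, fold=1)
    if early.utcoffset() == late.utcoffset():
        return early.astimezone(timezone.utc), "normal"
    # Offsets differ: either a gap or an overlap. A gap time does not survive
    # a round trip through UTC unchanged; an overlap time does.
    round_trip = early.astimezone(timezone.utc).astimezone(tz).replace(tzinfo=None)
    if round_trip != wall:
        # PEP 495: fold=0 in a gap applies the pre-transition offset, which
        # lands at "wall time + gap" after the transition (02:30 -> 03:30).
        return early.astimezone(timezone.utc), "nonexistent"
    chosen = late if ambiguous == "latest" else early
    return chosen.astimezone(timezone.utc), "ambiguous"
```

Storing the returned UTC instant as the instance's `scheduled_for`, while keeping the schedule's wall time and IANA zone as the source of truth, lets the next occurrence be recomputed correctly if tz rules change.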
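End-of-month handling (also item 4) is where plain RFC 5545 semantics can surprise payment users: an RRULE with `BYMONTHDAY=31` skips months that have no 31st, while `BYMONTHDAY=-1` always selects the last day. A common product policy, assumed here, is to clamp to the month's last day and re-derive every occurrence from the original anchor day so a February clamp does not drag later months to the 28th. The helper names are hypothetical; each resulting date would still be combined with the schedule's local time and resolved for DST before being stored in UTC.

```python
import calendar
from datetime import date


def monthly_occurrence(anchor_day: int, year: int, month: int) -> date:
    """Day `anchor_day` of the given month, clamped to that month's last day."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(anchor_day, last_day))


def next_monthly_dates(anchor_day: int, start: date, count: int) -> list[date]:
    """The next `count` occurrences on or after `start`, always computed from the
    original anchor day (never from the previous, possibly clamped, occurrence)."""
    out: list[date] = []
    year, month = start.year, start.month
    while len(out) < count:
        candidate = monthly_occurrence(anchor_day, year, month)
        if candidate >= start:
            out.append(candidate)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out
```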
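For item 7, failure handling can be written as a small pure function that classifies an error and returns the next action. This sketch uses exponential backoff with "full jitter" (a uniform delay between zero and the capped exponential bound). The error codes, attempt limit, and delay constants are all assumptions; a real classifier would map actual processor and ledger responses.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical error codes.
TRANSIENT = {"TIMEOUT", "RATE_LIMITED", "UNAVAILABLE"}
PERMANENT = {"INSUFFICIENT_FUNDS", "INVALID_ADDRESS", "ACCOUNT_FROZEN"}


def backoff_delay(attempt: int, base_s: float = 2.0, cap_s: float = 600.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: uniform in [0, min(cap_s, base_s * 2**attempt)]."""
    if rng is None:
        rng = random.Random()
    return rng.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))


@dataclass
class Decision:
    action: str          # "RETRY", "FAIL", or "DEAD_LETTER"
    delay_s: float = 0.0


def on_failure(error_code: str, attempt: int, max_attempts: int = 8,
               rng: Optional[random.Random] = None) -> Decision:
    """`attempt` is the 0-based index of the attempt that just failed."""
    if error_code in PERMANENT:
        return Decision("FAIL")          # notify the user; retrying cannot help
    if error_code not in TRANSIENT:
        return Decision("DEAD_LETTER")   # unknown error: park it for a human
    if attempt + 1 >= max_attempts:
        return Decision("DEAD_LETTER")   # retry budget exhausted
    # A TIMEOUT is ambiguous (the transfer may have gone through), so the retry
    # must reuse the same downstream idempotency key or first query its status.
    return Decision("RETRY", backoff_delay(attempt, rng=rng))
```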
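For the downstream-outage half of item 8, a circuit breaker per processor keeps workers from burning retry attempts against a dependency that is known to be down; while it is open, due payments simply stay in a waiting state. This is a minimal sketch with assumed parameters: it opens after `threshold` consecutive failures, rejects calls for `cooldown_s`, then lets calls through again (half-open), where a failure reopens it and a success closes it. A stricter version would admit only a single probe in the half-open state.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed / open / half-open breaker around one downstream dependency."""

    def __init__(self, threshold: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock               # injectable for tests
        self.failures = 0
        self.opened_at: Optional[float] = None

    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half_open"
        return "open"

    def allow(self) -> bool:
        # When open, skip the call and leave the payment waiting for a later run.
        return self.state() != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.state() == "half_open" or self.failures >= self.threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown
```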
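For the backfill half of item 8, the overdue instances found after a restart can be released oldest-first at a bounded rate instead of all at once. The sketch also applies a hypothetical staleness rule: anything later than `max_lateness` is held for review rather than executed automatically. Both the rule and the `plan_backfill` name are assumptions, not part of the original question.

```python
from datetime import datetime, timedelta


def plan_backfill(overdue: list[tuple[str, datetime]], now: datetime,
                  max_per_second: int,
                  max_lateness: timedelta) -> tuple[dict[str, datetime], list[str]]:
    """Return (dispatch time per instance id, instance ids held for review)."""
    release: dict[str, datetime] = {}
    hold: list[str] = []
    for instance_id, due_at in sorted(overdue, key=lambda pair: pair[1]):
        if now - due_at > max_lateness:
            hold.append(instance_id)     # too stale to run without review
            continue
        slot = len(release) // max_per_second  # paces the release rate
        release[instance_id] = now + timedelta(seconds=slot)
    return release, hold
```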
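For item 9, webhook deliveries can be signed with HMAC-SHA256 over a timestamp plus the raw body, so receivers can reject both forged and replayed requests; including a stable event id in the body lets them deduplicate retries. The header format `t=<unix seconds>,v1=<hex digest>` and the 300-second tolerance are assumptions for this sketch, similar in spirit to schemes used by public webhook providers.

```python
import hashlib
import hmac
import time
from typing import Optional


def _digest(secret: bytes, body: bytes, timestamp: int) -> str:
    # Signing the timestamp together with the body prevents reusing an old signature.
    return hmac.new(secret, f"{timestamp}.".encode() + body, hashlib.sha256).hexdigest()


def sign(secret: bytes, body: bytes, timestamp: int) -> str:
    return f"t={timestamp},v1={_digest(secret, body, timestamp)}"


def verify(secret: bytes, body: bytes, header: str, tolerance_s: int = 300,
           now: Optional[int] = None) -> bool:
    try:
        parts = dict(item.split("=", 1) for item in header.split(","))
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False                      # malformed header
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > tolerance_s:
        return False                      # stale or replayed delivery
    expected = _digest(secret, body, timestamp)
    return hmac.compare_digest(expected.encode(), received.encode())
```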
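For the hot-partition part of item 10, a popular local time such as midnight on the 1st can put a large share of payments into a single time bucket. One way to smooth this is to key buckets by (UTC minute, stable shard), so workers claim shards independently. `NUM_SHARDS = 16` and `bucket_key` are assumptions. CRC32 is used because Python's built-in `hash()` of strings is salted per process and therefore unstable across workers.

```python
import zlib
from datetime import datetime, timezone

NUM_SHARDS = 16  # assumed shard count


def bucket_key(schedule_id: str, due_at_utc: datetime) -> tuple[str, int]:
    """Minute bucket plus a shard that is stable for a given schedule id."""
    minute = due_at_utc.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")
    shard = zlib.crc32(schedule_id.encode()) % NUM_SHARDS
    return minute, shard
```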
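For the audit-immutability part of item 11, one common technique is a hash chain in which each audit entry's hash covers the previous entry's hash, so editing or deleting any earlier entry breaks verification. This makes the log tamper-evident rather than tamper-proof; it is usually paired with append-only or WORM storage and with periodically recording the latest hash somewhere independent. The function names and entry layout are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_audit(log: list[dict], event: dict) -> dict:
    """Append an entry whose hash chains to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```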