API Integration And External Service Design
Asked of: Software Engineer
What's being tested
Stripe-style integration design questions test whether you can build correct internal state while depending on unreliable external APIs. The interviewer is probing for choices around consistency, idempotency, failure isolation, schema boundaries, and API contracts, not just whether you can draw boxes. For a Software Engineer, the core skill is separating what must be strongly correct inside your system, such as a ledger mutation, from what can be retried, cached, degraded, or reconciled when an external service fails. Stripe cares because payment systems routinely integrate with banks, card networks, risk providers, tax engines, maps, schedulers, and notification vendors, where duplicate calls, partial failures, and stale data can directly affect money movement or user trust.
Core knowledge
- Idempotency keys are mandatory for any operation that may be retried after a timeout, connection reset, client crash, or 5xx response. Store `{idempotency_key, request_hash, response, status}` with a uniqueness constraint on the key; if the same key arrives with a different payload, return `409 Conflict`.
- Strong consistency should be scoped narrowly to the system of record. For financial flows, keep double-entry ledger writes in `Postgres` or another transactional store using `SERIALIZABLE` isolation, row-level locks, or append-only journal entries; do not couple correctness to an external routing or mapping API.
- Double-entry accounting models every movement as balanced debits and credits: $\sum_i \text{debit}_i = \sum_j \text{credit}_j$. Enforce this invariant per transaction, at write time. Append-only entries plus derived balances are safer than mutable balance rows alone, because they support audit, replay, and reconciliation.
- External service calls should usually sit outside database transactions. Holding a transaction open while calling BikeMap or a notification provider risks lock contention, transaction timeouts, and ambiguous commits. Prefer reserve/commit state machines, the outbox pattern, or asynchronous workers.
- The outbox pattern means writing the internal state change and an outbound task in the same local transaction, then having a worker deliver the external call. This closes the “database committed but API call never happened” gap without requiring distributed transactions across systems. Because a worker can crash after delivering but before marking the task done, delivery is at least once, so the outbound call should carry an idempotency key.
- Retry policy needs bounded exponential backoff with jitter; for example, `delay = min(base * 2^attempt, max_delay) + random_jitter`. Retry `429`, `500`, `502`, `503`, and network timeouts; do not blindly retry semantic errors like `400`, `401`, or validation failures.
- Timeouts and circuit breakers protect your service from dependency collapse. Set client timeouts below your own `p99` latency budget, cap concurrent calls with bulkheads, and open a circuit after sustained failures so requests degrade quickly instead of exhausting threads or connection pools.
- API contract design should distinguish canonical internal models from provider-specific DTOs. Keep a translation layer around BikeMap, Twilio, SendGrid, or calendar APIs so provider fields, error codes, pagination, and version changes do not leak throughout your domain model.
- Consistency models must be explicit. A user-facing route estimate can be eventually consistent or cached with a TTL, while a ledger entry must be linearizable or at least serializable within an account. Stronger consistency increases latency and coordination cost, so apply it only where invariants demand it.
- Calendar and schedule integration requires precise time semantics. Store instants in UTC, preserve the user’s IANA timezone (for example `America/Los_Angeles`), and expand recurring rules using RFC 5545 semantics. Daylight saving transitions create nonexistent or duplicated local times, and the system must define how each case is handled.
- Conflict resolution should be deterministic and explainable. For notifications, conflicts may be handled by priority, quiet hours, a deduplication window, or “latest user preference wins.” For money movement, avoid “last write wins”; use optimistic concurrency with version checks or append-only commands.
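- Capacity planning should be back-of-the-envelope and tied to bottlenecks. If $U$ users each have $S$ schedules and each schedule generates $R$ reminders/day, average notification volume is approximately $U \cdot S \cdot R / 86{,}400$ QPS (a day has 86,400 seconds). For example, $10^6$ users with 3 schedules each and 2 reminders per schedule per day average about 69 QPS, with spikes around local morning hours that require queue smoothing.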
Worked example
For “Design ledger and BikeMap integration,” a strong candidate first clarifies the domain boundary: “Is the ledger authoritative for payments or balances, and is BikeMap used only to compute route pricing, ETA, or validation?” They should also ask whether route data can be stale, whether prices are quoted before the ledger commit, and what happens if the map provider is unavailable. A clean answer can be organized around four pillars: the ledger data model, the external integration boundary, the transaction/state machine, and failure handling.
The ledger design should be append-only, double-entry, and backed by a transactional store such as `Postgres`, with unique idempotency keys per client request. The BikeMap call should be wrapped behind an internal RouteService adapter that normalizes provider responses, applies timeouts, caches route estimates when allowed, and records provider request IDs for debugging. The key tradeoff to flag is whether to call BikeMap synchronously before the ledger commit or asynchronously after it. The synchronous path gives immediate pricing validation but adds latency and dependency risk. The asynchronous path improves resilience but requires quote reservation or a later adjustment. A robust design often computes or validates the route before creating a pending ledger transaction, then commits only the balanced ledger entries under a local transaction. If the provider times out and the client retries, idempotency ensures the same request does not create duplicate ledger movements. To close, say that with more time you would add reconciliation jobs, audit tooling, a provider failover strategy, and integration tests with fault injection for timeouts, duplicate responses, and stale routes.
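The ledger half of this design fits in a few dozen lines. To keep the example self-contained, Python's built-in `sqlite3` stands in for `Postgres` (an assumption), and amounts are integer minor units. The table names, account names, and the `post_transaction` and `balance` helpers are hypothetical. The sketch shows three properties from the text:

- balanced entries are checked before anything is written;
- journal rows are append-only, and balances are derived from them;
- the idempotency table's primary key makes a replayed request return the stored response instead of posting twice.

```python
import hashlib
import json
import sqlite3

SCHEMA = """
CREATE TABLE idempotency (
    idempotency_key TEXT PRIMARY KEY,   -- uniqueness is what blocks double posting
    request_hash    TEXT NOT NULL,
    response        TEXT NOT NULL
);
CREATE TABLE journal_entries (          -- append-only: rows are never updated
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    txn_id    TEXT NOT NULL,
    account   TEXT NOT NULL,
    direction TEXT NOT NULL CHECK (direction IN ('debit', 'credit')),
    amount    INTEGER NOT NULL CHECK (amount > 0)  -- integer minor units
);
"""


class IdempotencyConflict(Exception):
    """Same key reused with a different payload (maps to 409 Conflict)."""


def post_transaction(conn: sqlite3.Connection, idempotency_key: str,
                     txn_id: str, entries: list) -> dict:
    payload = json.dumps({"txn_id": txn_id, "entries": entries}, sort_keys=True)
    request_hash = hashlib.sha256(payload.encode()).hexdigest()

    # Enforce the double-entry invariant before anything is written.
    debits = sum(e["amount"] for e in entries if e["direction"] == "debit")
    credits = sum(e["amount"] for e in entries if e["direction"] == "credit")
    if debits == 0 or debits != credits:
        raise ValueError(f"unbalanced transaction: debits={debits} credits={credits}")

    with conn:  # one local transaction: commit on success, roll back on any exception
        row = conn.execute(
            "SELECT request_hash, response FROM idempotency WHERE idempotency_key = ?",
            (idempotency_key,)).fetchone()
        if row is not None:
            if row[0] != request_hash:
                raise IdempotencyConflict(idempotency_key)
            return json.loads(row[1])  # replay: same response, no new entries
        conn.executemany(
            "INSERT INTO journal_entries (txn_id, account, direction, amount) VALUES (?, ?, ?, ?)",
            [(txn_id, e["account"], e["direction"], e["amount"]) for e in entries])
        response = {"txn_id": txn_id, "status": "committed"}
        # A concurrent duplicate fails here on the primary key, and its entries roll back.
        conn.execute("INSERT INTO idempotency VALUES (?, ?, ?)",
                     (idempotency_key, request_hash, json.dumps(response)))
    return response


def balance(conn: sqlite3.Connection, account: str) -> int:
    """Derive a balance from the journal (debit-positive convention, an assumption)."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(CASE direction WHEN 'debit' THEN amount ELSE -amount END), 0) "
        "FROM journal_entries WHERE account = ?", (account,)).fetchone()
    return total
```

In the full flow, the BikeMap-backed route quote would be computed and validated before `post_transaction` runs, so no external call happens while the local transaction is open.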
A second angle
For “Generate user notifications from schedules,” the same integration principles apply, but the correctness boundary shifts from financial invariants to time, preference, and delivery semantics. The internal scheduler should decide which notification is due, based on user profile, timezone, locale, recurrence rules, and quiet hours, before calling an external push, email, or SMS provider. A ledger can never accept a duplicate entry. A duplicate notification is more tolerable, but duplicates should still be suppressed within a strict deduplication window, so idempotency keys should be based on `{user_id, schedule_occurrence_id, channel}`. The external provider call belongs behind an adapter with retries, rate-limit handling, and delivery status mapping. The tricky design questions involve:

- daylight saving time;
- recurring event expansion;
- user preference changes between scheduling and send time;
- whether the system promises “exactly once” delivery or, more realistically, “at least once with deduplication.”
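The idempotency key from this paragraph can be sketched as a minimal in-memory example. Several names here are hypothetical:

- the `Occurrence` and `NotificationSender` classes;
- the 24-hour default window;
- the `provider_send(key, channel, body)` and `is_allowed(occurrence)` callbacks.

A real system would keep the dedup records in a durable store with a TTL. The sketch shows “at least once with deduplication”: a replayed trigger for the same occurrence and channel is suppressed, and preferences are re-checked at send time. A crash between the provider call and the dedup write can still cause one resend.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Occurrence:
    user_id: str
    schedule_occurrence_id: str  # one concrete expansion of a recurring rule
    channel: str                 # e.g. "push", "email", "sms"


def dedup_key(occ: Occurrence) -> str:
    # Stable key over {user_id, schedule_occurrence_id, channel}.
    raw = f"{occ.user_id}|{occ.schedule_occurrence_id}|{occ.channel}"
    return hashlib.sha256(raw.encode()).hexdigest()


class NotificationSender:
    def __init__(self, provider_send: Callable[[str, str, str], None],
                 is_allowed: Callable[[Occurrence], bool] = lambda occ: True,
                 window_s: float = 24 * 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.provider_send = provider_send
        self.is_allowed = is_allowed  # re-checks preferences/quiet hours at send time
        self.window_s = window_s
        self.clock = clock
        self._sent_at: Dict[str, float] = {}  # stand-in for a durable store with TTL

    def deliver(self, occ: Occurrence, body: str) -> str:
        key = dedup_key(occ)
        now = self.clock()
        sent_at = self._sent_at.get(key)
        if sent_at is not None and now - sent_at < self.window_s:
            return "duplicate_suppressed"
        if not self.is_allowed(occ):
            return "skipped_by_preference"  # preference changed after scheduling
        # Forward the key too, in case the provider supports its own idempotency.
        self.provider_send(key, occ.channel, body)
        # A crash before this line means a retry resends: at least once, not exactly once.
        self._sent_at[key] = now
        return "sent"
```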
Common pitfalls
Pitfall: Treating external APIs as if they are reliable local function calls.
A tempting answer is “call BikeMap, then write the ledger, then return success,” without discussing timeouts, retries, or ambiguous outcomes. A stronger answer names the failure matrix: provider call succeeds but response is lost, database commit succeeds but worker crashes, client retries after timeout, and provider returns inconsistent errors.
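The retry side of that failure matrix can be sketched with the backoff formula from the core-knowledge list. This minimal example rests on a few assumptions:

- `ProviderError` is a hypothetical exception that carries the provider's HTTP status.
- The `base`, `max_delay`, and `jitter` defaults are illustrative.
- `sleep` is injectable so the logic can be tested without waiting.

Retrying after an ambiguous timeout is only safe if `fn` sends the same idempotency key on every attempt.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRYABLE_STATUS = {429, 500, 502, 503}


class ProviderError(Exception):
    """Hypothetical error raised by a provider adapter for non-2xx responses."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def is_retryable(err: Exception) -> bool:
    # Network-level failures (timeouts, resets) have ambiguous outcomes but are retryable.
    if isinstance(err, (TimeoutError, ConnectionError)):
        return True
    # Semantic errors such as 400 or 401 will fail the same way again.
    return isinstance(err, ProviderError) and err.status in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 0.1, max_delay: float = 5.0,
                  jitter: float = 0.1,
                  rng: Callable[[], float] = random.random) -> float:
    # delay = min(base * 2^attempt, max_delay) + random_jitter
    return min(base * 2 ** attempt, max_delay) + rng() * jitter


def call_with_retries(fn: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      **delay_kwargs) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except (ProviderError, TimeoutError, ConnectionError) as err:
            if not is_retryable(err) or attempt == max_attempts - 1:
                raise  # non-retryable, or retry budget exhausted
            sleep(backoff_delay(attempt, **delay_kwargs))
    raise AssertionError("unreachable")
```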
Pitfall: Overusing distributed transactions.
Candidates sometimes propose two-phase commit across the service’s database and the external provider. That is usually impractical because most third-party APIs do not participate in 2PC, and even where it is possible it harms availability. Prefer local transactions plus idempotent APIs, outbox, sagas, reconciliation, and explicit pending/committed/failed states.
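The local-transaction alternative can be sketched with an outbox table. Again, `sqlite3` is an assumed stand-in for the production store. The `payouts` and `outbox` tables, the `create_payout` and `drain_outbox` helpers, the `max_attempts` limit, and the `send(key, payload)` callback are hypothetical. The business row and its outbound task commit in one local transaction. A worker later delivers the task without holding a transaction open, then moves the row from `pending` to `committed`, or to `failed` once retries run out.

```python
import json
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE payouts (
    id     TEXT PRIMARY KEY,
    amount INTEGER NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('pending', 'committed', 'failed'))
);
CREATE TABLE outbox (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    payout_id       TEXT NOT NULL,
    idempotency_key TEXT NOT NULL UNIQUE,
    payload         TEXT NOT NULL,
    attempts        INTEGER NOT NULL DEFAULT 0,
    done            INTEGER NOT NULL DEFAULT 0
);
"""


def create_payout(conn: sqlite3.Connection, payout_id: str, amount: int) -> None:
    with conn:  # state change and outbound task commit or roll back together
        conn.execute("INSERT INTO payouts VALUES (?, ?, 'pending')", (payout_id, amount))
        conn.execute(
            "INSERT INTO outbox (payout_id, idempotency_key, payload) VALUES (?, ?, ?)",
            (payout_id, f"payout-{payout_id}",
             json.dumps({"payout_id": payout_id, "amount": amount})))


def drain_outbox(conn: sqlite3.Connection, send: Callable[[str, dict], None],
                 max_attempts: int = 3) -> int:
    """Deliver undelivered outbox rows; returns how many were delivered this pass."""
    rows = conn.execute(
        "SELECT id, payout_id, idempotency_key, payload, attempts FROM outbox "
        "WHERE done = 0 ORDER BY id").fetchall()
    delivered = 0
    for row_id, payout_id, key, payload, attempts in rows:
        try:
            send(key, json.loads(payload))  # external call, no DB transaction open
        except Exception:  # broad catch for the sketch; classify errors in real code
            with conn:
                conn.execute("UPDATE outbox SET attempts = attempts + 1 WHERE id = ?", (row_id,))
                if attempts + 1 >= max_attempts:
                    # Give up and surface an explicit failed state for reconciliation.
                    conn.execute("UPDATE outbox SET done = 1 WHERE id = ?", (row_id,))
                    conn.execute("UPDATE payouts SET status = 'failed' WHERE id = ?", (payout_id,))
            continue
        # If the worker crashes before this commit, the row is re-sent with the same key,
        # so delivery is at least once and the provider must deduplicate on the key.
        with conn:
            conn.execute("UPDATE outbox SET done = 1 WHERE id = ?", (row_id,))
            conn.execute("UPDATE payouts SET status = 'committed' WHERE id = ?", (payout_id,))
        delivered += 1
    return delivered
```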
Pitfall: Ignoring domain-specific edge cases.
For notifications, saying “store the timestamp and send at that time” misses timezone, locale, recurrence, DST, and user preference updates. For ledgers, saying “update the balance” misses append-only auditability, balanced entries, concurrency control, and duplicate request protection. Interviewers reward candidates who surface these edge cases before being prompted.
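The DST edge case for notifications can be handled explicitly with the standard library's `zoneinfo` module. This assumes Python 3.9+ and an available IANA database, from the system or the `tzdata` package. The helper name `resolve_local_time` and its policy are illustrative assumptions, not a standard. A nonexistent local time shifts forward by the size of the gap, and an ambiguous one resolves to its first occurrence. The point is that each case is detected and handled by a stated rule rather than by accident.

```python
from datetime import datetime, timezone
from typing import Tuple
from zoneinfo import ZoneInfo


def resolve_local_time(naive_local: datetime, tz_name: str) -> Tuple[datetime, str]:
    """Map a wall-clock time in an IANA zone to a UTC instant plus a classification.

    Example policy (an assumption): nonexistent times shift forward by the DST gap,
    ambiguous times resolve to the first (earlier) occurrence.
    """
    if naive_local.tzinfo is not None:
        raise ValueError("expected a naive local wall-clock time")
    tz = ZoneInfo(tz_name)
    first = naive_local.replace(tzinfo=tz, fold=0)
    second = naive_local.replace(tzinfo=tz, fold=1)
    first_utc = first.astimezone(timezone.utc)
    if first.utcoffset() == second.utcoffset():
        return first_utc, "normal"
    # Offsets differ only at a transition: the wall time is either repeated or skipped.
    round_trip = first_utc.astimezone(tz).replace(tzinfo=None)
    if round_trip == naive_local:
        return first_utc, "ambiguous"  # occurs twice; fold=0 is the earlier instant
    # In a gap, fold=0 applies the pre-transition offset, which lands after the gap.
    return first_utc, "nonexistent"
```

For example, 02:30 on 2024-03-10 in `America/Los_Angeles` does not exist and resolves to 10:30 UTC (03:30 PDT). 01:30 on 2024-11-03 occurs twice and resolves to the earlier instant, 08:30 UTC.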
Connections
Interviewers may pivot from this topic into distributed transactions, event-driven architecture, rate limiting, webhook design, or database isolation levels. For Stripe specifically, also be ready to discuss idempotent API design, ledger reconciliation, API versioning, and observability through structured logs, correlation IDs, metrics, and traces.
Further reading
- Stripe API Idempotent Requests: practical reference for request replay safety and idempotency-key behavior.
- Designing Data-Intensive Applications: strong foundation for consistency, transactions, replication, and fault-tolerance tradeoffs.
- RFC 5545, Internet Calendaring and Scheduling Core Object Specification: canonical reference for recurrence rules and calendar edge cases.
Related concepts
- API Design, Data Modeling, and Indexing (System Design)
- Resilient API Aggregation And Operational Debugging (Software Engineering Fundamentals)
- Scalable Backend Architecture And Data Modeling (System Design)
- Object-Oriented Design, API Design, And Testability (Coding & Algorithms)
- RESTful API And HTTP Service Design (Software Engineering Fundamentals)
- Scalable Service And Distributed System Design (System Design)