This question evaluates competency in designing scalable, highly available payment-enabled distributed systems, including skills in traffic spike handling, idempotency and reconciliation, real-time totals and leaderboards, latency and throughput planning, and compliance/privacy considerations.
Design an online donation platform optimized for 3-day campaigns. Specify functional requirements (donor signup, campaign setup, real-time totals, receipts, refunds), non-functional goals (availability SLOs, latency budgets, throughput), and traffic estimates. Propose APIs and data models (Campaign, Donation, Donor, PaymentIntent, Receipt). Detail payment processing with idempotency, retry and reconciliation flows; fraud and abuse mitigation; rate limiting; and privacy/compliance considerations (PCI, PII, CCPA/GDPR). Describe architecture (services, databases, caches, queues), scaling and sharding strategy, consistency model for counters, and how you’d handle launch/closing traffic spikes. Explain observability (metrics, logs, tracing), disaster recovery, backfills, and how to extend to recurring donations or corporate matching.
Design: Online Donation Platform for 3-Day Campaigns
Context
Design an online donation platform optimized for short, 3-day fundraising campaigns. Each campaign opens and closes on a fixed schedule and can experience large traffic spikes at launch and at close (driven by marketing blasts and a final push). The platform must process payments reliably, surface near-real-time campaign totals to donors, and comply with privacy and payments regulations.
Treat this as an open-ended system-design discussion: state your assumptions, do a quick sizing pass, then go deep on the parts you consider load-bearing. You may assume card payments are handled by a third-party payment service provider (PSP).
Constraints & Assumptions
Use these as anchors (refine any you think are wrong, but justify the change):
Arrival pattern: spiky. Roughly 40% of donations land in the first 6 hours and 40% in the last 6 hours. Start/end times are known in advance.
Throughput targets: donation-create bursts of 3-8k QPS during spikes; reads (landing pages, totals, leaderboards) at ~10-20x the write rate; webhook ingestion bursts of 1-2k QPS.
Latency budgets (at the API gateway): non-payment endpoints P50 ≤ 200 ms, P95 ≤ 500 ms, P99 ≤ 1 s. Payment confirmation may take 3-5 s (PSP-dependent), so the confirmation UX must be async-friendly.
Integrity: no lost accepted donations and no double charges; the financial ledger must be reconcilable to the cent via idempotency and reconciliation.
What a Strong Answer Covers
Signals the interviewer is listening for (these are dimensions to hit, not the answers themselves):
Requirements discipline: separates functional from non-functional, and explicitly distinguishes the UI total (can be eventually consistent) from the money (must be strongly consistent).
Sizing that interprets, not parrots: reconciles the headline burst QPS with the implied average rate and says which is steady-state vs. tail.
A clean data model: an exact, rounding-safe representation of money, financial records you can trust over time, uniqueness guarantees where they matter, and a clear source of truth for finance.
A correct payment path: a flow that stays correct under retries and partial failures, with a credible answer for how double charges are prevented and accepted donations are never lost.
Real-time totals design: a fast display layer decoupled from an exact finance layer, with drift correction.
Spike handling: admission control / queueing, backpressure, cache priming, and scheduled pre-scaling.
Cross-cutting concerns: fraud/abuse, rate limiting, PCI/PII/GDPR-CCPA, observability, DR, and sensible extensibility, all covered at the right depth without burying the core.
Part 1 — Requirements, scope, and sizing
Enumerate the functional and non-functional requirements you'll commit to, then do a back-of-the-envelope sizing pass. Decide where the real difficulty of this system lies and say so out loud.
Functional surface to account for: donor signup/login (guest checkout allowed), admin campaign creation/scheduling (start/end, goal, currency, geos), public landing pages with progress bars and leaderboards, one-time multi-currency donations, near-real-time totals, receipts (email/SMS), refunds (full/partial) and chargebacks, plus admin dashboards/exports and finance/BI webhooks.
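As an illustration of the sizing pass, here is a minimal Python sketch. The spike share, window length, burst write QPS, and read multiplier come from the constraints above. The total donation count (`TOTAL_DONATIONS`) is a purely hypothetical figure chosen for illustration, not something the prompt specifies.

```python
# Back-of-the-envelope sizing sketch.
SPIKE_SHARE = 0.40                 # from the constraints: 40% of donations per spike window
SPIKE_WINDOW_S = 6 * 3600          # from the constraints: each spike window is 6 hours
BURST_WRITE_QPS = (3_000, 8_000)   # from the constraints: donation-create bursts
READ_MULTIPLIER = (10, 20)         # from the constraints: reads at ~10-20x writes

TOTAL_DONATIONS = 5_000_000        # ASSUMPTION: hypothetical campaign size


def spike_window_avg_qps(total, share=SPIKE_SHARE, window_s=SPIKE_WINDOW_S):
    """Average write rate if `share` of all donations arrive evenly within one window."""
    return total * share / window_s


def read_qps_range(write_qps=BURST_WRITE_QPS, mult=READ_MULTIPLIER):
    """Lowest and highest read QPS implied by the burst write range and read multiplier."""
    return write_qps[0] * mult[0], write_qps[1] * mult[1]


avg_write_qps = spike_window_avg_qps(TOTAL_DONATIONS)
read_low, read_high = read_qps_range()
print(f"average write QPS inside a spike window: {avg_write_qps:.0f}")
print(f"burst read QPS range: {read_low:,} to {read_high:,}")
```

Under this assumed campaign size, the average write rate inside a spike window is roughly two orders of magnitude below the 3-8k burst figure. A different total changes the ratio, but the exercise is the same: state which number is the sustained rate and which is the short-lived peak you must survive.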
Part 2 — Data model and APIs
Propose the core entities (Campaign, Donor, Donation, PaymentIntent, Receipt, and any others you need) with their key fields, and a small set of REST (or gRPC) endpoints. Call out where idempotency keys live and how multi-currency is represented.
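One possible way to make the money representation concrete is shown below. This is a sketch, not a prescribed schema: the field names, the `DonationStatus` values, and the small `MINOR_UNIT_EXPONENT` table are assumptions for illustration. The idea it demonstrates is storing amounts as integer minor units plus a currency code, so sums never pass through binary floating point.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional

# ASSUMPTION: a tiny illustrative subset of currencies and their minor-unit exponents.
MINOR_UNIT_EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0}


@dataclass(frozen=True)
class Money:
    amount_minor: int  # integer count of minor units (e.g. cents)
    currency: str      # currency code, e.g. "USD"

    @classmethod
    def from_decimal_string(cls, text: str, currency: str) -> "Money":
        """Parse a decimal string exactly; reject sub-minor-unit precision."""
        scaled = Decimal(text).scaleb(MINOR_UNIT_EXPONENT[currency])
        if scaled != scaled.to_integral_value():
            raise ValueError(f"{text} has more precision than {currency} allows")
        return cls(int(scaled), currency)

    def __add__(self, other: "Money") -> "Money":
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.amount_minor + other.amount_minor, self.currency)


class DonationStatus(Enum):  # hypothetical status set
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    REFUNDED = "refunded"


@dataclass(frozen=True)
class Donation:
    donation_id: str
    campaign_id: str
    donor_id: Optional[str]  # None for guest checkout
    amount: Money
    idempotency_key: str     # expected to be unique per donation attempt
    status: DonationStatus = DonationStatus.PENDING
```

For example, `Money.from_decimal_string("0.10", "USD") + Money.from_decimal_string("0.20", "USD")` equals `Money(30, "USD")` exactly, whereas the float expression `0.1 + 0.2` does not equal `0.3`.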
Part 3 — Payment processing: correctness under retries (the hard part)
This is the heart of the design. Specify the payment flow end to end and explain how the system never double-charges a donor and never loses an accepted donation, given that retries are guaranteed during spikes (timeouts, autoscaler restarts, users double-clicking). Cover idempotency, the confirm/retry path, webhook handling, refunds, and reconciliation.
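The retry-safety idea can be sketched in a few lines of Python. Everything here is a simplified, in-memory assumption: `FakePSP` is a hypothetical stand-in for a PSP that honors idempotency keys, and `DonationService`, `IdempotencyConflict`, and the request fields are illustrative names. A real system would persist the key under a database uniqueness constraint rather than in a dict.

```python
import hashlib
import json


class FakePSP:
    """Hypothetical stand-in for a PSP that deduplicates charges by idempotency key."""

    def __init__(self):
        self.charges = {}  # idempotency_key -> charge record

    def charge(self, idempotency_key, amount_minor, currency):
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = {
                "psp_charge_id": f"ch_{len(self.charges) + 1}",
                "amount_minor": amount_minor,
                "currency": currency,
            }
        return self.charges[idempotency_key]


class IdempotencyConflict(Exception):
    """Same key reused with a different request body."""


class DonationService:
    def __init__(self, psp):
        self.psp = psp
        self.by_key = {}                  # idempotency_key -> (fingerprint, response)
        self.seen_webhook_events = set()  # PSP event IDs already applied

    @staticmethod
    def _fingerprint(request):
        return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

    def create_donation(self, idempotency_key, request):
        fingerprint = self._fingerprint(request)
        if idempotency_key in self.by_key:
            stored_fingerprint, response = self.by_key[idempotency_key]
            if stored_fingerprint != fingerprint:
                raise IdempotencyConflict(idempotency_key)
            return response  # replay: return the stored result, no new charge
        # The same key is forwarded to the PSP, so a crash between the charge
        # and the local write cannot produce a second charge on retry.
        charge = self.psp.charge(
            idempotency_key, request["amount_minor"], request["currency"]
        )
        response = {"psp_charge_id": charge["psp_charge_id"], "status": "succeeded"}
        self.by_key[idempotency_key] = (fingerprint, response)
        return response

    def handle_webhook(self, event_id):
        """Return True if the event was applied, False if it was a duplicate delivery."""
        if event_id in self.seen_webhook_events:
            return False
        self.seen_webhook_events.add(event_id)
        return True
```

Calling `create_donation` twice with the same key and body returns the same response and leaves exactly one charge at the PSP. A fresh `DonationService` pointed at the same `FakePSP` (simulating a restart that lost local state) also resolves to the original charge, which is the property reconciliation later relies on.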
Part 4 — Real-time totals and leaderboards
Design how donors see near-real-time campaign totals and a top-donors leaderboard while finance still gets exact numbers. Decide what the public progress bar shows when a refund happens.
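A minimal sketch of the two-layer idea follows, under stated assumptions: `Ledger` stands in for the exact finance store, `DisplayCounters` stands in for a cache such as Redis, and refunds are modeled as negative ledger entries so the displayed total is net of refunds. That refund policy is only one option; the prompt leaves the choice to the candidate. All class and function names are hypothetical.

```python
import heapq
from collections import defaultdict


class Ledger:
    """Append-only source of truth (in-memory stand-in for a durable store)."""

    def __init__(self):
        self.entries = []  # (campaign_id, donor_id, amount_minor); refunds are negative

    def append(self, campaign_id, donor_id, amount_minor):
        self.entries.append((campaign_id, donor_id, amount_minor))

    def exact_total(self, campaign_id):
        return sum(a for c, _, a in self.entries if c == campaign_id)

    def donor_totals(self, campaign_id):
        totals = defaultdict(int)
        for c, donor, amount in self.entries:
            if c == campaign_id:
                totals[donor] += amount
        return dict(totals)


class DisplayCounters:
    """Fast, possibly stale totals (in-memory stand-in for a cache)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def incr(self, campaign_id, delta_minor):
        self.totals[campaign_id] += delta_minor

    def wipe(self):
        self.totals.clear()


def reconcile(ledger, counters, campaign_id):
    """Overwrite the display total with the exact ledger total; return the drift found."""
    exact = ledger.exact_total(campaign_id)
    drift = counters.totals[campaign_id] - exact
    counters.totals[campaign_id] = exact
    return drift


def top_donors(ledger, campaign_id, n=3):
    """Rebuild the top-n leaderboard directly from the ledger."""
    return heapq.nlargest(
        n, ledger.donor_totals(campaign_id).items(), key=lambda kv: kv[1]
    )
```

Because the display layer is always derivable from the ledger, a dropped event or a wiped cache shows up only as drift that `reconcile` corrects, never as a financial error. In a real deployment the recomputation would likely be incremental or run against a read replica rather than scanning every entry.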
Part 5 — Architecture, spikes, and scaling
Lay out the service decomposition, data plane (DB, cache, event bus, object storage), and how you handle the launch/close traffic spikes and scale over time. Because campaign start/end times are known, exploit that.
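One common building block for admission control during spikes is a token bucket. The sketch below is a single-process illustration with an injected clock so its behavior is deterministic. The class name, parameters, and the idea of raising `rate_per_s` ahead of a scheduled launch are assumptions for illustration; a production limiter would typically be distributed and sit at the gateway.

```python
class TokenBucket:
    """Admit requests at a sustained rate while allowing bursts up to `capacity`."""

    def __init__(self, rate_per_s, capacity, now=0.0):
        self.rate_per_s = rate_per_s  # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = now               # timestamp of the last refill

    def try_acquire(self, now, cost=1.0):
        """Return True to admit the request, False to shed or queue it."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_s)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def set_rate(self, rate_per_s, capacity):
        """Adjust limits, e.g. shortly before a known campaign start or close."""
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self.tokens = min(self.tokens, float(capacity))
```

Rejected requests need a defined fate, such as a waiting-room response or a retry-after hint, so that shedding load at the edge does not become silent loss of donation attempts.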
Observability: the metrics, logs, traces, and alerts that matter for a money system.
Disaster recovery: multi-AZ/region posture, RPO/RTO, and safe reprocessing on failover.
Extensibility: how the design extends to recurring donations and corporate matching.
Follow-up Questions
Be ready for the interviewer to push further:
What breaks first if a campaign goes 100x more viral than expected (the PSP, the primary DB write path, the Redis counter, or the webhook queue), and how does your design absorb it?
If the PSP has a partial outage mid-spike (elevated latency and a rising error rate), what is the exact degraded behavior a donor experiences, and how do you keep the ledger correct through it?
Your Redis layer is wiped during a live campaign. Walk through recovering the live totals and the leaderboard with zero financial error.
How does the design change if you are not the merchant of record but instead facilitate funds to each campaign's own PSP account (a marketplace/Connect model)?