Design a Usage-Based Credit Tracking System for a SaaS Platform
Company: Clay
Role: Software Engineer
Category: System Design
Difficulty: Medium
Interview Round: Technical Screen
# Design a Usage-Based Credit Tracking System for a SaaS Platform
You are designing the **credit tracking system** for a B2B SaaS data-enrichment platform (think of a tool where users run "enrichments" — e.g., look up a person's work email, a company's headcount, or a social profile). Every account buys a plan that grants a monthly allotment of **credits**. Each operation a user runs consumes a known number of credits, and the platform must keep each account's balance accurate, deduct credits as work is performed, prevent accounts from spending credits they don't have, and give both users and finance trustworthy usage reporting.
Design the end-to-end system that maintains each account's credit balance, authorizes and records credit consumption for every operation, and supports the surrounding lifecycle: monthly plan refreshes, rollover, overage, and billing reconciliation.
### Constraints & Assumptions
State your own; these are the numbers this write-up assumes.
- ~100k active accounts. Target ~50M credit-consuming operations/day, peak ~5k ops/sec, **bursty** — a single batch enrichment job may submit tens of thousands of rows at once.
- An operation consumes a small integer number of credits (typically 1–50). The cost is usually known before execution, but some third-party-metered operations are only fully known **after** the call.
- Plans grant a monthly credit allotment (roughly 10k–1M credits) that refreshes on each account's **billing anniversary**. Some plans allow rollover up to a cap; some allow paid overage (pay-as-you-go), others hard-block at zero.
- Credits are money-equivalent: **never** let an account go materially negative, and **never** double-charge — operations can be retried at any layer.
- The pre-operation balance check must be fast (single-digit milliseconds) so it does not throttle enrichment throughput.
- Usage history must be queryable for user dashboards and monthly invoices, and retained for years for audit/disputes.
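A quick back-of-the-envelope check of these numbers helps size the ledger. The sketch below uses only the figures stated above, except `BYTES_PER_ROW`, which is a made-up assumption for illustration.

```python
OPS_PER_DAY = 50_000_000
PEAK_OPS_PER_SEC = 5_000
SECONDS_PER_DAY = 86_400

# Average rate implied by the daily volume.
avg_ops_per_sec = OPS_PER_DAY / SECONDS_PER_DAY

# How much burstier the peak is than the mean.
peak_to_avg = PEAK_OPS_PER_SEC / avg_ops_per_sec

# One ledger row per operation, retained for years.
rows_per_year = OPS_PER_DAY * 365

# ASSUMPTION: roughly 100 bytes per ledger row before indexes and replication.
BYTES_PER_ROW = 100
terabytes_per_year = rows_per_year * BYTES_PER_ROW / 1e12

print(f"average: {avg_ops_per_sec:.0f} ops/sec")
print(f"peak is {peak_to_avg:.1f}x the average")
print(f"ledger rows per year: {rows_per_year:,}")
print(f"raw ledger size per year: {terabytes_per_year:.3f} TB")
```

The average works out to under 600 ops/sec, so the hard part is less the steady state than the roughly 9x bursts and the tens of billions of rows that accumulate over a multi-year retention window.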
### Clarifying Questions to Ask
- Is a credit cost always known **before** the operation runs, or only measurable **after** (e.g., metered by rows returned or third-party cost)? That decides whether we charge once or authorize-then-capture.
- How strict is no-overspend: must overspend be **impossible**, or is small, bounded overspend acceptable if we reconcile afterward? This drives the whole consistency/latency trade-off.
- How are credits granted — a single monthly allotment, or multiple **buckets** (plan credits + purchased top-ups + promo credits) with their own priority and expiry?
- When an account hits zero, do we **hard-block** or allow **paid overage**?
- What are the latency/throughput SLOs for the authorization check, and how large/bursty are batch jobs?
- Do **refunds/reversals** happen (failed enrichment, provider returned no data, user dispute)?
### Part 1 — Credit model and the deduction path
Design the data model and the **synchronous path** that authorizes and records a single credit-consuming operation. Define how an account's balance is represented, how you **atomically** check-and-deduct credits when an operation runs, and how you guarantee a retried or duplicated operation is charged **at most once**.
```hint Where to start
Model the balance as a value **derived from an append-only ledger** of credit transactions, not a single mutable counter. The ledger is your source of truth, your audit log, and what reconciliation and dispute resolution read from.
```
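To make the ledger idea concrete, here is a minimal in-memory sketch. The class names and the `kind` values (`grant`, `deduct`, `refund`) are illustrative assumptions, not a prescribed schema; a real system would back this with a database table and a cached or snapshotted balance.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LedgerEntry:
    account_id: str
    kind: str             # hypothetical kinds: "grant", "deduct", "refund"
    amount: int           # signed: positive adds credits, negative consumes them
    idempotency_key: str


class Ledger:
    """Append-only: entries are added, never updated or deleted."""

    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []

    def append(self, entry: LedgerEntry) -> None:
        self._entries.append(entry)

    def balance(self, account_id: str) -> int:
        # The balance is derived, not stored: it is the sum of signed entries.
        return sum(e.amount for e in self._entries if e.account_id == account_id)


ledger = Ledger()
ledger.append(LedgerEntry("acct_1", "grant", 10_000, "grant-2024-05"))
ledger.append(LedgerEntry("acct_1", "deduct", -25, "op-001"))
ledger.append(LedgerEntry("acct_1", "refund", 25, "refund-op-001"))
ledger.append(LedgerEntry("acct_1", "deduct", -3, "op-002"))
print(ledger.balance("acct_1"))  # 9997
```

Note that the refund is a new compensating entry rather than a deletion of the original deduct, which keeps the history intact for audits and disputes.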
```hint Atomicity
A check-and-deduct is a conditional write. Think `UPDATE accounts SET balance = balance - :cost WHERE account_id = :id AND balance >= :cost` (1 row affected means success; 0 means insufficient balance or an unknown account), or a ledger insert inside a transaction guarded by a non-negative-balance constraint.
```
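The conditional write can be tried end to end with SQLite from the Python standard library. This is a sketch under assumptions: the `accounts` table layout and the `try_deduct` helper are hypothetical, and a production store would differ, but the rows-affected check works the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " account_id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # second line of defense
)
conn.execute("INSERT INTO accounts VALUES ('acct_1', 30)")
conn.commit()


def try_deduct(conn: sqlite3.Connection, account_id: str, cost: int) -> bool:
    """Check and deduct in one statement; True only if exactly one row changed."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE account_id = ? AND balance >= ?",
            (cost, account_id, cost),
        )
        return cur.rowcount == 1


first = try_deduct(conn, "acct_1", 20)   # succeeds, 10 credits remain
second = try_deduct(conn, "acct_1", 20)  # rejected, balance untouched
remaining = conn.execute(
    "SELECT balance FROM accounts WHERE account_id = 'acct_1'"
).fetchone()[0]
print(first, second, remaining)  # True False 10
```

Because the check and the write are a single statement, there is no window between reading the balance and updating it in which a concurrent request could overspend.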
```hint Exactly-once
Carry a client-supplied **idempotency key** per operation and put a unique constraint on it (scoped to the account), written in the same transaction as the deduct, so a retry collapses to a no-op that returns the original outcome instead of charging twice.
```
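A minimal sketch of idempotent charging, again using SQLite from the standard library. The `charges` table, the `charge` helper, and the choice to record only successful charges are assumptions made for illustration; the key point is that the unique key row and the balance update commit together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (account_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE charges (
    account_id      TEXT NOT NULL,
    idempotency_key TEXT NOT NULL,
    cost            INTEGER NOT NULL,
    UNIQUE (account_id, idempotency_key)
);
INSERT INTO accounts VALUES ('acct_1', 100);
""")


def charge(conn: sqlite3.Connection, account_id: str, key: str, cost: int) -> str:
    """Charge at most once per (account, key); a retry returns the prior outcome."""
    with conn:  # deduct and key insert commit or roll back together
        seen = conn.execute(
            "SELECT 1 FROM charges WHERE account_id = ? AND idempotency_key = ?",
            (account_id, key),
        ).fetchone()
        if seen:
            return "charged"  # replay: do not deduct again
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE account_id = ? AND balance >= ?",
            (cost, account_id, cost),
        )
        if cur.rowcount != 1:
            return "insufficient"
        # If a concurrent writer inserted the same key first, this raises
        # sqlite3.IntegrityError and the deduct above is rolled back with it.
        conn.execute(
            "INSERT INTO charges VALUES (?, ?, ?)", (account_id, key, cost)
        )
        return "charged"


r1 = charge(conn, "acct_1", "op-1", 40)
r2 = charge(conn, "acct_1", "op-1", 40)  # retry of the same operation
balance = conn.execute(
    "SELECT balance FROM accounts WHERE account_id = 'acct_1'"
).fetchone()[0]
print(r1, r2, balance)  # charged charged 60
```

The retry reports the same outcome as the original call, yet the balance drops by 40 only once.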
#### Clarifying Questions for this Part
- For after-the-fact costs, do we want a two-phase **authorize (hold) then capture (settle)**, and what's the max hold lifetime before it expires?
- On insufficient balance, should the API fail the whole batch or charge what it can and reject the remainder?
### Part 2 — Scale, concurrency, and hot accounts
The system must sustain ~5k authorizations/sec with bursts from large batch jobs, and the per-operation check must stay in the single-digit-millisecond range. Explain how you keep balances consistent under high concurrency, prevent a single **hot account row** from becoming a bottleneck when one account submits a 50k-row batch, and where you may cache **without** ever enabling overspend.
```hint Concurrency
A single mutable balance row per account serializes all of that account's writes — fine until one batch job hammers it. Consider **sharding the balance into N sub-buckets** (route + aggregate) or **reserving a block of credits per worker** so most deductions happen against a local lease.
```
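The per-worker reservation idea can be sketched in a few lines. `CentralBalance` and `WorkerLease` are hypothetical names, and the lock-protected integer stands in for whatever consistent store holds the authoritative balance; this is one possible shape, not the only one.

```python
import threading


class CentralBalance:
    """Stand-in for the authoritative, strongly consistent balance."""

    def __init__(self, balance: int) -> None:
        self._balance = balance
        self._lock = threading.Lock()

    def reserve(self, want: int) -> int:
        """Atomically carve out up to `want` credits; returns the amount granted."""
        with self._lock:
            granted = min(want, self._balance)
            self._balance -= granted
            return granted

    def release(self, amount: int) -> None:
        with self._lock:
            self._balance += amount

    def balance(self) -> int:
        with self._lock:
            return self._balance


class WorkerLease:
    """Deducts against a local block and refills from the center only when short."""

    def __init__(self, central: CentralBalance, block_size: int) -> None:
        self._central = central
        self._block_size = block_size
        self._local = 0

    def deduct(self, cost: int) -> bool:
        if self._local < cost:
            self._local += self._central.reserve(max(self._block_size, cost))
        if self._local < cost:
            return False  # the account is genuinely out of credits
        self._local -= cost
        return True

    def close(self) -> None:
        # Return unused credits so they are not stranded in the lease.
        self._central.release(self._local)
        self._local = 0


central = CentralBalance(1_000)
lease = WorkerLease(central, block_size=100)
results = [lease.deduct(3) for _ in range(50)]  # 150 credits, two central trips
lease.close()
print(all(results), central.balance())  # True 850
```

Overspend stays impossible because every credit in a lease was first removed from the central balance; the trade-off is that credits held in leases are temporarily invisible to other workers, so leases need a bounded size and an expiry in case a worker dies.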
```hint Caching boundary
Cache reads for **display**, but the authoritative deduct must hit a consistent store (or an atomic counter such as a Redis `DECRBY`, guarded so it cannot drop below zero, for example by a Lua script that checks the balance before decrementing, with durable write-behind to the ledger). Be explicit about what may be slightly stale (dashboards) vs strictly consistent (the deduct).
```
```hint Bursts
For huge batch jobs, **reserve N credits up front, then settle the actual amount** at the end. This collapses per-row contention into one reserve + one settle and gives clean back-pressure when the reservation can't be met.
```
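A reserve-then-settle flow for a batch job might look like the sketch below. The `Account` class, its method names, and the job numbers are illustrative assumptions; in practice the hold and the settlement would each be ledger entries with their own idempotency keys.

```python
from typing import Dict


class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance            # spendable credits
        self.holds: Dict[str, int] = {}   # job_id -> credits held

    def reserve(self, job_id: str, amount: int) -> bool:
        """Place a hold for the worst-case cost of the job."""
        if job_id in self.holds:
            return True                   # retried reserve is a no-op
        if amount > self.balance:
            return False                  # back-pressure: reject or shrink the batch
        self.balance -= amount
        self.holds[job_id] = amount
        return True

    def settle(self, job_id: str, actual: int) -> int:
        """Capture the actual cost and release the rest; returns credits released."""
        held = self.holds.pop(job_id)
        if actual > held:
            raise ValueError("actual cost exceeds the hold")
        released = held - actual
        self.balance += released
        return released


acct = Account(balance=250_000)

# A 50,000-row batch at 2 credits per row: hold the worst case up front.
ok = acct.reserve("job-42", 50_000 * 2)

# Suppose only 38,412 rows returned data and are billable.
released = acct.settle("job-42", 38_412 * 2)
print(ok, released, acct.balance)  # True 23176 173176
```

The account row is touched twice for the whole job instead of 50,000 times, and a failed `reserve` gives the caller a clear signal before any provider calls are made.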
### Part 3 — Plan refresh, rollover, reporting, and billing reconciliation
Design the lifecycle and reporting around the ledger: **monthly credit grants** on each account's billing anniversary (with optional rollover caps and expiry), **multiple credit sources** (plan credits, purchased top-ups, promo credits) consumed in a defined priority, **user-facing usage dashboards**, and **end-of-cycle reconciliation** that produces an invoice (including overage) finance can trust.
```hint Granting credits
Treat a monthly grant as just another **signed ledger entry** (a positive `grant` transaction) posted on the anniversary, so all balance math stays uniform. Rollover/expiry become grants with an `expires_at` plus a capping rule.
```
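One way to express an anniversary refresh with a rollover cap as ledger entries is sketched below. It models the forfeited portion as an explicit negative `expire` entry, which is an assumption chosen for simplicity; stamping each grant with `expires_at` and expiring it lazily is an equally valid design. The function name is hypothetical.

```python
from typing import List, Tuple


def refresh_entries(
    unused: int, allotment: int, rollover_cap: int
) -> List[Tuple[str, int]]:
    """Signed ledger entries to post on the billing anniversary."""
    carried = min(unused, rollover_cap)   # credits allowed to roll over
    expired = unused - carried            # credits forfeited at cycle end
    entries: List[Tuple[str, int]] = []
    if expired > 0:
        entries.append(("expire", -expired))
    entries.append(("grant", allotment))
    return entries


# 3,500 credits left over, plan grants 10,000 per month, rollover capped at 2,000.
entries = refresh_entries(unused=3_500, allotment=10_000, rollover_cap=2_000)
new_balance = 3_500 + sum(amount for _, amount in entries)
print(entries, new_balance)  # [('expire', -1500), ('grant', 10000)] 12000
```

Because both the expiry and the grant are ordinary signed entries, the balance is still just the sum of the ledger, and the refresh job can be made idempotent with a key such as account plus cycle start date.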
```hint Multiple buckets
When credits come from several sources with priority and expiry, deduction becomes "spend from the highest-priority, non-expired bucket first." Model buckets **explicitly** rather than collapsing everything into one scalar balance.
```
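Priority-ordered spending across buckets can be sketched as follows. The `Bucket` fields, the convention that a lower `priority` number is spent first, and the soonest-expiry tie-break are all assumptions for illustration; the actual order (for example promo before plan before top-up) is a product decision.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Bucket:
    source: str
    priority: int                       # lower number is spent first (assumption)
    remaining: int
    expires_at: Optional[datetime] = None


def spend(buckets: List[Bucket], cost: int, now: datetime) -> List[Tuple[str, int]]:
    """Draw `cost` credits from live buckets in priority order, all or nothing."""
    live = [
        b for b in buckets
        if b.remaining > 0 and (b.expires_at is None or b.expires_at > now)
    ]
    if sum(b.remaining for b in live) < cost:
        raise ValueError("insufficient credits")
    # Within a priority level, use the bucket that expires soonest first.
    live.sort(key=lambda b: (b.priority, b.expires_at or datetime.max))
    drawn: List[Tuple[str, int]] = []
    for b in live:
        if cost == 0:
            break
        take = min(b.remaining, cost)
        b.remaining -= take
        cost -= take
        drawn.append((b.source, take))
    return drawn


now = datetime(2024, 5, 15)
buckets = [
    Bucket("topup", 2, 500),
    Bucket("plan", 1, 100, datetime(2024, 6, 1)),
    Bucket("promo", 0, 50, datetime(2024, 5, 31)),
    Bucket("old_promo", 0, 75, datetime(2024, 5, 1)),  # already expired
]
drawn = spend(buckets, 120, now)
print(drawn)  # [('promo', 50), ('plan', 70)]
```

Returning the per-bucket breakdown lets the ledger record which source paid for each operation, which matters later when a refund has to be credited back to the right bucket.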
```hint Reporting
Roll the ledger up **asynchronously** (e.g., per-account/per-day aggregates) for dashboards and invoices so you never scan the raw ledger online. The ledger stays the source of truth that reconciliation audits against.
```
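The rollup and the audit against the ledger can be sketched like this. The tuple layout and function names are hypothetical, and the example counts only negative entries as usage, which is a simplifying assumption (refunds would need their own treatment).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (account_id, ISO date, signed amount): a simplified ledger row
Entry = Tuple[str, str, int]

ledger: List[Entry] = [
    ("acct_1", "2024-05-01", 10_000),   # grant, not usage
    ("acct_1", "2024-05-01", -25),
    ("acct_1", "2024-05-01", -10),
    ("acct_1", "2024-05-02", -40),
    ("acct_2", "2024-05-02", -5),
]


def rollup_usage(entries: List[Entry]) -> Dict[Tuple[str, str], int]:
    """Per-account, per-day credits consumed; built offline, read by dashboards."""
    daily: Dict[Tuple[str, str], int] = defaultdict(int)
    for account_id, day, amount in entries:
        if amount < 0:
            daily[(account_id, day)] += -amount
    return dict(daily)


def reconcile(
    entries: List[Entry], daily: Dict[Tuple[str, str], int], account_id: str
) -> bool:
    """True when the aggregates agree with the ledger for this account."""
    from_ledger = sum(-a for acct, _, a in entries if acct == account_id and a < 0)
    from_rollup = sum(v for (acct, _), v in daily.items() if acct == account_id)
    return from_ledger == from_rollup


daily = rollup_usage(ledger)
print(daily[("acct_1", "2024-05-01")])    # 35
print(reconcile(ledger, daily, "acct_1"))  # True
```

When the two totals disagree, the ledger wins and the aggregate is rebuilt from it; typical causes of drift are late-arriving entries, double-processed events in the rollup job, or time-zone boundaries that differ between the dashboard and the invoice.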
### Follow-up Questions
- A third-party provider charges per **successful** row, and you only learn the true cost after the call. How do you reshape Part 1's authorize/capture so users are billed accurately without ever blocking the balance check on the provider?
- An operation succeeds at the provider but the process crashes **before** writing the ledger deduct. Walk through how you detect and recover this without double-charging or giving away free credits.
- Finance reports that the dashboard total and the invoice disagree by 0.3% for some accounts. How do you locate the source of drift, and which number do you trust?
- How would you serve a real-time, low-latency "credits remaining" figure in the UI for an account running a 100k-row job, given Part 2's sharded/reserved balances?
Quick Answer: This question evaluates a candidate's ability to design a distributed metering and billing system that tracks account balances, authorizes consumption, and prevents double-charging or negative balances under concurrent, bursty load. It tests system design skills around consistency, idempotency, and lifecycle management for usage-based billing, and it is commonly asked to assess practical architecture experience with financial-grade data integrity.