GPU Credit Ledgers And Resource Accounting
Asked of: Software Engineer

What's being tested
This tests whether you can design a distributed resource-accounting system where money-like credits control access to scarce GPU capacity. OpenAI cares because multi-tenant GPU platforms must prevent overspend, enforce fairness, recover from failures, and still keep scheduling latency low under heavy concurrency. The interviewer is probing for ledger correctness, idempotent APIs, quota enforcement, scheduling tradeoffs, and the ability to reason about partial failures without hand-waving “exactly once.” A strong Software Engineer answer separates the source of truth for credits from fast-path admission control and explains how reconciliation keeps them consistent.
Core knowledge
- Ledger-first accounting is the safest model: store immutable debit/credit entries rather than mutating a single balance as truth. Balance is derived as `balance = sum(credits) - sum(debits)`, often materialized for speed. This gives auditability, replay, backfill, and easier recovery after bugs.
- Reservation versus consumption is central for GPUs. Admission places a hold for the estimated cost, job telemetry later records actual usage, and completion debits what was used and releases the remainder of the hold. A typical formula is `cost = gpu_seconds * gpu_type_rate * priority_multiplier`, with heterogeneous devices like `A100` and `H100` priced differently.
- Idempotency keys prevent duplicate charges when clients retry. APIs like `POST /reservations` and `POST /usage-events` should accept a client-generated `idempotency_key` and return the original result for the same tenant/key/body. Stripe’s pattern is the model: dedupe at the operation boundary, not just in the client.
- Transactional consistency matters at the credit boundary. For a single tenant balance, a Postgres row with `SELECT ... FOR UPDATE`, an atomic conditional update, or a `SERIALIZABLE` transaction can enforce `available >= hold_amount`. At very high scale, shard by `tenant_id` and keep all balance-affecting operations for a tenant on the same shard.
- Fast-path quota enforcement often uses cached counters, but the cache cannot be the authority for billable state. Redis token buckets or local scheduler caches can reject obvious over-limit requests quickly, while successful admissions still need a durable ledger reservation. If the cache and ledger disagree, the ledger wins.
- At-least-once events are normal; design consumers to be idempotent. Usage collectors may emit duplicate or delayed `job_started`, `heartbeat`, and `job_finished` events. Use event IDs, monotonic sequence numbers per job, or `(job_id, interval_start, interval_end)` uniqueness to avoid double debiting.
- Scheduler integration should combine credit eligibility with cluster constraints. A job is schedulable only if it passes credit checks, tenant quota, GPU availability, placement constraints, and priority. Algorithms include weighted fair queuing, dominant resource fairness for multi-resource jobs, and priority queues with aging to prevent starvation.
- Leases handle abandoned reservations. A reservation should have `expires_at` and be renewable by scheduler heartbeats; if the job never starts or the scheduler crashes, a sweeper releases the hold. Leases must be long enough to avoid false expiration during transient outages but short enough to free stranded credits.
- Double-entry accounting reduces ambiguity for transfers and purchases. A customer top-up credits the tenant account and debits a funding account such as cash or payment clearing; GPU usage debits tenant credits and credits an internal compute account. Even if the implementation is simplified, this mental model helps avoid “credits disappeared” bugs.
- Vector clocks and expirations appear when credits have multiple grants with different validity windows or are updated in multiple regions. If operations are partially ordered, a vector clock can detect concurrent updates rather than incorrectly overwriting one. For most SWE designs, prefer single-writer per tenant; use vector clocks only when multi-master writes are a hard requirement.
- Reconciliation is a first-class subsystem. Periodic jobs compare ledger reservations, scheduler job state, GPU telemetry, and invoices, flagging anomalies such as “reserved but never started,” “running without reservation,” “usage with no completion,” and “negative available balance.” Reconciliation should produce compensating ledger entries, not edit historical rows.
- Observability should expose correctness and latency metrics: `reservation_success_rate`, `insufficient_credit_rejects`, `ledger_write_latency_p99`, `scheduler_admission_latency_p99`, `orphaned_holds_count`, `negative_balance_count`, and `usage_event_lag_seconds`. Alert on invariants, not just CPU or queue depth.
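The idempotency-key behavior described in the list above can be sketched as follows. This is a minimal in-memory illustration, not a production design: `IdempotencyStore`, `IdempotencyConflict`, `body_fingerprint`, and `create_reservation` are hypothetical names, and a real service would likely enforce the same rule with a unique constraint on `(tenant_id, idempotency_key)` in the durable store.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple


class IdempotencyConflict(Exception):
    """The same tenant and key were reused with a different request body."""


def body_fingerprint(body: dict) -> str:
    # Canonical JSON (sorted keys) so key order does not change the hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyStore:
    """Hypothetical in-memory stand-in for a durable dedupe table.

    Assumption: single-threaded use. A real service needs an atomic
    insert-if-absent, not this check-then-write sequence.
    """

    def __init__(self) -> None:
        # (tenant_id, idempotency_key) -> (body fingerprint, stored response)
        self._seen: Dict[Tuple[str, str], Tuple[str, dict]] = {}

    def execute(self, tenant_id: str, key: str, body: dict,
                handler: Callable[[dict], dict]) -> dict:
        fingerprint = body_fingerprint(body)
        record = self._seen.get((tenant_id, key))
        if record is not None:
            stored_fingerprint, stored_response = record
            if stored_fingerprint != fingerprint:
                raise IdempotencyConflict(f"key {key!r} reused with a different body")
            # Retry of the same operation: replay the original result.
            return stored_response
        response = handler(body)
        self._seen[(tenant_id, key)] = (fingerprint, response)
        return response


calls = []


def create_reservation(body: dict) -> dict:
    # Hypothetical handler; the side effect is recorded so retries are visible.
    calls.append(body)
    return {"reservation_id": f"res-{len(calls)}", "amount": body["amount"]}


store = IdempotencyStore()
first = store.execute("tenant-a", "key-1", {"amount": 100}, create_reservation)
retry = store.execute("tenant-a", "key-1", {"amount": 100}, create_reservation)
other_tenant = store.execute("tenant-b", "key-1", {"amount": 100}, create_reservation)
```

The retry returns the stored response without running the handler again, while the same key under a different tenant is treated as a separate operation. Reusing a key with a different body raises an error instead of silently returning a mismatched result.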
Worked example
For “Design GPU credit allocator,” start by framing the first 30 seconds around scope: “Are credits prepaid or postpaid? Do we need hard prevention of overspend or eventual billing? What GPU types and scheduling latency are expected? Is this single-region or multi-region?” Then declare assumptions: prepaid credits, hard admission control, heterogeneous GPUs, and thousands to millions of tenants with jobs lasting seconds to days. Organize the answer into four pillars: a durable ledger service, a reservation/hold API, scheduler admission flow, and reconciliation/observability.
The core flow is: client submits job, scheduler asks credit service for a reservation based on estimated cost, credit service atomically creates a hold if available balance is sufficient, scheduler places the job, usage events convert holds into debits, and leftover hold is released on completion. The data model should include accounts, ledger_entries, reservations, jobs, and usage_events, with unique constraints on idempotency_key and job_id event intervals. For concurrency, say explicitly that balance-affecting writes for a tenant are serialized, either via a database transaction on a tenant balance row or by routing a tenant to a single ledger partition.
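The reserve, debit, and release steps of this flow can be sketched in a few lines. This is a single-process illustration under stated assumptions: the `Ledger` class, its method names, and the `GPU_RATES` values are hypothetical (the source gives no prices), amounts are integer credit units, and the per-tenant serialization that a database transaction or single partition would provide is simply assumed.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical rates in credits per GPU-second; real prices are not in the source.
GPU_RATES = {"A100": 2, "H100": 5}


def estimate_cost(gpu_seconds: int, gpu_type: str, priority_multiplier: float = 1.0) -> int:
    # cost = gpu_seconds * gpu_type_rate * priority_multiplier, rounded up to whole credits.
    return math.ceil(gpu_seconds * GPU_RATES[gpu_type] * priority_multiplier)


class InsufficientCredits(Exception):
    pass


@dataclass(frozen=True)
class Entry:
    tenant_id: str
    kind: str    # "grant" or "debit"
    amount: int  # always positive
    ref: str     # top-up id or job id


@dataclass
class Ledger:
    entries: List[Entry] = field(default_factory=list)              # append-only
    holds: Dict[str, Dict[str, int]] = field(default_factory=dict)  # tenant -> job -> held

    def balance(self, tenant_id: str) -> int:
        # Derived from entries, never stored as the source of truth.
        return sum(e.amount if e.kind == "grant" else -e.amount
                   for e in self.entries if e.tenant_id == tenant_id)

    def available(self, tenant_id: str) -> int:
        return self.balance(tenant_id) - sum(self.holds.get(tenant_id, {}).values())

    def grant(self, tenant_id: str, amount: int, ref: str) -> None:
        self.entries.append(Entry(tenant_id, "grant", amount, ref))

    def reserve(self, tenant_id: str, job_id: str, estimated_cost: int) -> None:
        if self.available(tenant_id) < estimated_cost:
            raise InsufficientCredits(job_id)
        self.holds.setdefault(tenant_id, {})[job_id] = estimated_cost

    def settle(self, tenant_id: str, job_id: str, actual_cost: int) -> int:
        # Drop the hold, debit actual usage; a negative return means an overrun.
        held = self.holds[tenant_id].pop(job_id)
        self.entries.append(Entry(tenant_id, "debit", actual_cost, job_id))
        return held - actual_cost


ledger = Ledger()
ledger.grant("t1", 10_000, "topup-1")
ledger.reserve("t1", "job-1", estimate_cost(3600, "A100"))    # holds 7200
available_during_job = ledger.available("t1")                 # 2800
released = ledger.settle("t1", "job-1", estimate_cost(3000, "A100"))  # debits 6000
```

While the hold is active, a second 7200-credit reservation for the same tenant is rejected even though the ledger balance is still 10,000, which is the overspend protection the flow relies on.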
A useful tradeoff to flag is strict correctness versus scheduling latency. A synchronous ledger call on every admission prevents overspend but adds latency and creates a dependency; preallocated per-scheduler credit buckets reduce latency but can strand capacity and require careful reconciliation. Close by saying that, with more time, you would dig into multi-region failover, GPU preemption/refunds, and how to test invariants with fault injection.
A second angle
For “Design credit balance with vector-clock expirations,” the same accounting principles apply, but the interviewer is emphasizing causality and per-user state management rather than scheduler flow. The key difference is that credits may arrive from multiple grants, expire at different times, and be consumed concurrently. A strong design uses immutable credit lots with grant_id, amount, remaining, expires_at, and consumes earliest-expiring credits first, similar to FEFO inventory accounting.
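Earliest-expiring-first consumption over credit lots might look like the sketch below. It assumes integer amounts and epoch-second timestamps, treats a lot as unusable at or after `expires_at`, and mutates `remaining` in place for brevity; a ledger-backed design would more likely record each draw as a separate entry. `CreditLot` and `consume` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CreditLot:
    grant_id: str
    amount: int      # originally granted
    remaining: int   # not yet consumed
    expires_at: int  # epoch seconds; unusable at or after this time


def consume(lots: List[CreditLot], cost: int, now: int) -> List[Tuple[str, int]]:
    """Draw `cost` credits from unexpired lots, earliest expiry first.

    All-or-nothing: raises ValueError and changes nothing if the unexpired
    lots cannot cover the cost. Returns (grant_id, amount_taken) pairs.
    """
    usable = sorted(
        (lot for lot in lots if lot.expires_at > now and lot.remaining > 0),
        key=lambda lot: (lot.expires_at, lot.grant_id),  # grant_id breaks ties
    )
    if sum(lot.remaining for lot in usable) < cost:
        raise ValueError("insufficient unexpired credits")

    taken: List[Tuple[str, int]] = []
    left = cost
    for lot in usable:
        if left == 0:
            break
        take = min(lot.remaining, left)
        lot.remaining -= take
        left -= take
        taken.append((lot.grant_id, take))
    return taken


lots = [
    CreditLot("g1", 50, 50, expires_at=300),
    CreditLot("g2", 100, 100, expires_at=200),
    CreditLot("g3", 30, 30, expires_at=100),  # already expired at now=150
]
taken = consume(lots, 120, now=150)
```

Here `g2` is drained before `g1` because it expires sooner, and the expired `g3` is skipped entirely.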
If writes are single-region, you can avoid vector-clock complexity by serializing updates per user. If multi-region concurrent debits are required, vector clocks help detect “these two spends happened without seeing each other,” after which you either reject one, merge with compensating debt, or require a conflict-resolution policy. The interviewer will expect you to explain the cost of vector clocks: metadata grows with writers, comparisons are partial orders, and conflict resolution is a product-visible behavior even if the implementation is technical.
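The “happened without seeing each other” check is a partial-order comparison, sketched below. The clock is assumed to be a plain mapping from writer ID to counter, and the region names are illustrative only; what to do once a conflict is detected remains a policy decision that this code does not make.

```python
from typing import Dict

VectorClock = Dict[str, int]  # writer id -> number of events seen from that writer


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` advanced by one local event at `node`."""
    advanced = dict(clock)
    advanced[node] = advanced.get(node, 0) + 1
    return advanced


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the clock after seeing both histories."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither history contains the other


base = {"us-east": 1}
spend_east = tick(base, "us-east")  # {"us-east": 2}
spend_west = tick(base, "eu-west")  # {"us-east": 1, "eu-west": 1}

relation = compare(spend_east, spend_west)  # "concurrent"
resolved = merge(spend_east, spend_west)    # {"us-east": 2, "eu-west": 1}
```

The clock carries one counter per writer, which is the metadata growth the interviewer will expect you to mention, and `compare` can return `concurrent`, which a totally ordered timestamp never would.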
Common pitfalls
Pitfall: Treating balance as a mutable integer with `balance -= cost`.
That answer misses auditability, retries, and reconciliation. A better answer uses immutable ledger entries, a materialized balance for performance, and transactional holds to prevent overspend.
Pitfall: Claiming “exactly-once billing” because events are delivered through Kafka.
Distributed systems rarely give end-to-end exactly-once semantics across clients, queues, databases, and schedulers. Say “at-least-once delivery with idempotent processing and unique operation IDs,” then show where deduplication happens.
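One place deduplication can happen is in the usage consumer itself, keyed on the interval a usage event covers. The sketch below assumes events carry `(job_id, interval_start, interval_end)` and keeps the seen-set in memory; `UsageEvent` and `UsageConsumer` are hypothetical names, and a durable version would rely on a unique constraint over the same key.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass(frozen=True)
class UsageEvent:
    job_id: str
    interval_start: int  # epoch seconds
    interval_end: int
    gpu_seconds: int


@dataclass
class UsageConsumer:
    """Idempotent consumer: each (job, interval) is billed at most once."""
    _seen: Set[Tuple[str, int, int]] = field(default_factory=set)
    billed_gpu_seconds: int = 0

    def handle(self, event: UsageEvent) -> bool:
        key = (event.job_id, event.interval_start, event.interval_end)
        if key in self._seen:
            return False  # duplicate delivery: acknowledge, do not bill again
        self._seen.add(key)
        self.billed_gpu_seconds += event.gpu_seconds
        return True


e1 = UsageEvent("job-1", 0, 60, 480)
e2 = UsageEvent("job-1", 60, 120, 480)

consumer = UsageConsumer()
# At-least-once delivery: out of order and with duplicates.
results = [consumer.handle(e) for e in (e2, e1, e2, e1)]
```

Duplicates and reordering leave the billed total unchanged, so the consumer can safely acknowledge redelivered messages.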
Pitfall: Designing the scheduler and ignoring the money boundary.
A GPU scheduler that only optimizes utilization can admit jobs that tenants cannot pay for, while a credit service that ignores scheduling can hold credits forever for jobs that never run. Land better by describing the contract between scheduler and ledger: reserve, renew, consume, release, and reconcile.
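The reserve, renew, and release parts of that contract, plus the sweeper that frees abandoned holds, can be sketched as a lease table. Time is passed in explicitly so the behavior is deterministic; `ReservationTable` and its method names are hypothetical, and the 30-second lease is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Reservation:
    job_id: str
    amount: int
    expires_at: float


class ReservationTable:
    """Holds with leases; in-memory stand-in for a durable reservations table."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self.active: Dict[str, Reservation] = {}

    def reserve(self, job_id: str, amount: int, now: float) -> None:
        self.active[job_id] = Reservation(job_id, amount, now + self.lease_seconds)

    def renew(self, job_id: str, now: float) -> bool:
        # Called on scheduler heartbeats; fails if the lease is gone or expired.
        res = self.active.get(job_id)
        if res is None or res.expires_at <= now:
            return False
        res.expires_at = now + self.lease_seconds
        return True

    def release(self, job_id: str) -> Optional[Reservation]:
        # Normal completion path: the caller debits usage and frees the rest.
        return self.active.pop(job_id, None)

    def sweep(self, now: float) -> List[Reservation]:
        # Sweeper: drop expired holds and return them for compensating entries.
        expired = [r for r in self.active.values() if r.expires_at <= now]
        for r in expired:
            del self.active[r.job_id]
        return expired


table = ReservationTable(lease_seconds=30)
table.reserve("job-a", 100, now=0)
table.reserve("job-b", 250, now=0)
renewed = table.renew("job-a", now=20)  # heartbeat extends job-a to t=50
expired = table.sweep(now=40)           # job-b never renewed, so it is swept
late_renewal = table.renew("job-b", now=41)
```

The lease length is the tuning knob described earlier: a longer lease tolerates heartbeat gaps, while a shorter one frees stranded credits sooner.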
Connections
Interviewers may pivot from here into rate limiting, distributed transactions, idempotent payment processing, fair scheduling, or multi-region consistency. They may also ask for a deeper dive on one component, such as implementing a token bucket, designing ledger schemas in Postgres, or handling delayed usage events from a telemetry pipeline.
Further reading
- “Designing Data-Intensive Applications”: chapters on transactions, replication, partitioning, and stream processing map directly to ledger correctness and reconciliation.
- “Stripe API Idempotent Requests”: practical model for safe retries around money-like operations.
- “Dominant Resource Fairness: Fair Allocation of Multiple Resource Types”: seminal scheduling paper for fair allocation across heterogeneous resources.
Practice questions
- Implement credit ledger with out-of-order timestamps · OpenAI · Software Engineer · Technical Screen · Hard
- Implement an expiring GPU-credit manager · OpenAI · Software Engineer · Technical Screen · Medium
- Implement expiring credit ledger · OpenAI · Software Engineer · Technical Screen · Medium
- Implement GPU credit ledger · OpenAI · Software Engineer · Technical Screen · Medium
- Design a GPU credit system and scheduler · OpenAI · Software Engineer · Technical Screen · Hard
- Implement a GPU credit manager · OpenAI · Software Engineer · Technical Screen · Medium
- Manage GPU Credits with Expiration · OpenAI · Software Engineer · Technical Screen · Medium
- Design GPU credit allocator · OpenAI · Software Engineer · Technical Screen · Hard
- Design a GPU credit allocation service · OpenAI · Software Engineer · Technical Screen · Hard
- Implement an expiring GPU credits ledger · OpenAI · Software Engineer · Technical Screen · Medium
Related concepts
- GPU Credit Ledgers And Schedulers · Coding & Algorithms
- GPU Programming, Graphics APIs, And Shader Compilers · System Design
- ML Inference APIs And GPU Batching · ML System Design
- Distributed System Design For Ledgers And Counters · System Design
- Wallets, Payments, And Refund Ledgers · System Design
- Load Balancing And Resource Lifecycle Simulation · Coding & Algorithms