System Design: GPU Credits Allocation and Fair Usage
Context
You are designing a multi-tenant platform that provides access to GPU compute across many nodes. Users pre-purchase credits and are charged based on GPU usage (e.g., per GPU-second). The system must track consumption in near real time, prevent overspending, support credit top-ups, and enforce fair usage and rate limits across the fleet.
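To make the per-GPU-second charging model concrete, here is a minimal cost calculation; the prices are invented for illustration and are not part of the problem statement. Integer minor units (milli-credits) are used because a ledger should avoid floating-point rounding.

```python
# Illustrative prices in integer milli-credits per GPU-second (assumed values,
# not given in the problem); integer minor units avoid float rounding in a ledger.
PRICE_MILLICREDITS = {"A100": 30, "H100": 50}

def job_cost_millicredits(gpu_type: str, num_gpus: int, seconds: int) -> int:
    """Credits consumed by a job: GPUs x per-GPU-second price x duration."""
    return num_gpus * PRICE_MILLICREDITS[gpu_type] * seconds

# 2 x H100 for 10 minutes: 2 * 50 * 600 = 60,000 milli-credits (60 credits).
```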
Assume:
- Multiple GPU types (A100, H100, etc.) with different prices.
- Jobs can run on one or more GPUs and can migrate or be rescheduled.
- The system must continue operating under node/agent failures and network partitions with bounded exposure.
Requirements
- Track per-user GPU consumption across nodes and time.
- Deduct credits in real time (seconds-level), preventing double-spend across nodes.
- Support credit top-ups (payments) and immediate balance visibility.
- Enforce rate limits: e.g., max concurrent GPUs, spend rate per second, daily caps.
- Enforce fair usage across users (no single user can starve others) when resources are scarce.
- Fault tolerance: handle node/agent/ledger outages; guarantee that any overspend is bounded.
- Auditable ledger: idempotent, immutable records; reconcile provisional vs. final charges.
- APIs for balance, reserve/authorize, consume, top-up, and usage reporting.
- Scalability to thousands of nodes and tens of thousands of concurrent jobs.
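One possible shape for the balance / reserve / consume / top-up API is sketched below as a single-node, in-memory ledger. All names are illustrative assumptions; distribution, durability, and concurrency control are deliberately omitted, but the hold (reservation) and idempotent-consume semantics mirror the requirements above.

```python
import uuid

class CreditLedger:
    """In-memory sketch: balances in integer milli-credits, holds for
    reservations, and idempotent consume keyed by a per-usage event id."""

    def __init__(self):
        self.balances = {}           # user_id -> available milli-credits
        self.holds = {}              # hold_id -> (user_id, remaining milli-credits)
        self.applied_events = set()  # event ids already charged (idempotency)

    def top_up(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount

    def balance(self, user):
        return self.balances.get(user, 0)

    def reserve(self, user, amount):
        """Authorize: move credits from available into a hold; reject if insufficient."""
        if self.balances.get(user, 0) < amount:
            raise ValueError("insufficient credits")
        self.balances[user] -= amount
        hold_id = str(uuid.uuid4())
        self.holds[hold_id] = (user, amount)
        return hold_id

    def consume(self, hold_id, event_id, amount):
        """Charge usage against a hold; replaying the same event_id is a no-op."""
        if event_id in self.applied_events:
            return
        user, remaining = self.holds[hold_id]
        self.holds[hold_id] = (user, remaining - amount)
        self.applied_events.add(event_id)

    def release(self, hold_id):
        """Return any unconsumed portion of a hold to the available balance."""
        user, remaining = self.holds.pop(hold_id)
        if remaining > 0:
            self.balances[user] += remaining
```

In a distributed design, each node agent would hold a short-lived lease carved out of such a hold, so that a partitioned or crashed agent can overspend by at most its outstanding lease.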
Deliverables
- High-level architecture with key components and their responsibilities.
- Data model for accounts, balances, holds/leases, usage events, and pricing.
- Real-time deduction mechanism across multiple nodes (prevent double-spend).
- Rate limiting and fair scheduling approach.
- Failure handling and reconciliation strategy.
- Small numeric example to illustrate charging and limits.
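As one way to structure the requested numeric example, the arithmetic below checks a single job against three of the limits named in the requirements. Every figure (the H100 price, the concurrency cap, the spend-rate cap, the daily cap) is invented for illustration.

```python
# Assumed illustrative numbers (not given in the problem):
#   H100 price: 50 milli-credits per GPU-second
#   limits: max 4 concurrent GPUs, spend rate <= 150 milli-credits/s,
#           daily cap 500,000 milli-credits
price_millicredits = 50   # per GPU-second, H100 (assumed)
gpus = 2
duration_s = 600          # a 10-minute job

spend_rate = gpus * price_millicredits   # 100 milli-credits/s
total = spend_rate * duration_s          # 60,000 milli-credits = 60 credits

assert gpus <= 4            # within the concurrency limit
assert spend_rate <= 150    # within the per-second spend-rate limit
assert total <= 500_000     # within the daily cap
```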