Design a Multi‑Tenant GPU Credit Allocation Service
Context
You are designing a multi-tenant platform where organizations run GPU jobs. Each organization receives monthly GPU credits that are consumed as jobs run. The system must expose APIs to issue, transfer, and spend credits; enforce budgets and rate limits in real time; and integrate with a job scheduler to admit or reject workloads based on available credits.
Assume credits are the unit of spend (e.g., 1 credit = $0.01 of GPU time) and that different GPU types have different prices (credits per GPU-minute). Jobs may be submitted by users to projects within an org. The platform should support prepaid and postpaid billing models.
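As a worked illustration of the pricing assumption above, the sketch below computes a job's cost in credits and converts it to dollars. The 1 credit = $0.01 rate comes from the text; the GPU type names and per-minute prices are hypothetical placeholders, not real rates. Credits are kept as integers so that balances never accumulate floating-point error.

```python
from decimal import Decimal

# From the text: 1 credit = $0.01 of GPU time.
CREDIT_VALUE_USD = Decimal("0.01")

# Hypothetical price table (credits per GPU-minute); names and values are assumptions.
PRICE_PER_GPU_MINUTE = {
    "gpu-small": 4,
    "gpu-large": 25,
}


def job_cost_credits(gpu_type: str, gpu_count: int, minutes: int) -> int:
    """Cost of a job in whole credits: price x GPUs x minutes."""
    return PRICE_PER_GPU_MINUTE[gpu_type] * gpu_count * minutes


def credits_to_usd(credits: int) -> Decimal:
    """Convert whole credits to dollars using exact decimal arithmetic."""
    return Decimal(credits) * CREDIT_VALUE_USD


if __name__ == "__main__":
    cost = job_cost_credits("gpu-large", gpu_count=4, minutes=90)
    print(cost, credits_to_usd(cost))
```

Under these assumed prices, a 90-minute job on four "gpu-large" devices costs 25 x 4 x 90 = 9,000 credits, or $90.00.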
Requirements
Design a service that:
- Defines APIs to:
  - Issue and expire monthly credits.
  - Transfer credits between org, project, and user accounts.
  - Reserve, spend, and refund credits tied to workloads.
  - Query balances, budgets, and audit logs.
- Enforces budgets, quotas, and rate limits in real time under concurrency.
- Integrates with a job scheduler to admit/reject jobs based on available credits and quotas.
- Supports prepaid vs. postpaid models and per-user/project quotas.
- Ensures idempotency, prevents overspend, and provides audit logging/reporting.
- Addresses data model, consistency model, failure recovery, scaling (partitioning, caching), and observability/alerting.
Make minimal reasonable assumptions where unspecified, and call them out.
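One possible starting point for the reserve, spend, and refund requirement is sketched below. It is a single-process, in-memory illustration only, assuming a prepaid model, integer credits, and client-supplied idempotency keys. The class and method names (`CreditLedger`, `issue`, `reserve`, `settle`) are hypothetical, and a real design would replace the lock and dictionaries with a transactional store.

```python
import threading
from dataclasses import dataclass, field


class InsufficientCredits(Exception):
    """Raised when a reservation would exceed the available balance."""


@dataclass
class Account:
    balance: int = 0                               # credits not reserved or spent
    reserved: dict = field(default_factory=dict)   # reservation_id -> held credits


class CreditLedger:
    """Hypothetical in-memory ledger; one lock serializes all mutations."""

    def __init__(self):
        self._lock = threading.Lock()
        self._accounts = {}
        self._results = {}    # idempotency key -> balance returned the first time
        self.audit_log = []   # append-only (op, account_id, credits, balance_after)

    def _record(self, key, op, account_id, credits):
        balance = self._accounts[account_id].balance
        self.audit_log.append((op, account_id, credits, balance))
        self._results[key] = balance
        return balance

    def balance(self, account_id):
        with self._lock:
            return self._accounts[account_id].balance

    def issue(self, account_id, credits, key):
        with self._lock:
            if key in self._results:          # replayed request: no second effect
                return self._results[key]
            self._accounts.setdefault(account_id, Account()).balance += credits
            return self._record(key, "issue", account_id, credits)

    def reserve(self, account_id, reservation_id, credits, key):
        """Hold credits for a job at admission time; reject if unavailable."""
        with self._lock:
            if key in self._results:
                return self._results[key]
            acct = self._accounts[account_id]
            if acct.balance < credits:        # check and debit under the same lock
                raise InsufficientCredits(account_id)
            acct.balance -= credits
            acct.reserved[reservation_id] = credits
            return self._record(key, "reserve", account_id, credits)

    def settle(self, account_id, reservation_id, actual_credits, key):
        """Spend the actual usage (capped at the hold) and refund the rest."""
        with self._lock:
            if key in self._results:
                return self._results[key]
            acct = self._accounts[account_id]
            held = acct.reserved.pop(reservation_id)
            spent = min(actual_credits, held)
            acct.balance += held - spent      # refund the unused part of the hold
            return self._record(key, "settle", account_id, spent)
```

In this sketch a scheduler would call `reserve` before admitting a job and `settle` when the job ends. Because the balance check and the debit happen under one lock, two concurrent reservations cannot both succeed against the same credits, and a retried request with the same key returns the original result without a second effect.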