##### Question
Design a GPU credits allocation system that tracks user GPU consumption, deducts credits in real-time, supports credit top-ups, rate limiting, and fair usage enforcement across multiple GPU nodes.
Quick Answer: This question evaluates system design competencies in distributed resource accounting, real-time billing, fault tolerance, rate limiting, and fair scheduling for multi-tenant GPU compute platforms; it falls in the System Design domain.
Solution
# Overview and Key Concepts
This design uses a central, strongly consistent credit ledger with short-lived credit "leases" to enable real-time deduction and prevent double-spend across many GPU nodes. Nodes stream usage to a metering pipeline; a reconciler ensures the ledger matches actual usage. Rate limiting and fair usage are enforced at admission and during execution.
Key terms:
- Credit balance B(u): user's available credits.
- Price p(type): price per GPU-second for a GPU type.
- Usage rate r = g × p(type), where g is number of GPUs allocated to a job.
- Lease window L seconds: the system authorizes up to r × L credits for a job, deducted immediately; extensions renew as the job runs.
- Token bucket for rate limits: spend rate and concurrency caps.
Small numeric example:
- p(H100) = 0.01 credits/GPU-second.
- User runs 2 GPUs for 10 minutes = 2 × 600 × 0.01 = 12 credits.
- With L = 15 s, each extension deducts 2 × 15 × 0.01 = 0.3 credits.
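The pricing arithmetic above can be sketched in a few lines; `job_cost` and `extension_charge` are illustrative names, not part of the design's API.

```python
# Illustrative helpers for the cost model: r = gpu_count * price_per_sec.

def job_cost(gpu_count: int, price_per_sec: float, seconds: float) -> float:
    """Total credits consumed over the job's lifetime: r * seconds."""
    return gpu_count * price_per_sec * seconds

def extension_charge(gpu_count: int, price_per_sec: float, delta: float) -> float:
    """Credits deducted for a single lease extension of `delta` seconds."""
    return gpu_count * price_per_sec * delta

print(round(job_cost(2, 0.01, 600), 2))         # 2 GPUs x 10 min -> 12.0 credits
print(round(extension_charge(2, 0.01, 15), 2))  # one 15 s extension -> 0.3 credits
```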
# Architecture
Components:
1. Auth & Account Service: user identity, account state, limits.
2. Pricing Service: GPU type → price per GPU-second; effective discounts.
3. Credit Ledger Service (strong consistency): maintains balances, immutable ledger entries, and active holds/leases. Backed by a transactional DB (e.g., PostgreSQL/CockroachDB/Spanner). All monetary mutations are idempotent with request_id.
4. Rate Limiter & Quota Service: per-user token buckets (spend/sec), concurrency limits (max GPUs), and daily caps. Backed by Redis/Redis Cluster with Lua for atomic ops.
5. Scheduler/Admission Controller: decides whether to start/continue jobs based on rate limits and credit leases; integrates with cluster scheduler.
6. Node Agent (on each GPU node): measures actual GPU usage (per process/pod), heartbeats usage every t seconds, requests lease extensions, and enforces throttling/termination if lease cannot be extended.
7. Usage Pipeline: agents send usage events to Kafka/PubSub; an Aggregator normalizes and rolls up usage.
8. Reconciler: reconciles rolled-up usage with provisional charges; posts adjustments to the ledger if needed.
9. Top-up Service: processes payments and applies credits; publishes events for notifications.
10. Observability: metrics, logs, alerts; ledger invariants and drift monitors.
# Data Model (simplified)
- users(id, status, limits, created_at)
- balances(user_id, available_credits, updated_at) — single row per user; strong consistency
- ledger_entries(id, user_id, type: [topup|debit|refund|adjustment], amount, request_id, created_at)
- leases(id, user_id, job_id, gpu_type, gpu_count, rate_per_sec, expires_at, amount_reserved, status)
- usage_events(id, user_id, job_id, node_id, gpu_type, gpu_count, start_ts, end_ts, seconds, measured_utilization, seq)
- price_table(gpu_type, price_per_sec, effective_from)
- limits(user_id, max_concurrent_gpus, max_spend_per_sec, daily_cap, weight)
Notes:
- All ledger mutations carry a unique request_id for idempotency.
- balances available_credits should be updated only by transactional operations that also insert a ledger_entry.
# Real-Time Deduction with Leases
Goal: allow jobs on many nodes to consume credits without double-spend and with bounded exposure during failures.
Mechanism:
1. Admission: for a new job with gpu_count g and gpu_type t, compute r = g × p(t). Check rate limits and balance.
2. Create lease for window L seconds: atomically deduct r × L from balance and record a lease row with expires_at = now + L.
- If balance < r × L, reject or place the job in a waiting queue.
3. Node agent runs the job and sets a heartbeat timer every t (< L) seconds.
4. Extension: each heartbeat attempts to extend the lease by Δ = min(t, L) seconds:
- Atomically: deduct r × Δ, push expires_at by Δ, and append a ledger_entry (debit) with request_id = (job_id, seq).
- If deduction fails (insufficient funds or rate-limit breach), the agent is instructed to throttle or gracefully terminate, with a configurable grace period of grace_g seconds drawn from the remaining lease time.
5. Completion: when job ends, close the lease and release any unused reserved time if the implementation pre-reserves more than consumed.
Two implementation patterns:
- Strict pay-as-you-go: do not pre-reserve beyond the next Δ; deduct each extension immediately. Exposure is bounded by r × (grace_g + Δ).
- Reserve-then-burn: pre-reserve r × L upfront to reduce extension path latency; burn down the reservation as usage arrives. On completion, refund unused reserved portion. This lowers ledger write QPS but requires careful reconciliation.
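The strict pay-as-you-go loop (step 4 above) can be sketched as follows. `LedgerStub` and `run_job` are hypothetical stand-ins for the Credit Ledger API and the node agent's heartbeat loop, not the real interfaces.

```python
# Sketch of the pay-as-you-go extension loop; request_id = (job_id, seq) for idempotency.

class LedgerStub:
    """In-memory stand-in for the Credit Ledger Service."""
    def __init__(self, balance: float):
        self.balance = balance
        self.seen = set()  # request_ids already applied (idempotency)

    def debit(self, request_id, amount) -> bool:
        if request_id in self.seen:   # retried request: same result, no double charge
            return True
        if self.balance < amount:
            return False
        self.balance -= amount
        self.seen.add(request_id)
        return True

def run_job(ledger, job_id, rate, delta, total_seconds):
    """Extend the lease every `delta` seconds; stop when a debit is NACKed."""
    elapsed, seq = 0, 0
    while elapsed < total_seconds:
        if not ledger.debit((job_id, seq), rate * delta):
            return elapsed  # throttle/terminate within remaining lease time
        elapsed += delta
        seq += 1
    return elapsed

ledger = LedgerStub(balance=1.0)
# r = 0.04 credits/s, delta = 5 s -> each extension costs 0.2; 1.0 credit lasts 25 s
print(run_job(ledger, "job-1", rate=0.04, delta=5, total_seconds=60))  # 25
```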
Atomicity options:
- SQL row-level locks: UPDATE balances SET available = available - x WHERE user_id = ? AND available >= x; INSERT ledger_entry ... in the same transaction.
- Redis with Lua script: atomic balance check, decrement, and append to a write-ahead queue for persistence.
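The SQL row-lock pattern can be demonstrated with Python's built-in sqlite3 for illustration; a production ledger would use PostgreSQL/CockroachDB/Spanner, and a real implementation would return the cached prior result on a retried request_id rather than rejecting it.

```python
# Minimal sketch: balance deduction and ledger append in one transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE balances (user_id TEXT PRIMARY KEY, available REAL);
CREATE TABLE ledger_entries (request_id TEXT PRIMARY KEY, user_id TEXT,
                             type TEXT, amount REAL);
INSERT INTO balances VALUES ('u1', 50.0);
""")

def debit(conn, user_id, amount, request_id) -> bool:
    """Atomically deduct credits and append a ledger entry, keyed by request_id."""
    try:
        with conn:  # one transaction: both statements commit, or neither does
            cur = conn.execute(
                "UPDATE balances SET available = available - ? "
                "WHERE user_id = ? AND available >= ?",
                (amount, user_id, amount))
            if cur.rowcount == 0:
                raise sqlite3.OperationalError("insufficient funds")
            conn.execute("INSERT INTO ledger_entries VALUES (?, ?, 'debit', ?)",
                         (request_id, user_id, amount))
        return True
    except sqlite3.Error:  # insufficient funds, or duplicate request_id (IntegrityError)
        return False

print(debit(conn, "u1", 0.6, "req-1"))  # True; balance is now 49.4
```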
# Rate Limiting
Use token bucket and counters, enforced at admission and on lease extension:
- Concurrency cap: current_gpus(user) + g ≤ max_concurrent_gpus.
- Spend rate cap: per-user bucket with capacity C and fill rate F = max_spend_per_sec. Each extension consumes r × Δ tokens; if unavailable, throttle.
- Daily cap: track sum(ledger.debit) for the day; deny new leases once exceeded.
Data placement: use Redis hash per user for counters; use TTL keys to track active jobs. Lua scripts ensure atomic increments/decrements.
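The per-user spend-rate bucket (capacity C, fill rate F) can be sketched as below; in production this logic would run as an atomic Redis Lua script, and the class/parameter names here are illustrative.

```python
# Sketch of a token bucket where tokens are credits: each lease extension
# consumes r * delta tokens; refill rate F = max_spend_per_sec.

class TokenBucket:
    def __init__(self, capacity: float, fill_rate: float):
        self.capacity, self.fill_rate = capacity, fill_rate
        self.tokens, self.last = capacity, 0.0  # start full

    def try_consume(self, amount: float, now: float) -> bool:
        """Refill for elapsed time, then consume `amount` tokens if available."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False

bucket = TokenBucket(capacity=1.0, fill_rate=0.05)  # max_spend_per_sec = 0.05
print(bucket.try_consume(0.2, now=0))   # True: bucket starts full
print(bucket.try_consume(0.9, now=1))   # False: only ~0.85 tokens after refill
```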
# Fair Usage Across Users
When GPUs are scarce, ensure fairness via the scheduler:
- Per-user share: weight(u) defines a target fraction of GPUs (e.g., equal weights by default). The scheduler admits jobs so that active_gpus(u)/weight(u) stays roughly balanced across users.
- Use Dominant Resource Fairness (DRF) if CPU/memory are also constrained.
- Backpressure: if a user is at or above their fair share, queue or preempt low-priority jobs.
- Priority classes: paying tiers map to higher weights and possibly preempt lower tiers within policy.
Coordination:
- Scheduler periodically computes target shares using cluster state and user weights.
- Admission Controller consults both: credits (lease), rate limits, and fair-share state before starting a pod.
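The weighted fair-share check can be sketched as a simple admission-order rule; `pick_next_user` is an illustrative name, and real schedulers layer preemption and priority classes on top of this.

```python
# Sketch: admit the pending user whose active_gpus(u)/weight(u) is lowest,
# i.e., the user furthest below their fair share.

def pick_next_user(active_gpus: dict, weights: dict, pending: set) -> str:
    """Among users with pending jobs, choose the one furthest below fair share."""
    return min(pending, key=lambda u: active_gpus.get(u, 0) / weights[u])

active = {"alice": 8, "bob": 2}
weights = {"alice": 1.0, "bob": 1.0, "carol": 2.0}
# carol has 0 GPUs and the highest weight, so her normalized share (0/2.0) is lowest
print(pick_next_user(active, weights, {"alice", "bob", "carol"}))  # carol
```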
# Failure Modes and Guardrails
- Node/Agent crash: no new extensions occur; lease expires at expires_at, bounding overspend to r × (remaining_lease + grace_g). Reconciler may refund any overcharge if job actually stopped earlier.
- Ledger outage: allow short offline operation via local budget cache per node (small on-box escrow, e.g., 1–2 extension windows). On reconnect, reconcile and stop jobs if overspent.
- Network partitions: leases prevent double-spend; extensions fail when the ledger is unreachable beyond escrow.
- Double charge/double spend: prevented via idempotent request_ids and atomic ops. Usage events carry monotonic seq numbers per job to dedupe.
- Clock skew: rely on server-side timestamps for leases; agents include both wall and monotonic times for diagnostics but ledger is source of truth.
- Price changes: price_table is versioned by effective_from; leases record price at time of debit; reconciler uses the recorded rate.
# Reconciliation Pipeline
- Agents send usage_events every t seconds and at job end, with measured seconds and GPU type/count.
- Aggregator rolls up to per-job intervals.
- Reconciler compares rolled-up actual cost vs provisional debits:
adjustment = actual_cost - sum(provisional_debits)
- If positive: create a debit adjustment (rare; occurs when provisional debits undercount actual usage).
- If negative: create a refund credit to user.
- Drift alerting if |adjustment| exceeds a threshold per job or per user.
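The adjustment arithmetic above reduces to a short computation; `reconcile` is an illustrative name for the Reconciler's core step.

```python
# Sketch: compare actual metered cost against the sum of provisional debits.

def reconcile(actual_seconds, gpu_count, price_per_sec, provisional_debits):
    """Return (adjustment, entry_type): positive -> extra debit, negative -> refund."""
    actual_cost = actual_seconds * gpu_count * price_per_sec
    adjustment = actual_cost - sum(provisional_debits)
    return adjustment, ("debit" if adjustment > 0 else "refund")

# Job billed 0.6 upfront plus three 0.2 extensions, but actually ran only 25 s
# at r = 4 * 0.01 = 0.04 credits/s, so actual cost is 1.0 and 0.2 is refunded:
adj, kind = reconcile(25, 4, 0.01, [0.6, 0.2, 0.2, 0.2])
print(round(adj, 2), kind)  # -0.2 refund
```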
# APIs (idempotent)
1. POST /topups {amount, request_id} → {new_balance}
2. GET /balance → {available, holds}
3. POST /leases {job_id, gpu_type, gpu_count, window_seconds, request_id} → {lease_id, rate_per_sec, expires_at}
4. POST /leases/{lease_id}/extend {delta_seconds, request_id} → {expires_at}
5. POST /leases/{lease_id}/close {request_id} → {finalized}
6. POST /usage {job_id, seq, gpu_type, gpu_count, start_ts, end_ts}
7. GET /limits → {concurrency, spend_rate, daily_cap}
All write APIs require request_id for idempotency; return the same result if retried.
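The request_id contract can be sketched with a caching decorator; the in-memory dict here is illustrative only, since a real service must persist the cached result transactionally alongside the ledger entry.

```python
# Sketch of request_id idempotency: a retried write returns the original result
# without re-applying the mutation.

_results: dict = {}  # request_id -> cached response (illustrative storage)

def idempotent(handler):
    def wrapper(request_id, *args, **kwargs):
        if request_id in _results:          # retry: return the original result
            return _results[request_id]
        result = handler(request_id, *args, **kwargs)
        _results[request_id] = result
        return result
    return wrapper

balance = {"u1": 50.0}

@idempotent
def topup(request_id, user_id, amount):
    balance[user_id] += amount
    return {"new_balance": balance[user_id]}

print(topup("req-1", "u1", 10))  # {'new_balance': 60.0}
print(topup("req-1", "u1", 10))  # retried: same result, no double credit
```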
# Scalability
- Shard ledger by user_id; keep one primary row per user to avoid hot-spotting.
- Use append-only ledger_entries and periodic compaction/snapshots for fast balance reads.
- Batch extensions: coalesce per-job debits up to a maximum Δ (e.g., 5–15 s) to reduce write QPS.
- Redis Cluster for rate limiter; key hashing on user_id to ensure locality.
- Kafka partitions by user_id for usage events to maintain order per job.
# Security and Integrity
- Signed lease tokens (JWT) returned to nodes include lease_id, rate, expires_at; agents cannot mint tokens.
- mTLS between agents and control plane.
- Least-privilege service accounts; WAF on public APIs.
# Example Walkthrough
Given: User balance B = 50 credits, p(H100)=0.01, job uses g=4 GPUs.
- r = 4 × 0.01 = 0.04 credits/s.
- L = 15 s, Δ = 5 s heartbeats.
Flow:
1. Admission: concurrency and spend_rate allow. Create lease, deduct 0.04 × 15 = 0.6 credits. New balance: 49.4.
2. After 5 s: extend by 5 s, deduct 0.2 credits → balance 49.2.
3. Repeat until the job ends. If the job runs 8 minutes (480 s): total cost = 0.04 × 480 = 19.2 credits. The ledger shows the initial 0.6 debit plus ~93 extension debits of 0.2 (covering the 465 s beyond the initial window), minus a refund of any unused reserved portion if applicable.
4. If the balance would drop below the required extension amount, the agent gets a NACK; it uses the remaining lease time and then stops. Maximum overspend is bounded by r × (Δ + grace_g).
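The walkthrough's arithmetic can be checked directly from the stated parameters:

```python
# Sketch verifying the walkthrough: r = 0.04 credits/s, L = 15 s lease, 5 s heartbeats.

r, L, delta, runtime = 0.04, 15, 5, 480
initial = r * L                      # 0.6 credits reserved upfront (covers first 15 s)
extensions = (runtime - L) // delta  # 5 s extensions needed beyond the initial window
total = initial + extensions * r * delta
print(extensions, round(total, 2))   # 93 19.2
```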
# Testing and Validation
- Unit tests for atomic operations and idempotency with concurrent requests.
- Fault injection: kill ledger nodes, partition networks, kill agents; verify bounded exposure and reconciliation.
- Property tests for ledger invariants: sum(ledger) == balances, no negative balances, request_id uniqueness.
- Load tests: simulate 10k jobs with Δ=5 s; ensure ledger write QPS and tail latencies meet SLOs.
# Alternatives and Trade-offs
- Postpaid billing: simpler runtime but requires credit risk management; unsuitable if strict prepay is required.
- Longer leases reduce write QPS but increase exposure; choose L based on risk appetite and latency.
- Full central scheduler fairness vs. per-node local fairness: central is more accurate; local is more resilient.
# Summary
Use a strongly consistent credit ledger with short, renewable leases to gate execution and deduct in near real time. Enforce per-user rate limits and fair-share at admission and during execution. Stream usage to an immutable pipeline and reconcile to ensure correctness and auditability. Bound overspend via lease expiry and small on-node escrows, and harden with idempotent APIs, atomic mutations, and robust observability.