Design a GPU credit accounting and scheduling service for an ML platform. Users purchase credits, submit training/inference jobs, and consume credits while jobs run. Requirements: credit issuance, balance queries, reservation at submission, metered consumption during execution, partial refunds on preemption/failure, expiration and promotional credits, per-user and per-project budgets, and audit trails. The API must be idempotent and concurrency-safe, with rate limits and protection against double-spend under races. The scheduler should place jobs on heterogeneous GPUs (e.g., A100/H100) based on resource requirements and available quota, supporting fairness across users/teams and preemption policies. Describe schemas and data structures, consistency choices (strong vs. eventual), handling of clock skew, sharding and scaling strategies, and observability. Outline a test plan that captures edge cases and uncovers unspecified requirements.

This question evaluates system design, distributed systems, and resource-accounting skills focused on concurrency control, idempotent APIs, billing/credit models, and scheduler design for heterogeneous GPUs in multi-tenant ML platforms.

What difficulty level is this interview question?

This is a hard difficulty ML System Design question, commonly asked during Technical Screen rounds at OpenAI.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at OpenAI during technical interviews.

Design a GPU credit system and scheduler | OpenAI Interview Question

Design a GPU Credit Accounting and Scheduling Service (Technical Screen)

Context

You are designing a backend service for an ML platform that runs training and inference on heterogeneous GPUs (e.g., A100, H100). Users/teams purchase credits and consume them while jobs run. The platform must prevent double-spend under concurrency, schedule fairly across users/teams, and handle preemption/failures with partial refunds.

Assume GPU pricing is per GPU-hour and differs by GPU type. Jobs specify resource requirements (GPU type preferences, count, memory) and may be preempted according to policy. The system is multi-tenant, multi-project, and multi-region.
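
To make the per-GPU-hour pricing assumption concrete, here is a minimal sketch of how a reservation amount could be estimated. The rate table and the names `RATES_PER_GPU_HOUR` and `estimate_credits` are hypothetical; the prompt does not give actual prices.

```python
from decimal import Decimal

# Hypothetical credit prices per GPU-hour; real rates are not given in the prompt.
RATES_PER_GPU_HOUR = {"A100": Decimal("2.00"), "H100": Decimal("4.00")}


def estimate_credits(gpu_type: str, gpu_count: int, hours: Decimal) -> Decimal:
    """Credits = rate(gpu_type) * gpu_count * hours, using Decimal to avoid float drift."""
    if gpu_count <= 0 or hours < 0:
        raise ValueError("gpu_count must be positive and hours non-negative")
    return RATES_PER_GPU_HOUR[gpu_type] * gpu_count * hours
```

Under these assumed rates, `estimate_credits("H100", 8, Decimal("1.5"))` works out to 48 credits (4.00 × 8 × 1.5).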

Functional Requirements

Credit lifecycle
- Issuance (purchases, grants, promotions) and expiration.
- Balance queries with breakdown (promotional vs paid, expirations).
- Spend ordering across buckets (e.g., earliest-expiring first).
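
One possible reading of "earliest-expiring first" is sketched below. The `Bucket` fields, the integer credit unit, and the tie-break (promotional before paid when expirations match) are assumptions for illustration, not requirements stated in the prompt.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bucket:
    bucket_id: str
    kind: str                  # "promo" or "paid"
    remaining: int             # credits in the smallest unit (integer)
    expires_at: Optional[int]  # epoch seconds; None means the bucket never expires


def plan_spend(buckets: List[Bucket], amount: int, now: int) -> List[Tuple[str, int]]:
    """Return (bucket_id, debit) pairs, earliest-expiring first; raise if insufficient."""
    usable = [
        b for b in buckets
        if b.remaining > 0 and (b.expires_at is None or b.expires_at > now)
    ]
    # Non-expiring buckets sort last; ties go to promo before paid (assumed policy).
    usable.sort(key=lambda b: (b.expires_at is None, b.expires_at or 0, b.kind != "promo"))
    plan: List[Tuple[str, int]] = []
    needed = amount
    for b in usable:
        if needed == 0:
            break
        take = min(b.remaining, needed)
        plan.append((b.bucket_id, take))
        needed -= take
    if needed > 0:
        raise ValueError("insufficient credits")
    return plan
```

The function only computes a plan; applying the debits atomically is a separate concern that the consistency discussion has to cover.
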
Reservation and metering
- Idempotent reservation at job submission that checks budgets/quotas.
- Metered consumption while jobs run; commit actual usage and partially refund unused holds on completion, preemption, or failure.
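
The reserve/commit/refund flow can be sketched as a small in-memory ledger. This is a single-process illustration under stated assumptions: the `Ledger` and `Hold` names are hypothetical, a lock stands in for a database transaction, and usage above the hold is capped at the hold amount (a policy the prompt leaves open).

```python
import threading
from dataclasses import dataclass
from typing import Dict


@dataclass
class Hold:
    job_id: str
    amount: int
    state: str = "held"   # "held" -> "settled"
    refunded: int = 0


class Ledger:
    def __init__(self, balance: int):
        self._lock = threading.Lock()
        self.available = balance
        self.consumed = 0
        self._holds: Dict[str, Hold] = {}  # idempotency_key -> Hold

    def reserve(self, idempotency_key: str, job_id: str, amount: int) -> Hold:
        with self._lock:
            existing = self._holds.get(idempotency_key)
            if existing is not None:
                return existing  # retry: return the original result, no second debit
            if amount > self.available:
                raise ValueError("insufficient credits")
            self.available -= amount
            hold = Hold(job_id, amount)
            self._holds[idempotency_key] = hold
            return hold

    def settle(self, idempotency_key: str, used: int) -> int:
        """Commit actual usage and refund the unused hold; returns the refund."""
        with self._lock:
            hold = self._holds[idempotency_key]
            if hold.state == "settled":
                return hold.refunded  # duplicate settle changes nothing
            charged = min(used, hold.amount)  # assumed: never charge beyond the hold
            hold.refunded = hold.amount - charged
            self.consumed += charged
            self.available += hold.refunded
            hold.state = "settled"
            return hold.refunded
```

Because both the duplicate check and the balance update happen under the same lock, a retried `reserve` with the same key cannot debit twice, and two racing reservations cannot both spend the same credits.
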
Budgets and quotas
- Per-user and per-project budgets; hierarchical limits (team/org → project → user).
- Promotional credits with separate policies and expiration.
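
A hierarchical limit check might look like the following sketch, which charges an amount against every scope on the path from org to user only if all of them stay within their limits. The scope keys and the `check_and_charge` name are hypothetical, and the function is not atomic across processes.

```python
from typing import Dict, List


def check_and_charge(
    limits: Dict[str, int],
    spent: Dict[str, int],
    path: List[str],
    amount: int,
) -> bool:
    """Charge `amount` to every scope in `path` only if all scopes stay within limit."""
    if any(spent.get(scope, 0) + amount > limits[scope] for scope in path):
        return False  # reject without partially charging any scope
    for scope in path:
        spent[scope] = spent.get(scope, 0) + amount
    return True
```

Checking every level before mutating any of them keeps a rejected charge from leaving a parent scope partially debited.
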
Scheduling
- Place jobs on heterogeneous GPUs based on requirements and available quota/credits.
- Fairness across users/teams; support weights/priority classes and preemption.
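
As one illustration of weighted fairness, the sketch below picks the placeable job whose tenant has the lowest usage-to-weight ratio. It is a simplified fair-share heuristic, not a full scheduler: the `Job` fields and `pick_next` are hypothetical, and credit checks, GPU memory, and preemption are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    job_id: str
    tenant: str
    gpu_type: str
    gpu_count: int


def pick_next(
    queue: List[Job],
    free: Dict[str, int],
    usage: Dict[str, float],
    weights: Dict[str, float],
) -> Optional[Job]:
    """Pick the placeable job whose tenant has the lowest weighted usage."""
    placeable = [j for j in queue if free.get(j.gpu_type, 0) >= j.gpu_count]
    if not placeable:
        return None
    # min() returns the first minimum, so ties fall back to queue (FIFO) order.
    return min(
        placeable,
        key=lambda j: usage.get(j.tenant, 0.0) / weights.get(j.tenant, 1.0),
    )
```

A tenant with a larger weight is treated as having consumed proportionally less, so it is picked more often for the same raw usage.
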
Audit and observability
- Immutable audit trail for all credit and scheduling decisions.
- Metrics, logs, and traces for SLOs and debugging.
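
One way to make an audit trail tamper-evident is a hash chain, where each entry stores the hash of the previous one. The sketch below is an assumption about how immutability could be approached, not something the prompt specifies; `append_event` and `verify` are hypothetical names.

```python
import hashlib
import json
from typing import List


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: List[dict], event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else ""
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: List[dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = ""
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects after-the-fact edits; it does not by itself prevent deletion of the whole log, which still needs append-only storage.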

Non-Functional Requirements

- APIs must be idempotent and concurrency-safe with rate limits.
- Protect against double-spend under races and retries.
- Clearly state consistency choices (strong vs eventual) and handle clock skew.
- Sharding/scaling strategies for high throughput.
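
For the rate-limit requirement, a token bucket is one common choice. The sketch below takes the current time as a parameter and ignores backwards clock movement, which is one simple guard against skewed or adjusted clocks. The `TokenBucket` class and its parameters are illustrative assumptions.

```python
class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: float, now: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Guard against a clock that moves backwards: never refill on negative elapsed time.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a real service the `now` argument would typically come from a monotonic clock on the node that owns the bucket, so wall-clock adjustments do not grant extra tokens.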

Deliverables

Provide:

- Architecture overview (components and data flow).
- Data schemas and key data structures.
- API design and idempotency model.
- Scheduling algorithm and preemption policies.
- Consistency model and concurrency control (including double-spend protection and clock skew handling).
- Sharding and scaling strategy.
- Observability plan.
- A test plan that exercises edge cases and surfaces unspecified requirements.
