Design Mobile Model Usage Quotas | OpenAI Interview Question

Company: OpenAI

Role: Android Engineer

Category: System Design

Difficulty: medium

Interview Round: Technical Screen

Design the mobile and backend API flow for controlling access limits to different AI model versions in a ChatGPT-like mobile app.

The product offers multiple model versions (for example, a fast standard model, a more capable advanced model, and possibly experimental models). Each model may have its own **free usage quota**. The mobile app must know whether the signed-in user can access a selected model. When a user exceeds the free quota for a model, the app should surface a toast that explains **how long remains until the next free access becomes available** and offers a **call-to-action to upgrade to a paid plan**.

Your design should cover both sides of the contract: the **backend endpoints** that own quota policy and enforcement, and the **Android client architecture** (using MVI or MVVM) that consumes them. Provide representative function signatures for the mobile client and the backend contract. Assume the mobile implementation is Android-focused, but the API must be platform-independent (the same contract should work for iOS and web).

Walk through the end-to-end design: the backend data model and endpoints, how quota is enforced atomically, and how the Android client models state and renders the quota-exceeded toast plus the upgrade action.

```hint Where to start
Make the backend the single source of truth for quota. Ask which component owns the *policy* (limits per model, window type) versus the *counter* (per-user, per-model usage), and keep the client a thin renderer of a server decision.
```

```hint Avoiding the check-then-send race
Imagine a separate "is it allowed?" call that the UI uses to enable the send button. Two signed-in devices both call it, both see one free use left, and both proceed. What just happened to the free quota, and which endpoint, if any, would have caught it? Let that question decide *where* the binding decision has to live.
```

```hint Counting usage
You need to know "how many uses in the current window." Sketch what that costs for a fixed daily reset versus a last-24-hours rolling window: do they want the same data structure? Then ask the harder question: if the count and the increment are two separate steps, what can slip between them under concurrency, and what would have to be true to close that gap?
```

```hint Shaping the denied response
The toast has to state a wait time and offer an upgrade, but the device clock may be wrong and the copy may need localizing. Given that, what is the *minimum* the server must put in the denied response so the client computes nothing about quota itself, and should that be free-form text or a structured shape?
```

```hint Android state vs effects
Separate persistent screen *state* (selected model, message text, loading) from one-shot *effects* (show the quota toast, open the upgrade URL). In MVI these are an explicit effects channel; in MVVM a `SharedFlow`/`Channel` of events. This is what keeps the toast from re-firing on recomposition or rotation.
```
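The "Counting usage" hint contrasts a fixed daily reset with a rolling last-24-hours window. The sketch below is one possible way to see the difference in data structure, not a reference solution: a fixed window needs one integer per (user, model, day), while a rolling window needs one timestamp per use. The class and method names (`FixedWindowCounter`, `RollingWindowLog`, `next_free_at`) are hypothetical, the stores are in-memory dictionaries standing in for a shared backend store, and the separate `used`/`record` calls are deliberately not atomic because the example is only about storage shape.

```python
from __future__ import annotations

from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Optional

WINDOW = timedelta(hours=24)  # assumed window length for both variants


def utc_day_start(now: datetime) -> datetime:
    """Midnight UTC at the start of the day that contains `now`."""
    return now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)


class FixedWindowCounter:
    """One integer per (user, model, UTC day); all uses reset together."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        # Old day keys are never evicted here; a real store would attach a TTL.
        self.counts: dict[tuple[str, str, datetime], int] = {}

    def used(self, user: str, model: str, now: datetime) -> int:
        return self.counts.get((user, model, utc_day_start(now)), 0)

    def record(self, user: str, model: str, now: datetime) -> None:
        key = (user, model, utc_day_start(now))
        self.counts[key] = self.counts.get(key, 0) + 1

    def next_free_at(self, user: str, model: str, now: datetime) -> Optional[datetime]:
        if self.used(user, model, now) < self.limit:
            return None  # a free use is available right now
        return utc_day_start(now) + WINDOW  # next midnight UTC


class RollingWindowLog:
    """One timestamp per use; uses expire one at a time, 24 hours after they happened."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.events: dict[tuple[str, str], deque[datetime]] = {}

    def _live(self, user: str, model: str, now: datetime) -> deque[datetime]:
        log = self.events.setdefault((user, model), deque())
        while log and log[0] <= now - WINDOW:
            log.popleft()  # evict expired uses so the log stays bounded
        return log

    def used(self, user: str, model: str, now: datetime) -> int:
        return len(self._live(user, model, now))

    def record(self, user: str, model: str, now: datetime) -> None:
        self._live(user, model, now).append(now)

    def next_free_at(self, user: str, model: str, now: datetime) -> Optional[datetime]:
        log = self._live(user, model, now)
        if len(log) < self.limit:
            return None
        # The use at this index must expire before the count drops below the limit.
        return log[len(log) - self.limit] + WINDOW
```

The trade-off this illustrates: the fixed window is cheaper to store and gives every user the same reset time, whereas the rolling window costs memory proportional to the limit but produces a per-user "next free access" time.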

Quick Answer: This question evaluates a candidate's ability to design a server-backed quota system and its Android client integration, covering API contract design, atomic quota enforcement, concurrency reasoning, and client state/effect handling.

### Constraints & Assumptions

- Quota policy and current usage are **server-owned**; the client must not hardcode or locally compute quota rules (they change without an app release).
- The quota check sits on the **message-send hot path**, so the enforcement decision should be low latency.
- A user can be signed in on **multiple devices at once**, and subscription status can change on another device.
- Free limits differ **per model**; paid plans may grant higher or unlimited quotas.
- Quota windows may be **fixed** (e.g. resets at midnight UTC) or **rolling** (e.g. last 24 hours); the design should accommodate both.
- Reset times and human-readable copy should come from the server (clients may have wrong clocks and need localization).
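Because reset times and copy are meant to come from the server, it can help to picture the decision payload as a structured object rather than free-form text. The following is a minimal sketch under stated assumptions: every field name (`retry_after_seconds`, `reset_at`, `upgrade`, and so on), the `deny` helper, and the `https://example.com/upgrade` URL are hypothetical placeholders, not a known API. The wait is sent as a duration computed on the server clock, so a client with a wrong wall clock can still count down from the moment the response arrives.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class UpgradeAction:
    label: str  # already localized by the server (assumption)
    url: str    # could equally be a native billing product id


@dataclass(frozen=True)
class QuotaDecision:
    allowed: bool
    model_id: str
    remaining: Optional[int]            # None would mean "unlimited"
    reset_at: Optional[str]             # ISO 8601 UTC timestamp, informational
    retry_after_seconds: Optional[int]  # duration, safe under client clock skew
    message: Optional[str]              # localized toast copy
    upgrade: Optional[UpgradeAction]


def deny(model_id: str, now: datetime, reset_at: datetime,
         message: str, upgrade: UpgradeAction) -> QuotaDecision:
    """Build a denied decision; `now` is the server's clock, never the device's."""
    wait = max(0, int((reset_at - now).total_seconds()))
    return QuotaDecision(
        allowed=False,
        model_id=model_id,
        remaining=0,
        reset_at=reset_at.isoformat(),
        retry_after_seconds=wait,
        message=message,
        upgrade=upgrade,
    )


if __name__ == "__main__":
    decision = deny(
        "advanced",
        now=datetime(2024, 1, 1, 22, 0, tzinfo=timezone.utc),
        reset_at=datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc),
        message="Free limit reached. Try again in 2 hours or upgrade.",
        upgrade=UpgradeAction("Upgrade", "https://example.com/upgrade"),
    )
    # The same JSON shape can be consumed by Android, iOS, and web clients.
    print(json.dumps(asdict(decision), indent=2))
```

With a shape like this, the client only renders `message`, starts a countdown from `retry_after_seconds`, and wires the button to `upgrade`; it holds no quota rules of its own.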

### Clarifying Questions to Ask

- What dimensions can a quota be keyed on (message count, token count, or both), and does it differ by model?
- Is the free quota a fixed daily reset, a rolling window, or a per-model cooldown?
- What entitlements does a paid plan grant: a higher numeric limit, unlimited use, or access to otherwise-locked models?
- Should usage be enforced strictly (hard block) or allowed to soft-overshoot for latency, then reconciled?
- Does the upgrade flow go through a web URL or the native Google Play / App Store billing flow?
- Must the copy be localized server-side, or does the client own the strings?

### What a Strong Answer Covers

- A clear ownership boundary: backend owns policy and enforcement; the client renders a server decision and never duplicates quota logic.
- A backend contract with an **availability/preload** endpoint for the UI and an **authoritative enforcement** point on the send path, plus the structured shape of the allowed and denied responses.
- Atomic check-and-consume that is correct under **concurrent multi-device** access (no check-then-send race, no double-decrement).
- A usage-counter design that handles both fixed and rolling windows, with TTL/eviction so storage doesn't grow unbounded.
- An Android architecture (MVVM or MVI) that cleanly separates persistent state from one-time effects, with representative `ViewModel` / repository / use-case signatures and domain result types (`Allowed` / `Denied`).
- Edge-case handling: clock skew, subscription change on another device, network failure mid-consume, offline behavior, and localization.
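The atomic check-and-consume requirement can be illustrated with a small sketch. It assumes a single in-process store guarded by a lock, which stands in for whatever atomic primitive the real backend would use (for example a conditional update or a server-side script in a shared store); `QuotaStore` and `try_consume` are hypothetical names. The point is only that the read of the counter and the increment happen as one indivisible step on the send path.

```python
import threading


class QuotaStore:
    """Per-(user, model) usage counter with an atomic check-and-consume."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._used = {}  # (user_id, model_id) -> uses in the current window
        self._lock = threading.Lock()

    def try_consume(self, user_id: str, model_id: str) -> bool:
        """Return True and count one use, or return False without counting."""
        key = (user_id, model_id)
        with self._lock:  # check and increment cannot be interleaved
            used = self._used.get(key, 0)
            if used >= self.limit:
                return False
            self._used[key] = used + 1
            return True


if __name__ == "__main__":
    store = QuotaStore(limit=3)
    results = []

    def send_from_some_device() -> None:
        results.append(store.try_consume("user-1", "advanced"))

    # Twenty concurrent sends simulate several devices racing for three free uses.
    threads = [threading.Thread(target=send_from_some_device) for _ in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results.count(True), "allowed,", results.count(False), "denied")
```

A separate "is it allowed?" call can still exist for enabling the send button, but under this design it is advisory; only `try_consume` on the send path is binding.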

### Follow-up Questions

- How would you extend the design from a simple request-count quota to a **token-based** quota where each message consumes a variable amount?
- A user is charged for a request whose response never arrives (network drop after the server consumed the quota). How do you make consumption **idempotent** or refundable?
- How would you A/B test two different free-quota limits per model without shipping a new app build?
- The advanced model is overloaded and you must temporarily lower everyone's free quota. What in your design lets you do that with zero client release?
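For the follow-up about a dropped response, one common direction is a client-generated request id that the server remembers, so a retry of the same send is not charged twice. The sketch below is an assumption-heavy illustration rather than a prescribed answer: `IdempotentQuota`, `consume`, and the `request_id` parameter are hypothetical, storage is in-memory, and remembered ids would need to expire together with the quota window in a real system.

```python
import threading


class IdempotentQuota:
    """Counts each (user, request_id) at most once, even if the client retries."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._used = {}        # (user_id, model_id) -> uses in the current window
        self._charged = set()  # (user_id, request_id) pairs already counted
        self._lock = threading.Lock()

    def used(self, user_id: str, model_id: str) -> int:
        with self._lock:
            return self._used.get((user_id, model_id), 0)

    def consume(self, user_id: str, model_id: str, request_id: str) -> bool:
        with self._lock:
            if (user_id, request_id) in self._charged:
                return True  # retry of a request that was already charged
            key = (user_id, model_id)
            used = self._used.get(key, 0)
            if used >= self.limit:
                return False  # denials are not remembered; a later retry re-checks
            self._used[key] = used + 1
            self._charged.add((user_id, request_id))
            return True


if __name__ == "__main__":
    quota = IdempotentQuota(limit=1)
    print(quota.consume("user-1", "advanced", "req-1"))  # first attempt
    print(quota.consume("user-1", "advanced", "req-1"))  # retry after a network drop
    print(quota.used("user-1", "advanced"))              # still a single use
```

A refund path (giving the unit back when the server knows no response was produced) would be a separate operation keyed on the same request id.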
