Design Mobile Model Usage Quotas
Company: OpenAI
Role: Android Engineer
Category: System Design
Difficulty: Medium
Interview Round: Technical Screen
Design the mobile and backend API flow for controlling access limits to different AI model versions in a ChatGPT-like mobile app.
The product offers multiple model versions (for example, a fast standard model, a more capable advanced model, and possibly experimental models). Each model may have its own **free usage quota**. The mobile app must know whether the signed-in user can access a selected model. When a user exceeds the free quota for a model, the app should show a toast that states **how long until the next free use becomes available** and offers a **call-to-action to upgrade to a paid plan**.
Your design should cover both sides of the contract: the **backend endpoints** that own quota policy and enforcement, and the **Android client architecture** (using MVI or MVVM) that consumes them. Provide representative function signatures for the mobile client and the backend contract. Assume the mobile implementation is Android-focused, but the API must be platform-independent (the same contract should work for iOS and web).
Walk through the end-to-end design: the backend data model and endpoints, how quota is enforced atomically, and how the Android client models state and renders the quota-exceeded toast plus the upgrade action.
```hint Where to start
Make the backend the single source of truth for quota. Ask which component owns the *policy* (limits per model, window type) versus the *counter* (per-user, per-model usage), and keep the client a thin renderer of a server decision.
```
```hint Avoiding the check-then-send race
Imagine a separate "is it allowed?" call that the UI uses to enable the send button. Two signed-in devices both call it, both see one free use left, and both proceed. What just happened to the free quota — and which endpoint, if any, would have caught it? Let that question decide *where* the binding decision has to live.
```
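One way to read this hint is that the binding decision has to be a single check-and-consume step on the send path. The sketch below is a minimal illustration, assuming an in-memory map guarded by a lock as a stand-in for whatever atomic operation the real datastore offers. `QuotaCounter` and `tryConsume` are hypothetical names, not part of any stated API.

```kotlin
// Hypothetical in-memory stand-in for the server-side counter store.
// A real backend would need the same atomicity from its datastore.
class QuotaCounter(private val limit: Int) {
    private val used = HashMap<String, Int>() // key: "userId:modelId"
    private val lock = Any()

    /** Checks and consumes one use inside a single critical section. */
    fun tryConsume(userId: String, modelId: String): Boolean = synchronized(lock) {
        val key = "$userId:$modelId"
        val current = used[key] ?: 0
        if (current >= limit) {
            false
        } else {
            used[key] = current + 1
            true
        }
    }
}

fun main() {
    val counter = QuotaCounter(limit = 1)
    // Two devices of the same user race for the last free use.
    val results = java.util.Collections.synchronizedList(mutableListOf<Boolean>())
    val threads = List(2) { Thread { results.add(counter.tryConsume("u1", "advanced")) } }
    threads.forEach { it.start() }
    threads.forEach { it.join() }
    println(results.count { it }) // prints 1: exactly one request is allowed
}
```

Because the check and the increment share one critical section, a separate "is it allowed?" call can remain purely advisory for the UI.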
```hint Counting usage
You need to know "how many uses in the current window." Sketch what that costs for a fixed daily reset versus a last-24-hours rolling window — do they want the same data structure? Then ask the harder question: if the count and the increment are two separate steps, what can slip between them under concurrency, and what would have to be true to close that gap?
```
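To make the data-structure difference concrete, here is a small sketch under simplifying assumptions: one user, one model, single-threaded, with atomicity handled elsewhere. `FixedDailyWindow` and `RollingWindow` are hypothetical names. Each `tryConsume` returns `null` when the use is allowed, or the instant at which the next free use becomes available.

```kotlin
import java.time.Duration
import java.time.Instant
import java.time.ZoneOffset

// Fixed daily window: one integer per UTC day is enough.
class FixedDailyWindow(private val limit: Int) {
    private var day = Long.MIN_VALUE
    private var count = 0

    fun tryConsume(now: Instant): Instant? {
        val today = now.atOffset(ZoneOffset.UTC).toLocalDate()
        if (today.toEpochDay() != day) { day = today.toEpochDay(); count = 0 }
        if (count < limit) { count++; return null }
        return today.plusDays(1).atStartOfDay().toInstant(ZoneOffset.UTC) // next midnight UTC
    }
}

// Rolling window: needs the timestamps of recent uses, not just a count.
class RollingWindow(private val limit: Int, private val window: Duration) {
    private val uses = ArrayDeque<Instant>()

    fun tryConsume(now: Instant): Instant? {
        // Evict uses that have aged out, so storage stays bounded by `limit`.
        while (uses.isNotEmpty() && !uses.first().plus(window).isAfter(now)) uses.removeFirst()
        if (uses.size < limit) { uses.addLast(now); return null }
        return uses.first().plus(window) // when the oldest use expires
    }
}

fun main() {
    val t0 = Instant.parse("2024-01-01T10:00:00Z")
    val fixed = FixedDailyWindow(limit = 1)
    println(fixed.tryConsume(t0))                 // null (allowed)
    println(fixed.tryConsume(t0.plusSeconds(60))) // 2024-01-02T00:00:00Z

    val rolling = RollingWindow(limit = 1, window = Duration.ofHours(24))
    println(rolling.tryConsume(t0))                 // null (allowed)
    println(rolling.tryConsume(t0.plusSeconds(60))) // 2024-01-02T10:00:00Z
}
```

The fixed window stores a counter and a day bucket. The rolling window stores up to `limit` timestamps, which is what lets it answer "when does the oldest use expire?".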
```hint Shaping the denied response
The toast has to state a wait time and offer an upgrade, but the device clock may be wrong and the copy may need localizing. Given that, what is the *minimum* the server must put in the denied response so the client computes nothing about quota itself — and should that be free-form text or a structured shape?
```
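One possible structured shape, shown as client-side domain types. Every field name here (`reasonCode`, `retryAfterSeconds`, `localizedMessage`, `upgradeUrl`) is an assumption made for illustration. The sketch sends a relative duration rather than an absolute timestamp, so a wrong device clock does not distort the wait time.

```kotlin
// Hypothetical domain types mirroring a structured server decision.
sealed interface SendDecision {
    data class Allowed(val remaining: Int?) : SendDecision // null = unlimited
    data class Denied(
        val reasonCode: String,       // stable and machine-readable, e.g. "FREE_QUOTA_EXCEEDED"
        val retryAfterSeconds: Long,  // relative duration computed by the server
        val localizedMessage: String, // server-localized toast copy
        val upgradeUrl: String?,      // null when no upgrade call-to-action applies
    ) : SendDecision
}

// The client only renders; it computes nothing about quota itself.
fun render(decision: SendDecision): String = when (decision) {
    is SendDecision.Allowed -> "send"
    is SendDecision.Denied ->
        "toast: ${decision.localizedMessage}" + (if (decision.upgradeUrl != null) " [Upgrade]" else "")
}

fun main() {
    val denied = SendDecision.Denied(
        reasonCode = "FREE_QUOTA_EXCEEDED",
        retryAfterSeconds = 5400,
        localizedMessage = "Free limit reached. Try again in 1h 30m.",
        upgradeUrl = "https://example.com/upgrade",
    )
    println(render(SendDecision.Allowed(remaining = 3))) // send
    println(render(denied)) // toast: Free limit reached. Try again in 1h 30m. [Upgrade]
}
```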
```hint Android state vs effects
Separate persistent screen *state* (selected model, message text, loading) from one-shot *effects* (show the quota toast, open the upgrade URL). In MVI these are an explicit effects channel; in MVVM a `SharedFlow`/`Channel` of events. This is what keeps the toast from re-firing on recomposition or rotation.
```
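The separation can be sketched as a pure reducer that returns the next state together with any one-shot effects. To stay dependency-free, this sketch returns effects as a plain list; in a real app they would typically be forwarded through a `Channel` or `SharedFlow`. All type names are hypothetical.

```kotlin
data class ChatState(val selectedModel: String, val draft: String = "", val sending: Boolean = false)

sealed interface ChatEffect {
    data class ShowQuotaToast(val message: String, val upgradeUrl: String?) : ChatEffect
}

sealed interface ChatEvent {
    object SendClicked : ChatEvent
    object SendSucceeded : ChatEvent
    data class SendDenied(val message: String, val upgradeUrl: String?) : ChatEvent
}

// Pure reducer: next persistent state plus any one-shot effects.
fun reduce(state: ChatState, event: ChatEvent): Pair<ChatState, List<ChatEffect>> = when (event) {
    ChatEvent.SendClicked -> state.copy(sending = true) to emptyList<ChatEffect>()
    ChatEvent.SendSucceeded -> state.copy(sending = false, draft = "") to emptyList<ChatEffect>()
    is ChatEvent.SendDenied ->
        state.copy(sending = false) to
            listOf<ChatEffect>(ChatEffect.ShowQuotaToast(event.message, event.upgradeUrl))
}

fun main() {
    val (s1, _) = reduce(ChatState("advanced", draft = "hi"), ChatEvent.SendClicked)
    val (s2, effects) = reduce(s1, ChatEvent.SendDenied("Free limit reached.", "https://example.com/upgrade"))
    println(s2.sending)   // false
    println(s2.draft)     // hi (the draft survives a denied send)
    println(effects.size) // 1
}
```

The toast never appears in `ChatState`, so re-rendering the same state after rotation or recomposition has nothing to re-fire.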
### Constraints & Assumptions
- Quota policy and current usage are **server-owned**; the client must not hardcode or locally compute quota rules (they change without an app release).
- The quota check sits on the **message-send hot path**, so the enforcement decision should be low latency.
- A user can be signed in on **multiple devices at once**, and subscription status can change on another device.
- Free limits differ **per model**; paid plans may grant higher or unlimited quotas.
- Quota windows may be **fixed** (e.g. resets at midnight UTC) or **rolling** (e.g. last 24 hours); the design should accommodate both.
- Reset times and human-readable copy should come from the server (clients may have wrong clocks and need localization).
### Clarifying Questions to Ask
- In which units can a quota be measured (message count, token count, or both), and does the unit differ by model?
- Is the free quota a fixed daily reset, a rolling window, or a per-model cooldown?
- What entitlements does a paid plan grant: a higher numeric limit, unlimited use, or access to otherwise-locked models?
- Should usage be enforced strictly (hard block) or allowed to soft-overshoot for latency, then reconciled?
- Does the upgrade flow go through a web URL or the native Google Play / App Store billing flow?
- Must the copy be localized server-side, or does the client own the strings?
### What a Strong Answer Covers
- A clear ownership boundary: backend owns policy + enforcement; the client renders a server decision and never duplicates quota logic.
- A backend contract with an **availability/preload** endpoint for the UI and an **authoritative enforcement** point on the send path, plus the structured shape of the allowed and denied responses.
- Atomic check-and-consume that is correct under **concurrent multi-device** access (no check-then-send race, no double-decrement).
- A usage-counter design that handles both fixed and rolling windows, with TTL/eviction so storage doesn't grow unbounded.
- An Android architecture (MVVM or MVI) that cleanly separates persistent state from one-time effects, with representative `ViewModel` / repository / use-case signatures and domain result types (`Allowed` / `Denied`).
- Edge-case handling: clock skew, subscription change on another device, network failure mid-consume, offline behavior, and localization.
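As an illustration of the client-side signatures such an answer might present, here is a minimal sketch. All names are hypothetical, and the functions are plain rather than `suspend` only to keep the example runnable without a coroutines dependency. The repository exposes an advisory availability call and an authoritative send call.

```kotlin
// All names below are illustrative, not an existing API.
data class ModelAvailability(val modelId: String, val canSend: Boolean)

sealed interface SendResult {
    data class Allowed(val reply: String) : SendResult
    data class Denied(val retryAfterSeconds: Long) : SendResult
}

interface ChatRepository {
    /** Advisory preload for the model picker; never the binding decision. */
    fun modelAvailability(): List<ModelAvailability>

    /** Authoritative: the server checks and consumes quota while handling the send. */
    fun sendMessage(modelId: String, text: String): SendResult
}

class SendMessageUseCase(private val repo: ChatRepository) {
    operator fun invoke(modelId: String, text: String): SendResult {
        require(text.isNotBlank()) { "message must not be blank" }
        return repo.sendMessage(modelId, text.trim())
    }
}

// Fake used only to exercise the signatures.
class FakeRepo(private var freeUses: Int) : ChatRepository {
    override fun modelAvailability() = listOf(ModelAvailability("advanced", freeUses > 0))

    override fun sendMessage(modelId: String, text: String): SendResult =
        if (freeUses > 0) {
            freeUses--
            SendResult.Allowed("echo: $text")
        } else {
            SendResult.Denied(retryAfterSeconds = 3600)
        }
}

fun main() {
    val send = SendMessageUseCase(FakeRepo(freeUses = 1))
    println(send("advanced", "hello")) // Allowed(reply=echo: hello)
    println(send("advanced", "again")) // Denied(retryAfterSeconds=3600)
}
```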
### Follow-up Questions
- How would you extend the design from a simple request-count quota to a **token-based** quota where each message consumes a variable amount?
- A user's quota is consumed for a request whose response never arrives (network drop after the server consumed the quota). How do you make consumption **idempotent** or refundable?
- How would you A/B test two different free-quota limits per model without shipping a new app build?
- The advanced model is overloaded and you must temporarily lower everyone's free quota. What in your design lets you do that with zero client release?
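For the idempotency and token-based follow-ups, one common approach is a client-generated request ID that is reused on retry, with the server replaying the earlier decision instead of consuming again. The sketch below is a minimal in-memory illustration of that idea. `IdempotentQuota`, `consume`, and the `cost` parameter are hypothetical, and refunds are left out.

```kotlin
// Hypothetical server-side sketch: decisions are remembered per request ID.
class IdempotentQuota(private val limit: Int) {
    private var used = 0
    private val seen = HashMap<String, Boolean>() // requestId -> decision already made

    /** `cost` is 1 for a request-count quota, or a token count for a token-based one. */
    @Synchronized
    fun consume(requestId: String, cost: Int = 1): Boolean =
        seen.getOrPut(requestId) {
            if (used + cost <= limit) {
                used += cost
                true
            } else {
                false
            }
        }
}

fun main() {
    val quota = IdempotentQuota(limit = 1)
    val id = java.util.UUID.randomUUID().toString() // generated once per send attempt
    println(quota.consume(id)) // true
    println(quota.consume(id)) // true: the retry replays the decision, no second consumption
    println(quota.consume("another-request")) // false: the single free use is already spent
}
```

In a real system the stored decisions would need an expiry so the map does not grow without bound.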
Quick Answer: This question evaluates a candidate's ability to design a server-backed quota system and its Android client integration, covering API contract design, atomic quota enforcement, concurrency reasoning, and client state/effect handling.