RL System Design: Per‑User Spending Limits
You are designing a reinforcement learning (RL) system to set per-user spending limits in a payments/risk context. The goal is to balance revenue and user experience against fraud and credit losses, subject to regulatory constraints.
Task
Define and justify the RL formulation and training/deployment approach:
Environment/MDP
- State representation: What customer, risk, and context features are included? How are they featurized and updated over time?
- Action space: How are spending limit decisions represented (e.g., absolute limit vs. incremental adjustments; discrete vs. continuous)? Include any action masks.
- Transition dynamics: What drives state evolution and partial observability? How does the policy influence future states and outcomes?
- Reward signal: Specify the components (e.g., profit, expected credit/fraud losses, user satisfaction/friction, regulatory penalties) and how you aggregate/discount them.
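For concreteness, the sketch below shows one possible shape for this formulation: a featurized per-user state, a discrete action space of limit multipliers with a mask that enforces hard caps, and a per-step reward that nets revenue against losses and user friction. Every feature name, step size, and cost coefficient here is an illustrative assumption, not a prescribed design.

```python
from dataclasses import dataclass

# Illustrative per-user state: recent behavior, risk scores, and context.
# Field names and scales are assumptions for this sketch, not a fixed schema.
@dataclass
class UserState:
    current_limit: float      # currently assigned spending limit
    avg_monthly_spend: float  # trailing average spend
    utilization: float        # spend / limit over the last cycle
    fraud_score: float        # model score in [0, 1]; higher = riskier
    credit_score: float       # bureau or internal score, normalized to [0, 1]
    days_on_book: int         # account tenure
    recent_disputes: int      # chargebacks/disputes in a recent window

# Discrete action space: multiplicative adjustments to the current limit.
ACTIONS = [0.5, 0.8, 1.0, 1.2, 1.5]  # assumed step sizes

def action_mask(state: UserState, hard_cap: float = 50_000.0) -> list[bool]:
    """Mask out actions that would violate hard constraints (illustrative rules)."""
    mask = []
    for mult in ACTIONS:
        ok = state.current_limit * mult <= hard_cap   # absolute ceiling
        if state.fraud_score > 0.9 and mult > 1.0:    # no increases at very high fraud risk
            ok = False
        mask.append(ok)
    return mask

def step_reward(spend: float, realized_loss: float, declined_txns: int,
                interchange_rate: float = 0.02, friction_cost: float = 0.5) -> float:
    """Per-step reward: revenue minus realized losses minus a friction penalty.
    Coefficients are placeholders; in practice they come from unit economics."""
    return interchange_rate * spend - realized_loss - friction_cost * declined_txns
```

In a sequential formulation, returns over such per-step rewards would be discounted over review cycles; because loss labels arrive with a lag, an expected-loss estimate is often substituted for the realized loss during training.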
Training approach
- Describe how to train from logged historical decisions: offline RL vs. contextual bandits. When would you pick each?
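As one concrete baseline under the contextual-bandit framing, the sketch below fits a per-action reward model on logged decisions (the "direct method"), reweighting rows by clipped inverse propensities to correct for logging-policy bias, and then acts greedily over allowed actions. The data layout and the ridge-regression choice are assumptions; a full offline-RL treatment (e.g., fitted Q iteration with a conservatism penalty) would replace the single-step model with a value function over episodes.

```python
import numpy as np

def fit_reward_models(X, actions, rewards, propensities, n_actions, l2=1.0):
    """Fit one ridge-regression reward model per action from logged data.

    X: (n, d) feature matrix; actions: (n,) logged action indices;
    rewards: (n,) observed rewards; propensities: (n,) logging-policy
    probabilities of the logged action. Rows are reweighted by clipped
    inverse propensities, a standard correction for logging-policy bias."""
    d = X.shape[1]
    models = []
    for a in range(n_actions):
        idx = actions == a
        Xa, ra = X[idx], rewards[idx]
        if len(ra) == 0:
            models.append(np.zeros(d))        # no logged data for this action
            continue
        w = 1.0 / np.clip(propensities[idx], 0.05, 1.0)
        theta = np.linalg.solve(Xa.T @ (Xa * w[:, None]) + l2 * np.eye(d),
                                Xa.T @ (w * ra))
        models.append(theta)
    return np.stack(models)                    # shape: (n_actions, d)

def greedy_policy(models, x, allowed):
    """Pick the allowed action with the highest predicted reward for context x."""
    scores = models @ x
    scores[~np.asarray(allowed)] = -np.inf
    return int(np.argmax(scores))
```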
Exploration under risk constraints
- Propose an exploration strategy that respects hard safety constraints while still learning.
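One way to keep exploring without violating hard constraints is to randomize only within a pre-vetted safe action set, e.g., modest steps around the current limit and never above the absolute cap, and exploit otherwise. The epsilon, step bound, and cap in this sketch are placeholder values.

```python
import random

def safe_explore(greedy_action: int, allowed: list[bool], current_limit: float,
                 actions: list[float], epsilon: float = 0.05,
                 max_step: float = 1.2, hard_cap: float = 50_000.0) -> int:
    """Epsilon-greedy exploration confined to a conservatively defined safe set."""
    safe_set = [
        i for i, mult in enumerate(actions)
        if allowed[i] and mult <= max_step and current_limit * mult <= hard_cap
    ]
    if safe_set and random.random() < epsilon:
        return random.choice(safe_set)   # explore, but only inside the safe set
    return greedy_action                 # otherwise exploit
```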
Off‑policy evaluation (OPE)
- How will you evaluate candidate policies before online deployment, including sequential and bandit cases?
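For the bandit case, a minimal sketch of a self-normalized inverse propensity scoring (SNIPS) estimate of a candidate policy's value from logged data; doubly robust estimators add a reward model on top of the same weights, and in the sequential case fitted Q evaluation plays the analogous role. The weight clip is an assumed hyperparameter.

```python
import numpy as np

def snips_value(rewards, logging_propensities, target_action_probs,
                clip: float = 10.0) -> float:
    """Self-normalized IPS estimate of a candidate policy's average reward.

    target_action_probs[i] is the candidate policy's probability of the action
    that was actually logged in context i (1 or 0 for a deterministic policy)."""
    w = np.asarray(target_action_probs) / np.asarray(logging_propensities)
    w = np.clip(w, 0.0, clip)            # cap extreme importance weights
    denom = w.sum()
    return float((w * np.asarray(rewards)).sum() / denom) if denom > 0 else 0.0
```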
Safety guardrails
- Define policy- and system‑level controls that prevent harmful actions and enable safe rollout.
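A sketch of a system-level guardrail that sits between the policy and execution: it clamps the proposed limit to an absolute cap, bounds the change per review cycle, and falls back to a conservative rules-based limit when inputs look anomalous. All thresholds and the fallback value are placeholders.

```python
def apply_guardrails(proposed_limit: float, current_limit: float, fraud_score: float,
                     hard_cap: float = 50_000.0, max_rel_change: float = 0.25,
                     fallback_limit: float = 1_000.0) -> float:
    """Post-policy safety layer: the RL policy proposes, this layer disposes."""
    # Anomalous inputs -> conservative rules-based fallback.
    if not (0.0 <= fraud_score <= 1.0) or proposed_limit <= 0:
        return min(fallback_limit, current_limit)
    # Bound the relative change per review cycle.
    lo = current_limit * (1.0 - max_rel_change)
    hi = current_limit * (1.0 + max_rel_change)
    limit = min(max(proposed_limit, lo), hi)
    # No increases at very high fraud risk, and always respect the hard cap.
    if fraud_score > 0.9:
        limit = min(limit, current_limit)
    return min(limit, hard_cap)
```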
Cold start
- How will you handle new users or merchants with little or no history?
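One common cold-start pattern is to start from a conservative segment-level default and shrink toward the user's own observed behavior as history accumulates, in an empirical-Bayes style. The shrinkage constant and headroom factor below are illustrative.

```python
def cold_start_limit(segment_default_spend: float, observed_monthly_spend: float,
                     months_observed: int, shrinkage_months: float = 6.0,
                     headroom: float = 1.5) -> float:
    """Blend a segment prior with observed spend; weight shifts to the user's
    own data as evidence accumulates. Constants are assumptions for the sketch."""
    w = months_observed / (months_observed + shrinkage_months)
    blended = w * observed_monthly_spend + (1.0 - w) * segment_default_spend
    return headroom * blended
```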
Non‑stationarity
- How will you detect and adapt to distribution shifts (seasonality, new fraud patterns, macro shocks)?
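A minimal drift signal for this purpose is the population stability index (PSI) between a reference window and a recent window of a score or feature distribution; values above roughly 0.2 are a commonly quoted rule-of-thumb trigger for investigation. The binning and threshold below are assumptions.

```python
import numpy as np

def population_stability_index(reference, recent, n_bins: int = 10) -> float:
    """PSI between two samples; larger values indicate stronger distribution shift."""
    edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # catch values outside the reference range
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    new_frac = np.histogram(recent, bins=edges)[0] / len(recent)
    ref_frac = np.clip(ref_frac, 1e-6, None)    # avoid log(0) and division by zero
    new_frac = np.clip(new_frac, 1e-6, None)
    return float(np.sum((new_frac - ref_frac) * np.log(new_frac / ref_frac)))

# Example alerting rule (threshold is a rule of thumb; tune per metric):
# if population_stability_index(train_scores, live_scores) > 0.2: flag for retraining review
```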
Deployment
- Outline a cautious rollout plan and real‑time monitoring for this RL system.
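To illustrate one piece of such a rollout, the sketch below gates a canary cohort against a control cohort on guardrail metrics (realized loss rate and a friction proxy) and signals rollback when the gap exceeds a tolerance. Metric choices, thresholds, and cohort plumbing are placeholders; a production gate would also account for statistical uncertainty.

```python
from dataclasses import dataclass

@dataclass
class CohortStats:
    exposure: float    # total spend volume in the cohort
    losses: float      # realized fraud/credit losses
    complaints: int    # user-friction proxy, e.g., support contacts

def canary_gate(canary: CohortStats, control: CohortStats,
                max_loss_rate_gap: float = 0.002,
                max_complaint_ratio: float = 1.5) -> bool:
    """Return True if the canary may keep ramping, False to trigger rollback.
    Thresholds are placeholders chosen for illustration only."""
    canary_loss_rate = canary.losses / max(canary.exposure, 1.0)
    control_loss_rate = control.losses / max(control.exposure, 1.0)
    if canary_loss_rate - control_loss_rate > max_loss_rate_gap:
        return False
    if control.complaints and canary.complaints / control.complaints > max_complaint_ratio:
        return False
    return True
```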