How do I approach ML System Design interview questions?

ML System Design questions require a solid understanding of core concepts and regular practice. PracHub provides solutions with explanations to help you master ML System Design interviews.

What difficulty level is this interview question?

This is a hard-difficulty ML System Design question, commonly asked during onsite rounds at PayPal.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at PayPal during technical interviews.

Design RL-based spending limit policy | PayPal Interview Question

Quick Overview

This question evaluates proficiency in designing reinforcement-learning-based decision systems for per-user spending limits, examining MDP formulation, state/action/reward specification, safety constraints, off-policy evaluation, and deployment considerations within the payments and risk domain.

RL System Design: Per‑User Spending Limits

You are designing a reinforcement learning (RL) system to set per-user spending limits in a payments/risk context. The goal is to balance revenue and user experience against fraud/credit losses and regulatory compliance.

Task

Define and justify the RL formulation and training/deployment approach:

Environment/MDP
- State representation: What customer, risk, and context features are included? How are they featurized and updated over time?
- Action space: How are spending limit decisions represented (e.g., absolute limit vs. incremental adjustments; discrete vs. continuous)? Include any action masks.
- Transition dynamics: What drives state evolution and partial observability? How does the policy influence future states and outcomes?
- Reward signal: Specify the components (e.g., profit, expected credit/fraud losses, user satisfaction/friction, regulatory penalties) and how you aggregate/discount them.
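To make the action-mask and reward items above concrete, here is a minimal sketch of one possible formulation. Everything in it is an assumption for illustration, not part of the question: the multiplicative action set `ACTIONS`, the `UserState` fields, the risk threshold, and the reward weights are all hypothetical choices that a real answer would need to justify.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical discrete action set: multiplicative adjustments to the current limit.
ACTIONS = (0.5, 0.8, 1.0, 1.25, 1.5)


@dataclass
class UserState:
    current_limit: float   # current spending limit, in dollars
    risk_score: float      # model-estimated fraud/default probability in [0, 1]
    regulatory_cap: float  # hard upper bound on the limit, in dollars


def action_mask(state: UserState, max_risk_for_increase: float = 0.2) -> List[bool]:
    """Return True for each entry of ACTIONS that is allowed in this state."""
    mask = []
    for mult in ACTIONS:
        new_limit = state.current_limit * mult
        within_cap = new_limit <= state.regulatory_cap
        # Assumed rule: increases are only allowed below a risk threshold.
        risk_ok = mult <= 1.0 or state.risk_score <= max_risk_for_increase
        mask.append(within_cap and risk_ok)
    return mask


def reward(profit: float, loss: float, friction: float, penalty: float,
           w_loss: float = 1.0, w_friction: float = 0.1,
           w_penalty: float = 10.0) -> float:
    """Weighted scalar reward: profit minus losses, friction, and penalties."""
    return profit - w_loss * loss - w_friction * friction - w_penalty * penalty
```

A linear weighted sum is only one way to aggregate the reward components. Treating losses or regulatory penalties as hard constraints rather than weighted terms is an equally reasonable design to discuss.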
Training approach
- Describe how to use logged historical decisions to train: offline RL vs. contextual bandits. When would you pick each?
Exploration under risk constraints
- Propose an exploration strategy that respects hard safety constraints while still learning.
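One simple way to illustrate exploration under hard constraints is epsilon-greedy restricted to the allowed actions. The sketch below is a hedged example, not a recommended production strategy: the function name `safe_epsilon_greedy`, the Q-value inputs, and the mask are hypothetical. It also returns the propensity of the chosen action, on the assumption that the decision will be logged for later off-policy evaluation.

```python
import random
from typing import Sequence, Tuple


def safe_epsilon_greedy(q_values: Sequence[float], mask: Sequence[bool],
                        epsilon: float, rng: random.Random) -> Tuple[int, float]:
    """Pick an action among the allowed ones and return (action, propensity).

    Masked-out actions are never selected, so exploration cannot violate
    the hard constraints encoded in the mask.
    """
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("no safe action available")
    greedy = max(allowed, key=lambda i: q_values[i])
    # Explore uniformly over allowed actions with probability epsilon.
    action = rng.choice(allowed) if rng.random() < epsilon else greedy
    # Probability of the chosen action under this policy, for logging.
    propensity = epsilon / len(allowed) + (1.0 - epsilon) * (action == greedy)
    return action, propensity
```

Logging the propensity alongside each decision is what makes importance-weighted evaluation of later candidate policies possible.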
Off‑policy evaluation (OPE)
- How will you evaluate candidate policies before online deployment, including sequential and bandit cases?
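For the bandit case, a common starting point is inverse propensity scoring (IPS) and its self-normalized variant (SNIPS). The sketch below assumes each log record is a tuple `(context, action, reward, logging_propensity)` and that `target_prob` returns the candidate policy's probability of an action. Both the record layout and the function names are hypothetical. The sequential case would need a different estimator (for example per-decision importance sampling or fitted Q evaluation), which this sketch does not cover.

```python
from typing import Any, Callable, Iterable, Tuple

LogRecord = Tuple[Any, int, float, float]  # (context, action, reward, propensity)


def ips_estimates(logs: Iterable[LogRecord],
                  target_prob: Callable[[Any, int], float],
                  clip: float = 10.0) -> Tuple[float, float]:
    """Return (IPS, SNIPS) value estimates for the target policy.

    Importance weights are clipped at `clip` to limit variance, which
    introduces some bias.
    """
    weights = []
    weighted_rewards = []
    for context, action, reward, mu in logs:
        w = min(target_prob(context, action) / mu, clip)
        weights.append(w)
        weighted_rewards.append(w * reward)
    if not weights:
        raise ValueError("no logged data")
    total_w = sum(weights)
    ips = sum(weighted_rewards) / len(weights)
    snips = sum(weighted_rewards) / total_w if total_w > 0 else 0.0
    return ips, snips
```

SNIPS is usually lower variance than plain IPS when the weights are uneven. Both depend on the logging policy having nonzero probability for every action the candidate policy might take.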
Safety guardrails
- Define policy- and system‑level controls that prevent harmful actions and enable safe rollout.
Cold start
- How will you handle new users or merchants with little or no history?
Non‑stationarity
- How will you detect and adapt to distribution shifts (seasonality, new fraud patterns, macro shocks)?
Deployment
- Outline a cautious rollout plan and real‑time monitoring for this RL system.
