Design RL-based spending limit policy
Company: PayPal
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Onsite
Design a system that sets per-user spending limits using reinforcement learning. Define the environment: the state representation (customer, risk, and context features), the action space (limit adjustments), the transition dynamics, and the reward signal (e.g., profit, credit losses, user satisfaction, and regulatory penalties). Then explain the training approach (offline RL or contextual bandits trained on logged data), the exploration strategy under risk constraints, off-policy evaluation, safety guardrails, and how to handle cold start and non-stationarity.
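A minimal Python sketch of the environment pieces follows. It rests on several assumptions. The action space is a hypothetical discrete set of multiplicative limit adjustments. `UserState`, `RewardWeights`, and all thresholds are illustrative names and values, not any real PayPal system. The reward is a simple weighted sum: profit, minus credit loss, minus declined-transaction friction (used as a proxy for user satisfaction), minus a regulatory penalty. Guardrails are applied as an action mask before the policy chooses, so neither exploration nor a miscalibrated model can propose a disallowed limit. Transition dynamics are not simulated here; in the offline setting they are implicit in the logged data.

```python
from dataclasses import dataclass

# Hypothetical discrete action space: multiplicative adjustments to the current limit.
ACTIONS = (0.8, 1.0, 1.2)  # decrease 20%, hold, increase 20%


@dataclass
class UserState:
    """Illustrative state with customer, risk, and context features (field names are assumptions)."""
    current_limit: float         # current spending limit in account currency
    avg_monthly_spend: float     # customer feature a policy model would consume
    risk_score: float            # estimated probability of default, in [0, 1]
    days_since_last_change: int  # context feature used for change throttling


@dataclass
class RewardWeights:
    """Hypothetical trade-off weights; real values would need business and compliance input."""
    margin: float = 0.02               # revenue earned per unit of spend
    loss_given_default: float = 0.9    # fraction of a defaulted balance actually lost
    friction_cost: float = 1.0         # cost per declined transaction (satisfaction proxy)
    regulatory_penalty: float = 100.0  # cost of a regulatory or policy violation


def reward(spend, defaulted_amount, declined_txns, violated_rule, w=None):
    """One-period reward: profit minus credit loss, friction, and regulatory penalty."""
    w = w if w is not None else RewardWeights()
    profit = w.margin * spend
    credit_loss = w.loss_given_default * defaulted_amount
    friction = w.friction_cost * declined_txns
    penalty = w.regulatory_penalty if violated_rule else 0.0
    return profit - credit_loss - friction - penalty


def allowed_actions(state, max_risk=0.2, min_days_between_changes=30, hard_cap=10_000.0):
    """Safety guardrail: mask actions before the policy (or exploration) can pick them."""
    allowed = []
    for a in ACTIONS:
        if a == 1.0:
            allowed.append(a)  # holding the current limit is always permitted
            continue
        if state.days_since_last_change < min_days_between_changes:
            continue  # throttle how often a user's limit may change
        if a > 1.0 and (state.risk_score > max_risk or state.current_limit * a > hard_cap):
            continue  # no increases for high-risk users or beyond the hard cap
        allowed.append(a)
    return allowed
```

For example, a user with `risk_score=0.35` whose limit last changed 90 days ago gets `[0.8, 1.0]`: a decrease or a hold, but no increase. Keeping guardrails outside the learned policy means they can be audited and changed without retraining.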
Quick Answer: This question evaluates proficiency in designing reinforcement-learning-based decision systems for per-user spending limits, examining MDP formulation, state/action/reward specification, safety constraints, off-policy evaluation, and deployment considerations within the payments and risk domain.
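For the logged-data setting, here is a sketch of two linked pieces, assuming a contextual-bandit framing. The first is a safe epsilon-greedy policy that explores only within the guardrail-approved action set and records the propensity of each choice. The second uses inverse propensity scoring (IPS, with weight clipping) and self-normalized IPS (SNIPS) to estimate a candidate policy's value from those logs before deployment. Epsilon-greedy and IPS/SNIPS are standard techniques swapped in for illustration, not a prescribed method. The function names, the `q_values` input, and the log tuple layout are hypothetical.

```python
import random


def safe_epsilon_greedy(q_values, allowed, epsilon=0.05, rng=random):
    """Choose among guardrail-approved actions; return (action, propensity).

    q_values: dict mapping action -> estimated reward from the current model
    allowed: actions that passed the safety guardrails; exploration never leaves this set
    """
    if not allowed:
        raise ValueError("guardrails must leave at least one action, e.g. hold")
    greedy = max(allowed, key=lambda a: q_values.get(a, 0.0))
    action = rng.choice(allowed) if rng.random() < epsilon else greedy
    # Probability this stochastic policy assigned to the chosen action (logged for OPE).
    propensity = epsilon / len(allowed) + ((1.0 - epsilon) if action == greedy else 0.0)
    return action, propensity


def ips_estimates(logs, target_prob, clip=10.0):
    """Off-policy value estimates for a candidate policy from logged bandit data.

    logs: list of (context, action, reward, logging_propensity) tuples
    target_prob(context, action): probability the candidate policy picks that action
    Returns (clipped IPS estimate, self-normalized IPS estimate).
    """
    if not logs:
        raise ValueError("need at least one logged round")
    clipped_sum = weighted_sum = weight_total = 0.0
    for context, action, reward, p_log in logs:
        w = target_prob(context, action) / p_log  # importance weight
        clipped_sum += min(w, clip) * reward       # clipping trades bias for lower variance
        weighted_sum += w * reward
        weight_total += w
    ips = clipped_sum / len(logs)
    snips = weighted_sum / weight_total if weight_total > 0 else 0.0
    return ips, snips
```

Because the exploring policy never leaves the allowed set, the logs cannot support estimates for actions the guardrails always block. Large or highly variable importance weights are a sign that the logs cover the candidate policy poorly and that its estimate should not be trusted.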