What difficulty level is this interview question?

This is a medium difficulty System Design question, commonly asked during Onsite rounds at PayPal.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at PayPal during technical interviews.

Design a Payment Fraud Detection Service | PayPal Interview Question

Q: Design a Payment Fraud Detection Service

This system design question tests the ability to architect a real-time, low-latency fraud decisioning service that combines deterministic rule engines with machine learning model scores. It evaluates practical understanding of distributed systems trade-offs, feature store design, and safe model/rule deployment on a high-availability critical path.

Design a real-time fraud detection service for a payment platform. When a user submits a payment attempt, the platform calls your service before authorizing or capturing the charge, and your service must return a decision: allow, deny, challenge (e.g. step-up authentication), or send to manual review.

The service should reason over signals such as transaction amount, merchant, user account history, device fingerprint, IP address, geolocation, payment instrument, velocity signals (how often a card/device/account has been used recently), chargeback history, and known fraud patterns. It must combine deterministic rules (written and owned by risk analysts) with machine-learning model scores, do so under tight latency, and remain explainable, auditable, and highly available. Risk analysts must be able to author, test, and deploy new rules and models safely, and confirmed-fraud / chargeback / manual-review outcomes must flow back as labels to improve the system over time.
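As one illustration of the velocity signals described above, the sketch below counts how many attempts a key (card, device, or account) has made inside a trailing time window. It is a minimal in-memory example, not a production design: the class name `VelocityCounter`, the key `card_tok_123`, and the 60-second window are hypothetical, and it assumes timestamps arrive in order. In a real deployment these counts would more plausibly live in an online feature store maintained by a stream processor.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Counts events per key (card, device, account) in a trailing time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def record(self, key, timestamp):
        """Register one payment attempt for `key` at `timestamp` (in seconds)."""
        self._events[key].append(timestamp)

    def count(self, key, now):
        """Return how many attempts `key` made in the window ending at `now`."""
        events = self._events[key]
        cutoff = now - self.window_seconds
        # Evict timestamps that have fallen out of the window.
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)


counter = VelocityCounter(window_seconds=60)
for t in (0.0, 10.0, 50.0, 65.0):
    counter.record("card_tok_123", t)

print(counter.count("card_tok_123", now=70.0))  # prints 2 (attempts at 50.0 and 65.0)
```

Keying the counter on a token rather than a raw card number keeps the example consistent with the requirement that sensitive payment data be tokenized.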

Constraints & Assumptions

Online decisioning is on the payment critical path; target p99 latency in the low hundreds of milliseconds (e.g. < 100–200 ms) for the synchronous risk check.
High request volume (tens of thousands of payment attempts per second at peak for a large platform); assume bursty traffic.
High availability is required — fraud-service downtime stalls payment processing.
Fraud labels are delayed and noisy (chargebacks can land 30+ days after the transaction; not all fraud is ever labeled).
The service must be explainable: every deny / review must carry reason codes for customer support, compliance, and dispute handling.
Rule and model changes must roll out safely (no big-bang deploys to a live money-movement path).
Sensitive payment data (PAN, etc.) must be tokenized / minimized; treat PCI and privacy obligations as hard constraints.

Clarifying Questions to Ask

What is the call pattern — is the risk check synchronous and blocking before authorization, or can some decisions be made asynchronously (e.g. post-auth holds)?
What are the business priorities for the decision tradeoff — minimize fraud loss, maximize approval/conversion, or a target chargeback rate? This sets the thresholds.
What is the expected QPS and latency budget, and is it uniform globally or regional?
Who owns rules and how fast must they ship (e.g. an analyst reacting to a live fraud attack in minutes)?
What labels and feedback are available (chargebacks, disputes, manual-review outcomes, confirmed fraud) and with what delay?
Are there compliance / regulatory requirements (PCI-DSS, sanctions screening, regional data residency) that constrain storage and the decision flow?
Is there an existing feature platform / model serving infra to reuse, or is this greenfield?

What a Strong Answer Covers

Requirements framing: separates functional (decision, rules + model, analyst tooling, feedback ingestion, auditability) from non-functional (latency, availability, explainability, safe rollout, security) and ties decisions back to the business tradeoff (fraud loss vs. approval rate).
Synchronous decision path: a clean Risk API, an idempotent request/response contract with reason codes and rule/model versions, and a decision engine that fuses hard rules, the model score, and policy into allow / deny / challenge / review.
Feature architecture: an online feature store fed by a stream processor; the same feature definitions used offline for training to avoid training–serving skew; a per-decision feature snapshot.
Rules engine: versioning, approval workflow, dry-run / shadow, allow/block lists, and full audit of who changed what.
Model lifecycle: registry with versioned models + feature schema, shadow → canary rollout, a rules-only fallback, and monitoring for drift and feature freshness.
Scale & availability: handling QPS within the latency budget, multi-zone deployment, caching of static data, strict timeouts, circuit breakers, and an explicit fail-open vs. fail-closed policy by risk tier.
Auditability & monitoring: what is persisted per decision, plus the key health metrics (latency/error rate, decision distribution, chargeback / confirmed-fraud rate, false-positive rate from review, score drift).
Security & privacy: tokenization, encryption, role-based access, tamper-resistant logs, and regulatory alignment.
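The decision engine in the list above fuses rule hits, a model score, and policy into a single action with reason codes and versions. The sketch below shows one way that fusion could look. Everything specific in it is an assumption made for illustration: the names `Action`, `Decision`, `decide`, and `SCORE_POLICY`, the score cutoffs, the severity ordering (challenge below review below deny), and the "strictest action wins" policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Action(Enum):
    ALLOW = 0
    CHALLENGE = 1
    REVIEW = 2
    DENY = 3  # higher value = stricter action (this ordering is an assumption)


@dataclass
class Decision:
    action: Action
    reason_codes: List[str]
    rule_set_version: str
    model_version: str


# Hypothetical score cutoffs, checked from strictest to most lenient.
SCORE_POLICY = [(0.95, Action.DENY), (0.80, Action.REVIEW), (0.50, Action.CHALLENGE)]


def decide(rule_hits: List[Tuple[Action, str]], model_score: Optional[float],
           rule_set_version: str = "rules-v1",
           model_version: str = "model-v1") -> Decision:
    action, reasons = Action.ALLOW, []

    if model_score is None:
        reasons.append("MODEL_UNAVAILABLE")  # rules-only fallback
    else:
        for cutoff, candidate in SCORE_POLICY:
            if model_score >= cutoff:
                action = candidate
                reasons.append(f"MODEL_SCORE_{candidate.name}")
                break

    # Every rule that fired contributes a reason code; the strictest action wins.
    for rule_action, code in rule_hits:
        reasons.append(code)
        if rule_action.value > action.value:
            action = rule_action

    return Decision(action, reasons, rule_set_version, model_version)


decision = decide([(Action.CHALLENGE, "NEW_DEVICE")], model_score=0.85)
print(decision.action.name, decision.reason_codes)
# prints: REVIEW ['MODEL_SCORE_REVIEW', 'NEW_DEVICE']
```

Returning the rule-set and model versions with every decision is what makes a later audit or dispute reproducible; persisting the feature snapshot alongside them would be the natural next step.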

Follow-up Questions

A new fraud attack pattern emerges that the model has never seen. How does an analyst respond within minutes, and how do you ensure the rule they ship doesn't accidentally block a large swath of legitimate traffic?
Chargeback labels arrive 30–60 days after the transaction. How does this label delay affect model retraining cadence and your ability to detect a sudden model-quality regression quickly? What faster proxy signals could you watch?
How do you measure whether a deny/challenge decision was correct given that you never observe the counterfactual outcome of transactions you blocked? What sampling or experimentation could break this feedback bias?
The model serving tier degrades and starts timing out under a traffic spike. Walk through exactly what your service returns for a $5 transaction vs. a $5,000 transaction during the outage, and why.
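For the last follow-up, one possible shape of an answer is a strict timeout on the model call plus a fallback policy that depends on the risk tier. The sketch below is only one candidate policy, not the expected answer: the 50 ms timeout, the 100.00 high-value threshold, the choice to fail open for low-value payments and fall back to a challenge for high-value ones, and all function names are hypothetical.

```python
import concurrent.futures
import time

# Hypothetical policy values; real ones follow from the business tradeoff.
MODEL_TIMEOUT_SECONDS = 0.05
HIGH_VALUE_THRESHOLD = 100.00

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def fallback_action(amount):
    """Policy applied when no model score is available."""
    # Low-value payments fail open; high-value payments fail toward friction.
    return "allow" if amount < HIGH_VALUE_THRESHOLD else "challenge"


def decide_with_fallback(amount, score_fn, deny_at=0.9):
    """Return (action, reason_code), falling back if the model is slow or fails."""
    future = _executor.submit(score_fn, amount)
    try:
        score = future.result(timeout=MODEL_TIMEOUT_SECONDS)
    except concurrent.futures.TimeoutError:
        return fallback_action(amount), "MODEL_TIMEOUT"
    except Exception:
        return fallback_action(amount), "MODEL_ERROR"
    return ("deny" if score >= deny_at else "allow"), "MODEL_SCORE"


def degraded_model(amount):
    time.sleep(0.2)  # simulates a serving tier that is timing out
    return 0.1


print(decide_with_fallback(5.00, degraded_model))     # ('allow', 'MODEL_TIMEOUT')
print(decide_with_fallback(5000.00, degraded_model))  # ('challenge', 'MODEL_TIMEOUT')
```

The reason code records that the decision came from the fallback path, which keeps degraded-mode decisions distinguishable in the audit log and in later label analysis.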
