Design a Payment Fraud Detection Service

Last updated: Jun 25, 2026

Quick Overview

This system design question tests the ability to architect a real-time, low-latency fraud decisioning service that combines deterministic rule engines with machine learning model scores. It evaluates practical understanding of distributed systems trade-offs, feature store design, and safe model/rule deployment on a high-availability critical path.

Company: PayPal

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite

Design a real-time fraud detection service for a payment platform. When a user submits a payment attempt, the platform calls your service before authorizing or capturing the charge, and your service must return a decision: allow, deny, challenge (e.g. step-up authentication), or send to manual review.

The service should reason over signals such as transaction amount, merchant, user account history, device fingerprint, IP address, geolocation, payment instrument, velocity signals (how often a card/device/account has been used recently), chargeback history, and known fraud patterns. It must combine deterministic rules (written and owned by risk analysts) with machine-learning model scores, do so under tight latency, and remain explainable, auditable, and highly available. Risk analysts must be able to author, test, and deploy new rules and models safely, and confirmed-fraud / chargeback / manual-review outcomes must flow back as labels to improve the system over time.

Hints

  • Where to start: Treat this as an online, synchronous scoring service sitting on the payment critical path. Separate the three things that must coexist: a rules engine (deterministic, analyst-owned), a model serving path (probabilistic score), and a decision engine that fuses both with business policy into one of the four actions.
  • Latency and the read path: The expensive part is fetching fresh aggregates (e.g. "# attempts by this card in the last 10 min"). Pre-compute these with a stream processor into a low-latency online feature store so the request path is point lookups, not on-the-fly aggregation. Put strict timeouts on every dependency.
  • Closing the loop: Fraud labels arrive days to weeks late (chargebacks, disputes). Design the offline feedback pipeline and a feature snapshot stored at decision time so you can train on exactly the features the model saw and explain any past decision.
  • Failure behavior: Decide fail-open vs. fail-closed per risk tier, not globally. A degraded model or feature store should not block all payments, but it also should not wave through high-value suspicious ones.
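The "Latency and the read path" hint can be made concrete with a small sketch. The class below is an in-memory stand-in for the stream job plus online feature store, not a real feature-store client: `VelocityCounter`, its method names, and the `card_tok_123` key are all hypothetical. It only illustrates the split between a write path that maintains the sliding-window aggregate and a read path that is a point lookup by key.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Hypothetical in-memory stand-in for a stream job feeding an online feature store."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # key (e.g. a card token) -> timestamps of recent attempts, oldest first
        self._events = defaultdict(deque)

    def _evict(self, key, now):
        # Drop timestamps that have fallen out of the sliding window.
        events = self._events[key]
        while events and events[0] <= now - self.window_seconds:
            events.popleft()

    def record(self, key, ts):
        """Write path: called by the stream processor for each payment attempt."""
        self._events[key].append(ts)
        self._evict(key, ts)

    def count(self, key, now):
        """Read path: a point lookup by key, with no scan over raw transactions."""
        self._evict(key, now)
        return len(self._events[key])


counter = VelocityCounter(window_seconds=600)  # "last 10 min"
for ts in (0.0, 100.0, 550.0):
    counter.record("card_tok_123", ts)

attempts_at_590 = counter.count("card_tok_123", now=590.0)  # all three still in window
attempts_at_650 = counter.count("card_tok_123", now=650.0)  # the attempt at ts=0.0 has expired
```

In a production design the deque would typically be replaced by bucketed counters in a low-latency key-value store, so memory per key stays bounded under attack traffic.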

Apr 14, 2026, 12:00 AM

Constraints & Assumptions

  • Online decisioning is on the payment critical path; target p99 latency in the low hundreds of milliseconds (e.g. 100–200 ms or less) for the synchronous risk check.
  • High request volume (tens of thousands of payment attempts per second at peak for a large platform); assume bursty traffic.
  • High availability is required — fraud-service downtime stalls payment processing.
  • Fraud labels are delayed and noisy (chargebacks can land 30+ days after the transaction; not all fraud is ever labeled).
  • The service must be explainable: every deny / review must carry reason codes for customer support, compliance, and dispute handling.
  • Rule and model changes must roll out safely (no big-bang deploys to a live money-movement path).
  • Sensitive payment data (PAN, etc.) must be tokenized / minimized; treat PCI and privacy obligations as hard constraints.
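One way to respect the latency constraint is to carry a single per-request deadline and derive every dependency timeout from what is left of it. The sketch below is illustrative only: the `Deadline` class, the 150 ms budget, and the per-dependency caps are assumptions chosen to sit inside the 100–200 ms range above, and `FakeClock` exists purely to make the example deterministic.

```python
import time


def _monotonic_ms():
    return time.monotonic() * 1000.0


class Deadline:
    """Tracks one request's remaining latency budget in milliseconds (hypothetical helper)."""

    def __init__(self, budget_ms, clock_ms=_monotonic_ms):
        self._clock_ms = clock_ms
        self._expires_at = clock_ms() + budget_ms

    def remaining_ms(self):
        return max(0.0, self._expires_at - self._clock_ms())

    def timeout_for(self, cap_ms):
        """Timeout for one dependency: its own cap, but never more than what is left."""
        return min(cap_ms, self.remaining_ms())


class FakeClock:
    """Deterministic clock so the example does not depend on wall time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, ms):
        self.now += ms


clock = FakeClock()
deadline = Deadline(budget_ms=150, clock_ms=clock)  # assumed overall budget

feature_timeout = deadline.timeout_for(cap_ms=40)   # plenty of budget left: use the cap
clock.advance(130)                                  # pretend earlier steps took 130 ms
model_timeout = deadline.timeout_for(cap_ms=50)     # only 20 ms left: clipped to that
clock.advance(30)
late_timeout = deadline.timeout_for(cap_ms=50)      # budget spent: skip the call and fall back
```

A zero timeout is the signal to skip the dependency entirely and take the degraded path instead of issuing a call that cannot finish in time.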

Clarifying Questions to Ask

  • What is the call pattern — is the risk check synchronous and blocking before authorization, or can some decisions be made asynchronously (e.g. post-auth holds)?
  • What are the business priorities for the decision tradeoff — minimize fraud loss, maximize approval/conversion, or a target chargeback rate? This sets the thresholds.
  • What is the expected QPS and latency budget, and is it uniform globally or does it vary by region?
  • Who owns rules and how fast must they ship (e.g. an analyst reacting to a live fraud attack in minutes)?
  • What labels and feedback are available (chargebacks, disputes, manual-review outcomes, confirmed fraud) and with what delay?
  • Are there compliance / regulatory requirements (PCI-DSS, sanctions screening, regional data residency) that constrain storage and the decision flow?
  • Is there an existing feature platform / model serving infra to reuse, or is this greenfield?

What a Strong Answer Covers

  • Requirements framing: separates functional (decision, rules + model, analyst tooling, feedback ingestion, auditability) from non-functional (latency, availability, explainability, safe rollout, security) and ties decisions back to the business tradeoff (fraud loss vs. approval rate).
  • Synchronous decision path: a clean Risk API, an idempotent request/response contract with reason codes and rule/model versions, and a decision engine that fuses hard rules, the model score, and policy into allow / deny / challenge / review.
  • Feature architecture: an online feature store fed by a stream processor; the same feature definitions used offline for training to avoid training–serving skew; a per-decision feature snapshot.
  • Rules engine: versioning, approval workflow, dry-run / shadow, allow/block lists, and full audit of who changed what.
  • Model lifecycle: registry with versioned models + feature schema, shadow → canary rollout, a rules-only fallback, and monitoring for drift and feature freshness.
  • Scale & availability: handling QPS within the latency budget, multi-zone deployment, caching of static data, strict timeouts, circuit breakers, and an explicit fail-open vs. fail-closed policy by risk tier.
  • Auditability & monitoring: what is persisted per decision, plus the key health metrics (latency/error rate, decision distribution, chargeback / confirmed-fraud rate, false-positive rate from review, score drift).
  • Security & privacy: tokenization, encryption, role-based access, tamper-resistant logs, and regulatory alignment.
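The decision engine described under "Synchronous decision path" can be sketched as a small fusion function. Everything here is an illustrative assumption: the rule IDs, the score thresholds, the version strings, and in particular the "most severe action wins" policy, which is only one of several reasonable ways to combine rule hits with a model score. The point is that the response carries reason codes and the rule/model versions that produced it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    ALLOW = 0
    CHALLENGE = 1
    REVIEW = 2
    DENY = 3  # larger value = more severe (assumed ordering)


@dataclass(frozen=True)
class Rule:
    rule_id: str
    action: Action
    predicate: Callable[[dict], bool]


@dataclass(frozen=True)
class Decision:
    action: Action
    reason_codes: tuple
    ruleset_version: str
    model_version: str


def decide(txn, rules, score, ruleset_version, model_version,
           challenge_at=0.6, deny_at=0.9):
    """Fuse rule hits and a model score; the most severe action wins."""
    candidates = [(Action.ALLOW, None)]
    for rule in rules:
        if rule.predicate(txn):
            candidates.append((rule.action, f"RULE:{rule.rule_id}"))
    if score >= deny_at:
        candidates.append((Action.DENY, "MODEL_SCORE_HIGH"))
    elif score >= challenge_at:
        candidates.append((Action.CHALLENGE, "MODEL_SCORE_ELEVATED"))
    action = max((a for a, _ in candidates), key=lambda a: a.value)
    reasons = tuple(code for _, code in candidates if code is not None)
    return Decision(action, reasons, ruleset_version, model_version)


# Hypothetical analyst-owned rules over pre-fetched features.
rules = [
    Rule("VELOCITY_CARD_10M", Action.REVIEW, lambda t: t["card_attempts_10m"] > 5),
    Rule("BLOCKED_DEVICE", Action.DENY, lambda t: t["device_id"] in {"dev_blocked"}),
]

txn = {"card_attempts_10m": 7, "device_id": "dev_ok"}
decision = decide(txn, rules, score=0.65,
                  ruleset_version="rules-v12", model_version="model-v3")
```

Here the velocity rule asks for review and the model asks for a challenge, so the fused action is review, and both reason codes are kept for audit and support.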

Follow-up Questions

  • A new fraud attack pattern emerges that the model has never seen. How does an analyst respond within minutes, and how do you ensure the rule they ship doesn't accidentally block a large swath of legitimate traffic?
  • Chargeback labels arrive 30–60 days after the transaction. How does this label delay affect model retraining cadence and your ability to detect a sudden model-quality regression quickly? What faster proxy signals could you watch?
  • How do you measure whether a deny/challenge decision was correct given that you never observe the counterfactual outcome of transactions you blocked? What sampling or experimentation could break this feedback bias?
  • The model serving tier degrades and starts timing out under a traffic spike. Walk through exactly what your service returns for a $5 transaction vs. a $5,000 transaction during the outage, and why.
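For the last follow-up, a tiered fallback policy might look like the sketch below. It is one possible answer, not a reference solution: the tier boundaries ($50 and $1,000), the reason-code strings, and the choice of "review" for the high tier are all assumptions, and the outage is simulated by a model client that raises the built-in `TimeoutError` instead of making a real network call.

```python
LOW_TIER_MAX_USD = 50.0      # hypothetical tier boundary
HIGH_TIER_MIN_USD = 1000.0   # hypothetical tier boundary


def fallback_action(amount_usd):
    """Degraded-mode policy when no model score is available."""
    if amount_usd < LOW_TIER_MAX_USD:
        return "allow", "MODEL_UNAVAILABLE_FAIL_OPEN"
    if amount_usd < HIGH_TIER_MIN_USD:
        return "challenge", "MODEL_UNAVAILABLE_STEP_UP"
    return "review", "MODEL_UNAVAILABLE_FAIL_CLOSED"


def decide(txn, score_fn, hard_rule_hit=False, deny_at=0.9):
    # Deterministic rules do not depend on the model tier, so they still apply.
    if hard_rule_hit:
        return {"action": "deny", "reasons": ["HARD_RULE"], "degraded": False}
    try:
        score = score_fn(txn)
    except TimeoutError:
        action, reason = fallback_action(txn["amount_usd"])
        return {"action": action, "reasons": [reason], "degraded": True}
    if score >= deny_at:
        return {"action": "deny", "reasons": ["MODEL_SCORE_HIGH"], "degraded": False}
    return {"action": "allow", "reasons": [], "degraded": False}


def timing_out_model(txn):
    # Simulates a model call that exceeded its deadline.
    raise TimeoutError("model serving exceeded its deadline")


small = decide({"amount_usd": 5.0}, timing_out_model)
large = decide({"amount_usd": 5000.0}, timing_out_model)
```

Under these assumed tiers the $5 attempt is allowed with a degraded-mode reason code, while the $5,000 attempt is held for manual review. The `degraded` flag lets monitoring count how many decisions were made without a model score.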
