PracHub

Explain unsupervised fraud and evaluation

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in unsupervised anomaly detection methods, evaluation under label scarcity, and operational decision-making for fraud detection on highly imbalanced data.



Company: PayPal

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

Explain unsupervised approaches for fraud detection and when you would use them versus supervised methods. Compare options such as clustering, density estimation, isolation forests, autoencoders, and graph-based anomaly detection. Then discuss how to evaluate without reliable labels: use precision@k, recall at a fixed review budget, PR-AUC vs ROC-AUC under extreme imbalance, rank-based metrics, proxy/delayed labels, and calibration checks. Clarify why raw “accuracy” is misleading here and how you would choose thresholds.
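The accuracy, threshold, and calibration points can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the synthetic scores, the 0.2% fraud rate, the review budget `B`, and the cost figures `fraud_loss` and `review_cost` are hypothetical. Platt scaling (a one-feature scikit-learn `LogisticRegression`) is just one way to map raw anomaly scores to probabilities once delayed labels arrive; isotonic regression is a common alternative.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: 200,000 transactions at an assumed 0.2% fraud rate.
n = 200_000
y = (rng.random(n) < 0.002).astype(int)
# Hypothetical raw anomaly scores: fraud is shifted upward but overlaps legit traffic.
raw = rng.normal(0.0, 1.0, n) + 2.5 * y

# Raw accuracy: predicting "legit" for every case is ~99.8% accurate yet catches no fraud.
acc_flag_nothing = np.mean(y == 0)
print(f"accuracy when flagging nothing: {acc_flag_nothing:.4f}")

# Budget-driven threshold: flag exactly the B highest-scored cases.
B = 1_000  # assumed review capacity per period
top = np.argsort(-raw)[:B]
threshold = raw[top].min()  # the score cutoff implied by the budget
precision_at_B = y[top].mean()
recall_at_B = y[top].sum() / y.sum()
print(f"threshold={threshold:.3f} precision@B={precision_at_B:.3f} recall@B={recall_at_B:.3f}")

# Calibration check with (delayed) labels: fit Platt scaling on the first half,
# then compare mean predicted vs observed fraud rate per quantile bin on the second half.
half = n // 2
platt = LogisticRegression().fit(raw[:half, None], y[:half])
p = platt.predict_proba(raw[half:, None])[:, 1]
obs_rate, mean_pred = calibration_curve(y[half:], p, n_bins=10, strategy="quantile")
for pred, obs in zip(mean_pred, obs_rate):
    print(f"mean predicted {pred:.4f} vs observed {obs:.4f}")

# Cost-based alternative (assumed costs): review when p * fraud_loss > review_cost,
# i.e. when p exceeds review_cost / fraud_loss = 0.025.
fraud_loss, review_cost = 200.0, 5.0
flag_cost_based = p > review_cost / fraud_loss
print("cases flagged by cost rule:", int(flag_cost_based.sum()))
```

The budget rule answers "which B cases do analysts see," while the cost rule only makes sense once scores are calibrated; comparing how many cases each flags is a quick sanity check on whether the budget is too tight or too loose.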


Related Interview Questions

  • How to validate production models? - PayPal (medium)
  • Explain fraud types and evaluate a fraud model - PayPal (hard)
  • Build a real-time ATO model - PayPal (hard)
  • Assess LLMs for fraud detection - PayPal (hard)
  • Identify Unsupervised Techniques for Detecting Fraudulent Transactions - PayPal (medium)
PayPal
Jul 31, 2025, 12:00 AM
Data Scientist
Onsite
Machine Learning

Unsupervised Fraud Detection: Methods, When to Use Them, and How to Evaluate Without Reliable Labels

Context

You are designing fraud detection for a large payments platform. Fraud is rare and evolving, labels (e.g., chargebacks) are delayed or incomplete, and you have a limited manual review budget. You need to:

  1. Explain when you would use unsupervised approaches versus supervised methods.
  2. Compare common unsupervised options: clustering, density estimation, Isolation Forests, autoencoders, and graph-based anomaly detection.
  3. Describe how to evaluate models without reliable labels, including:
    • Precision@k, recall at a fixed review budget, PR-AUC vs ROC-AUC under extreme imbalance, and other rank-based metrics.
    • Using proxy/delayed labels and calibration checks.
  4. Clarify why raw accuracy is misleading for this problem and how to choose thresholds under operational constraints.
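For the method comparison and ranking metrics in items 2 and 3, here is a minimal sketch under stated assumptions: synthetic Gaussian data stands in for transaction features, labels are used only for scoring (as delayed chargebacks would be), and `precision_at_k` and `recall_at_budget` are hypothetical helper names. An Isolation Forest and a Gaussian mixture density model are scored with scikit-learn's `score_samples`, negated so that a higher value means more anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.mixture import GaussianMixture
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(42)

# Hypothetical synthetic data: 20,000 legit and 40 fraud rows (0.2%), 4 features.
legit = rng.normal(0.0, 1.0, size=(20_000, 4))
fraud = rng.normal(2.0, 1.0, size=(40, 4))
X = np.vstack([legit, fraud])
# Labels are used ONLY for evaluation (e.g. delayed chargebacks), never for fitting.
y = np.r_[np.zeros(len(legit), dtype=int), np.ones(len(fraud), dtype=int)]


def precision_at_k(y_true, s, k):
    """Fraction of fraud among the k highest-scored cases."""
    top = np.argsort(-s)[:k]
    return y_true[top].mean()


def recall_at_budget(y_true, s, budget):
    """Share of all fraud caught if analysts review only the top `budget` cases."""
    top = np.argsort(-s)[:budget]
    return y_true[top].sum() / y_true.sum()


# Two unsupervised scorers fit on unlabeled X; higher score = more anomalous.
iso = IsolationForest(n_estimators=200, random_state=0).fit(X)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
scores = {
    "isolation_forest": -iso.score_samples(X),  # score_samples: lower = more abnormal
    "gaussian_mixture": -gmm.score_samples(X),  # negative log-likelihood (low density)
}

budget = 100  # assumed review capacity
results = {}
for name, s in scores.items():
    results[name] = {
        "precision@k": precision_at_k(y, s, budget),
        "recall@budget": recall_at_budget(y, s, budget),
        "roc_auc": roc_auc_score(y, s),
        "pr_auc": average_precision_score(y, s),  # average precision as PR-AUC
    }
    print(name, {m: round(float(v), 3) for m, v in results[name].items()})

# Baseline for context: a random ranking has ROC-AUC ~0.5 but PR-AUC ~ prevalence.
print("prevalence (random PR-AUC baseline):", round(float(y.mean()), 4))
```

Because a random ranking scores ROC-AUC near 0.5 regardless of imbalance but PR-AUC near the fraud rate, PR-AUC and precision@k respond far more to the top of the ranking, which is the part analysts actually review.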


© 2026 PracHub. All rights reserved.