PracHub

Design fraud detection across channels with unknowns

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competence in designing and operationalizing multi-channel fraud detection systems, covering cost-sensitive objective formulation, segmentation and feature engineering, model selection (including sequence and graph approaches), anomaly detection for unknown actors, drift monitoring, and safe LLM-assisted workflows. Commonly asked in the Machine Learning domain to assess end-to-end systems thinking and trade-off reasoning between customer experience and fraud loss, it tests both conceptual understanding and practical application-level skills such as evaluation, deployment guardrails, and handling label sparsity and delay.


Company: Amazon

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite


Oct 13, 2025, 9:49 PM

Fraud Detection Strategy for a Multi‑Channel Marketplace

Context: You are designing a fraud detection system for a large marketplace operating across web, mobile app, and in‑store channels. Attacker behavior evolves, labels are sparse and delayed (e.g., chargebacks), and decisions must balance customer experience against fraud loss.

Tasks

  1. Problem Framing
    • Define precise objectives for real-time screening (approve/review/block) versus offline investigation (case triage, ring discovery).
    • Translate business costs (false positive customer friction, false negative loss, review costs) into a cost‑sensitive objective or decision rule.
    • Justify the use of AUCPR vs. ROC, expected cost, or a custom utility for model selection.
  2. Data Strategy
    • Propose segmentation (e.g., by channel, geography, new vs. returning users, device fingerprint, payment method).
    • Enumerate feature families (behavioral sequences, velocity features, device/network, payment, graph/identity) and how to augment with third‑party/consortium data.
    • Address cold‑start (new geos, new devices) and label scarcity/delay.
  3. Modeling Approach
    • Compare baseline rules + gradient‑boosted trees with deep models (sequence/graph).
    • When to train a single global model with channel features vs. per‑channel models; discuss multi‑task alternatives.
    • How to incorporate graph features/embeddings; specify regularization and class‑imbalance handling.
  4. Unknown Bad Actors
    • Design a pipeline to discover emerging patterns: unsupervised/anomaly detection or contrastive self‑supervision to surface clusters/signals, human labeling to curate exemplars, then supervised fine‑tuning.
    • How to prevent feedback loops and label bias.
  5. Evaluation
    • Define offline metrics (AUCPR, cost curves, calibration) and online guardrails (customer friction rate, review queue load).
    • Design a temporal holdout/split to avoid leakage; quantify expected dollar impact under several thresholds.
  6. Robustness & Drift
    • Describe drift detection, shadow deployment, threshold adaptation, and rollback.
    • Identify leading indicators to monitor daily.
  7. Second‑Chance Improvements
    • If you were to redo a past model, what concrete changes (features, objective, sampling, thresholding, data contracts) would you make and why?
  8. LLM Leverage
    • Propose a safe way to use LLMs for analyst triage or rule suggestion (prompt templates, retrieval grounding, safety filters), and how to A/B test the workflow impact without exposing PII.
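For task 1, one way to turn business costs into a decision rule is to pick, per transaction, the action with the lowest expected cost. The sketch below is illustrative only: it assumes a calibrated fraud probability, and the `Costs` fields (`false_block`, `review`, `review_catch_rate`) and their values are hypothetical placeholders, not figures from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical business costs, in dollars unless noted."""
    false_block: float        # cost of blocking a legitimate order (friction, lost margin)
    review: float             # cost of one manual review
    review_catch_rate: float  # assumed fraction of fraud that a reviewer stops (0..1)


def expected_costs(p_fraud: float, amount: float, c: Costs) -> dict:
    """Expected cost of each action, given a calibrated fraud probability."""
    return {
        # Approving loses the order amount if the order is fraudulent.
        "approve": p_fraud * amount,
        # Review always costs reviewer time, plus the fraud that slips through.
        "review": c.review + p_fraud * (1.0 - c.review_catch_rate) * amount,
        # Blocking only costs something when the order was legitimate.
        "block": (1.0 - p_fraud) * c.false_block,
    }


def decide(p_fraud: float, amount: float, c: Costs) -> str:
    """Return the action with the lowest expected cost."""
    costs = expected_costs(p_fraud, amount, c)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    costs = Costs(false_block=20.0, review=3.0, review_catch_rate=0.9)
    for p in (0.001, 0.2, 0.95):
        print(p, decide(p, amount=100.0, c=costs))
```

Because the approve cost scales with the order amount, the implied probability thresholds move with order value, which a single fixed score cutoff would not capture.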
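For task 5, a temporal split can account for delayed labels by leaving a maturity gap before the training cutoff. The following is a minimal sketch under assumptions: events are dicts with a `ts` datetime, and `label_delay` is a hypothetical estimate of how long chargeback labels take to arrive.

```python
from datetime import datetime, timedelta


def temporal_split(events, train_end, label_delay, test_end):
    """Split events by time, dropping training rows whose labels are immature.

    Events in [train_end - label_delay, train_end) are excluded from training,
    because their fraud labels would not yet have been known at train_end.
    """
    mature_cutoff = train_end - label_delay
    train = [e for e in events if e["ts"] < mature_cutoff]
    test = [e for e in events if train_end <= e["ts"] < test_end]
    return train, test


if __name__ == "__main__":
    start = datetime(2025, 1, 1)
    events = [{"id": d, "ts": start + timedelta(days=d)} for d in range(100)]
    train, test = temporal_split(
        events,
        train_end=start + timedelta(days=70),
        label_delay=timedelta(days=30),
        test_end=start + timedelta(days=100),
    )
    print(len(train), len(test))
```

Features computed from history (for example velocity counts) would need the same discipline: only data available before each event's timestamp should feed its feature values.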
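For task 6, one commonly used drift signal is the population stability index (PSI) between a baseline score distribution and a recent one. The sketch below assumes model scores in [0, 1] and equal-width bins; the function name and the bin count are illustrative choices, and any alert threshold should be tuned per segment rather than taken as given.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index between two samples of scores in [0, 1]."""

    def bin_fractions(scores):
        counts = [0] * n_bins
        for s in scores:
            # Clamp so that s == 1.0 (or slightly out-of-range values) land in a valid bin.
            idx = max(0, min(int(s * n_bins), n_bins - 1))
            counts[idx] += 1
        total = len(scores)
        # Floor empty bins at eps to keep the logarithm finite.
        return [max(c / total, eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [0.5 + 0.5 * s for s in baseline]  # scores pushed toward the high end
    print(psi(baseline, baseline))
    print(psi(baseline, shifted))
```

PSI only needs scores or feature values, not labels, so it can serve as a daily leading indicator while chargeback labels are still maturing.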
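For task 8, one safety filter is to mask obvious identifiers before any case text reaches an LLM prompt. The sketch below is a deliberately simple regex pass; the patterns and the `redact` helper are hypothetical, cover only emails, card-like digit runs, and US-style phone numbers, and would not be sufficient on their own for real PII protection.

```python
import re

# Order matters: card-like digit runs are masked before phone numbers,
# so a phone pattern cannot match inside a longer card number.
_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),   # 13 to 19 digits, optional separators
    ("PHONE", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace matched identifiers with bracketed placeholders."""
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Contact jane.doe@example.com or 916-555-0100, card 4111 1111 1111 1111 declined"
    print(redact(note))
```

The same redacted text can be used in both arms of an A/B test of the analyst workflow, so that neither the prompts nor the experiment logs carry raw identifiers.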
