Design fraud detection across channels with unknowns
Company: Amazon
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Onsite
A marketplace sees fraud across multiple channels (web, app, in-store) with evolving attacker behavior and sparse labels.
1) Problem framing: Define precise objectives for real-time screening vs. offline investigation. Translate business costs (customer friction from false positives, fraud losses from false negatives) into a cost-sensitive objective or decision rule, and justify the choice among AUCPR, ROC-AUC, expected cost, or a custom utility (see sketch 1 below).
2) Data strategy: Propose segmentation (by channel, geography, new/returning users, device fingerprint), feature families (behavioral sequences, velocity features, device/network, payment, graph), and how you would augment with third-party or consortium data. Address cold-start and label scarcity. (Velocity features are illustrated in sketch 2 below.)
3) Modeling approach: Compare baseline rules + gradient-boosted trees vs. deep models. When would you train a global model with channel features vs. per-channel models? How would you incorporate graph features or embeddings? Specify regularization and class-imbalance handling (see sketch 3 below).
4) Unknown bad actors: Detail a pipeline that discovers emerging fraud patterns: unsupervised/anomaly detection or contrastive self-supervision to surface clusters/signals, human labeling to curate exemplars, then supervised fine-tuning. How do you prevent feedback loops and label bias? (See sketch 4 below.)
5) Evaluation: Define offline metrics (AUCPR, cost curves, calibration) and online guardrails (customer friction rate, review queue load). Design a holdout/temporal split to avoid leakage; quantify expected dollar impact under several thresholds (see sketch 5 below).
6) Robustness & drift: Describe drift detection, shadow deployment, threshold adaptation, and rollback. What leading indicators would you monitor daily? (See sketch 6 below.)
7) Second-chance improvements: If you were to redo a past model, what concrete changes (features, objective, sampling, thresholding, data contracts) would you make and why?
8) LLM leverage: Propose a safe way to use LLMs for analyst triage or rule suggestion (prompt templates, retrieval grounding, safety filters), and how you would A/B test the workflow impact without exposing PII (see sketch 7 below).
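Illustrative sketches for selected parts follow. All costs, column names, and data below are assumptions chosen for exposition, not part of the question itself.

Sketch 1 (part 1): turning assumed per-decision costs into a threshold rule for a calibrated fraud score.

```python
import numpy as np

# Assumed business costs; real values would come from finance, not this prompt.
C_FP = 5.0    # friction/handling cost of flagging a legitimate order
C_FN = 200.0  # expected loss from passing a fraudulent order

# With a calibrated fraud probability p, the expected costs are:
#   pass the order: p * C_FN        flag the order: (1 - p) * C_FP
# Flagging is cheaper when p > C_FP / (C_FP + C_FN).
t_star = C_FP / (C_FP + C_FN)
print(f"optimal threshold under these costs: {t_star:.4f}")  # ~0.0244

def expected_cost(y_true: np.ndarray, p: np.ndarray, t: float) -> float:
    """Average per-transaction dollar cost of thresholding scores at t."""
    flagged = p >= t
    fp = np.sum(flagged & (y_true == 0))   # good orders flagged
    fn = np.sum(~flagged & (y_true == 1))  # fraud passed through
    return float(fp * C_FP + fn * C_FN) / len(y_true)
```

With well-calibrated probabilities, the closed-form threshold and an empirical cost sweep on holdout data should roughly agree; a large disagreement is itself a calibration red flag.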
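Sketch 2 (part 2): a minimal pandas version of time-windowed velocity features, assuming a toy event log with account_id/ts/amount columns.

```python
import pandas as pd

# Toy event log; all column names and values are illustrative.
events = pd.DataFrame({
    "account_id": ["a1", "a1", "a1", "a2", "a2"],
    "ts": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:05", "2024-01-01 11:30",
        "2024-01-01 10:00", "2024-01-02 09:00",
    ]),
    "amount": [20.0, 35.0, 500.0, 15.0, 15.0],
}).sort_values(["account_id", "ts"])

def add_velocity(df: pd.DataFrame, window: str) -> pd.DataFrame:
    """Per-account rolling transaction count and amount sum over a time window.
    The window includes the current event; in production, features must be
    computed as-of the moment before the decision to avoid leakage."""
    rolled = df.set_index("ts").groupby("account_id")["amount"].rolling(window)
    out = df.copy()
    out[f"txn_cnt_{window}"] = rolled.count().to_numpy()
    out[f"amt_sum_{window}"] = rolled.sum().to_numpy()
    return out

events = add_velocity(events, "1h")
print(events)
```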
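Sketch 3 (part 3): a gradient-boosted baseline with explicit regularization and class reweighting, shown on synthetic stand-in data.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: ~1% positives, as in many fraud settings.
rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 8))
y = (rng.random(20000) < 0.01).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Upweight the rare class; the exact ratio is a tunable assumption. Note that
# reweighting distorts probability calibration, so recalibrate on a holdout
# (e.g., isotonic regression) before applying cost-based thresholds.
pos_w = (y_tr == 0).sum() / max((y_tr == 1).sum(), 1)
w = np.where(y_tr == 1, pos_w, 1.0)

clf = HistGradientBoostingClassifier(
    max_depth=6, learning_rate=0.05, l2_regularization=1.0, random_state=0
)
clf.fit(X_tr, y_tr, sample_weight=w)
scores = clf.predict_proba(X_te)[:, 1]
```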
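Sketch 4 (part 4): one possible discovery loop, assuming featurized unlabeled events: anomaly scoring, clustering the top anomalies so analysts label exemplars rather than near-duplicates, plus a random audit sample to limit feedback loops.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X_unlabeled = rng.normal(size=(10000, 8))  # stand-in for featurized events

iso = IsolationForest(n_estimators=200, random_state=1).fit(X_unlabeled)
anomaly = -iso.score_samples(X_unlabeled)  # higher = more anomalous

# Cluster the top-k anomalies so analysts label a few exemplars per cluster
# instead of many near-duplicates; curated labels then seed supervised training.
k = 200
top = np.argsort(anomaly)[-k:]
clusters = KMeans(n_clusters=8, n_init=10, random_state=1).fit_predict(X_unlabeled[top])

# Guard against feedback loops: also label a small *random* sample so the
# labeled pool is not limited to what current models already flag.
random_audit = rng.choice(len(X_unlabeled), size=50, replace=False)
```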
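Sketch 5 (part 5): a temporal holdout with AUCPR and a dollar-cost sweep over thresholds, reusing the assumed costs from sketch 1 on synthetic data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import average_precision_score

# Synthetic transactions over six months; the score is a toy stand-in.
rng = np.random.default_rng(2)
n = 20000
df = pd.DataFrame({
    "ts": pd.Timestamp("2024-01-01")
          + pd.to_timedelta(rng.integers(0, 180, n), unit="D"),
    "label": (rng.random(n) < 0.01).astype(int),
})
df["score"] = np.clip(0.02 + 0.5 * df["label"] + rng.normal(0, 0.1, n), 0, 1)

# Temporal split: train on the past, evaluate strictly on later weeks,
# so future behavior never leaks into training or threshold selection.
cutoff = pd.Timestamp("2024-06-01")
test = df[df["ts"] >= cutoff]

print("AUCPR:", average_precision_score(test["label"], test["score"]))

C_FP, C_FN = 5.0, 200.0  # assumed costs, as in sketch 1
for t in (0.05, 0.15, 0.30):
    flagged = test["score"] >= t
    cost = ((flagged & (test["label"] == 0)).sum() * C_FP
            + (~flagged & (test["label"] == 1)).sum() * C_FN)
    print(f"t={t:.2f}  flag_rate={flagged.mean():.2%}  est_cost=${cost:,.0f}")
```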
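Sketch 6 (part 6): a population stability index (PSI) check on score distributions, one common leading indicator for daily drift monitoring.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a recent one.
    Rule of thumb (an industry convention, not a law): <0.1 stable,
    0.1-0.25 watch closely, >0.25 investigate or retrain."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    actual = np.clip(actual, edges[0], edges[-1])  # keep all mass in range
    e = np.histogram(expected, edges)[0] / len(expected) + 1e-6
    a = np.histogram(actual, edges)[0] / len(actual) + 1e-6
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(3)
baseline = rng.beta(2, 20, size=50000)  # last month's score distribution
today = rng.beta(2, 15, size=5000)      # today's scores, slightly shifted
print(f"score PSI = {psi(baseline, today):.3f}")
```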
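Sketch 7 (part 8): hypothetical redaction and prompt-templating helpers for analyst triage. The regex patterns, template, and field names are illustrative assumptions, and no real LLM client or API is shown.

```python
import re

# Redact PII before anything reaches the LLM, and keep prompts templated
# so every prompt variant can be reviewed and audited.
PII_PATTERNS = [
    (re.compile(r"\b\d{13,19}\b"), "<CARD>"),                 # PAN-like digit runs
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
]

def redact(text: str) -> str:
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

TRIAGE_TEMPLATE = """You are assisting a fraud analyst.
Case summary (PII already redacted): {summary}
Top model signals: {signals}
Task: list the three most plausible fraud hypotheses and the evidence that
would confirm or rule out each. Never guess identities or output numbers
that look like card or account identifiers.
"""

prompt = TRIAGE_TEMPLATE.format(
    summary=redact("card 4111111111111111, new device, contact x@y.com"),
    signals="velocity_1h=9, device_age_days=0, geo_mismatch=1",
)
print(prompt)
```

For the A/B test, one option is to randomize at the analyst or queue level, compare time-to-decision and decision precision against a no-LLM control, and log only redacted prompts.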
Quick Answer: This question evaluates a data scientist's ability to design and operationalize multi-channel fraud detection systems: cost-sensitive objective formulation, segmentation and feature engineering, model selection (including sequence and graph approaches), anomaly detection for unknown actors, drift monitoring, and safe LLM-assisted workflows. Commonly asked in the Machine Learning domain, it assesses end-to-end systems thinking and the trade-off between customer experience and fraud loss, and it probes practical skills such as evaluation design, deployment guardrails, and handling label sparsity and delay.