Fraud and Bot Detection Systems
Asked of: Data Scientist

-
What it is
Systems that flag or block fraudulent users, payments, and automated accounts by combining rules, machine learning, and signals like behavior, devices, networks, and graphs. They run continuously, retrain as attackers adapt, and take actions ranging from CAPTCHAs to bans and payment holds.
-
Why interviewers ask about it
Integrity and trust drive growth at consumer platforms, and losses from chargebacks, spam, account takeovers, and scraping can be large. Data scientists are expected to design high-precision, low-latency models, choose business-aware metrics, and build feedback loops that stand up to adversarial behavior at scale.
-
Core ideas to know
- Labels are delayed/noisy (e.g., chargebacks, manual bans); use weak supervision, PU learning, and human-in-the-loop review.
- Class imbalance is extreme; prefer cost-sensitive learning, calibrated thresholds, and precision-at-K over raw AUC.
- Concept drift is constant; monitor population/feature drift and retrain with replay or sliding windows.
- Real-time constraints matter: feature stores, join latency, and decision budgets (e.g., <50 ms at edge).
- Graph-based features catch collusion: shared devices, IP subnets, payments, referrals, and temporal motifs.
- Hybrid rules+ML: rules for obvious abuse and safety, models for generalization; explainability so that enforcement decisions can be reviewed and appealed.
- Action policy design: challenges, rate limits, shadow bans, and staged enforcement to avoid false-positive blowups.
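As a rough illustration of the imbalance and cost-sensitivity points above, the sketch below computes precision-at-K and picks the score threshold that minimizes total cost. The function names, the toy scores and labels, and the costs (5 per false positive, 100 per missed fraud) are all hypothetical, chosen only to make the arithmetic easy to follow; real costs would come from the business.

```python
from typing import List, Tuple


def precision_at_k(scores: List[float], labels: List[int], k: int) -> float:
    """Fraction of true fraud among the k highest-scored cases."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


def cheapest_threshold(
    scores: List[float], labels: List[int], cost_fp: float, cost_fn: float
) -> Tuple[float, float]:
    """Scan candidate thresholds and return (threshold, total_cost).

    A case is flagged when score >= threshold. Candidates are the observed
    scores plus +inf (flag nothing).
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = fp * cost_fp + fn * cost_fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Toy data (hypothetical): 10 events, 2 of them fraudulent.
scores = [0.97, 0.91, 0.40, 0.35, 0.30, 0.22, 0.15, 0.10, 0.05, 0.02]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

p_at_2 = precision_at_k(scores, labels, k=2)
threshold, cost = cheapest_threshold(scores, labels, cost_fp=5.0, cost_fn=100.0)
print(p_at_2, threshold, cost)  # 0.5 0.4 5.0
```

With a missed fraud costing 20 times a false positive, the cheapest cutoff drops to 0.40 and accepts one false positive in order to catch both fraud cases. A threshold tuned for accuracy or AUC alone would not surface that trade-off.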
-
A common pitfall
Candidates optimize offline AUC on a static dataset and ignore business costs, latency, and attacker adaptation. Interviewers want to hear how you'd pick thresholds for different actions, backtest on holdout days, and monitor post-launch drift and alert fatigue. They also expect guardrails: canaries, rate limits, and manual review for high-impact actions. Skipping these operational details suggests the solution won't survive production.
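One way to make "thresholds for different actions" and "backtest on holdout days" concrete is sketched below. Each action gets a target precision, the lowest score cutoff meeting that target is fitted on one day, and the same cutoffs are then evaluated on a later day. The action names, target precisions, selection rule, and both toy datasets are assumptions for illustration, not a recommended policy.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, int]  # (model score, label: 1 = confirmed abuse)

SEVERITY = ["block", "manual_review", "challenge"]  # strongest action first


def precision_at_threshold(events: List[Event], t: float) -> Optional[float]:
    """Precision among events with score >= t, or None if nothing is flagged."""
    flagged = [y for s, y in events if s >= t]
    return sum(flagged) / len(flagged) if flagged else None


def lowest_threshold_meeting(events: List[Event], target: float) -> Optional[float]:
    """Lowest observed score cutoff whose precision meets the target."""
    for t in sorted({s for s, _ in events}):
        p = precision_at_threshold(events, t)
        if p is not None and p >= target:
            return t
    return None


def fit_policy(events: List[Event], targets: Dict[str, float]) -> Dict[str, Optional[float]]:
    return {a: lowest_threshold_meeting(events, p) for a, p in targets.items()}


def decide(score: float, thresholds: Dict[str, Optional[float]]) -> str:
    """Return the strongest action whose threshold the score meets."""
    for action in SEVERITY:
        t = thresholds.get(action)
        if t is not None and score >= t:
            return action
    return "allow"


# Hypothetical targets: harsher actions demand higher precision.
targets = {"challenge": 0.6, "manual_review": 0.8, "block": 1.0}

fit_day = [(0.95, 1), (0.90, 1), (0.80, 1), (0.70, 0), (0.60, 1),
           (0.50, 0), (0.40, 1), (0.30, 0), (0.20, 0), (0.10, 0)]
holdout_day = [(0.92, 1), (0.85, 0), (0.81, 1), (0.65, 1), (0.62, 0),
               (0.45, 0), (0.35, 1), (0.25, 0), (0.15, 0), (0.05, 0)]

thresholds = fit_policy(fit_day, targets)
backtest = {a: precision_at_threshold(holdout_day, t) for a, t in thresholds.items()}
print(thresholds)  # {'challenge': 0.3, 'manual_review': 0.6, 'block': 0.8}
print(backtest)    # block precision falls from 1.0 to about 0.67 on the later day
```

In this toy data the "block" cutoff is perfectly precise on the day it was fitted but wrongly blocks one in three on the holdout day. That gap is the reason to backtest on later days and to route high-impact actions through manual review rather than trusting in-sample precision.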
-
Further reading
- Uber Engineering — Fraud Detection: Using Relational Graph Learning to Detect Collusion. Practical graph features and production lessons for catching coordinated abuse. https://www.uber.com/blog/fraud-detection/
- Cloudflare Developers — Machine Learning models for Bot Management. Concrete signals, continuous updates, and real-time classification at the edge for bot detection. https://developers.cloudflare.com/bots/reference/machine-learning-models/
- Computers (MDPI, 2024) — Systematic Review of ML in Credit Card Fraud Detection Under Original Class Imbalance. Up-to-date survey on imbalance, metrics, and evaluation pitfalls. https://www.mdpi.com/2073-431X/14/10/437
Related concepts
- Fraud, Bot, And Fake Account Detection
- Fake Account, Bot, And Fraud Measurement (Analytics & Experimentation)
- Integrity, Fraud, Bot, And Harmful Content Measurement
- Fraud Risk Modeling And Real-Time Decisioning (ML System Design)
- Platform Integrity: Fake Accounts, Bots, Fraud, And Harmful Content (Analytics & Experimentation)
- Recommender, Ranking, And Ads Systems