Design a Fraud Detection System for a Marketplace and Profile Credentials
Context
You are a data scientist at a two‑sided marketplace where users can post listings (goods/services) and maintain profiles that may include education credentials. You must design detection for two abuse types:
- (a) Fake marketplace listings
- (b) Fabricated education credentials on user profiles
Assume you have event logs, listing content (text/images), profile data, device/network telemetry, payment metadata, and graph data of interactions (messages, transactions, connections).
Tasks
- Feature Engineering
  - Propose at least 10 concrete features spanning: unique identifiers (e.g., device ID, IP, payment instruments), text/image signals (e.g., low‑res, duplicate embeddings), behavioral patterns (e.g., burst account creation, connection request acceptance rates), and network features (e.g., triadic closure among claimed "alumni"). Include features tailored to both (a) listings and (b) credentials.
- Modeling Approach
  - Combine supervised learning (to capture known patterns) with anomaly detection (to flag novel attacks). Specify algorithms for tabular, text, image, and graph data. Explain how you will handle class imbalance and noisy/delayed labels.
- Thresholding with Costs
  - Choose an operating threshold using explicit costs of false negatives vs. false positives. Show how this choice maps onto ROC and PR curves, and provide a small numeric example.
- Risk‑Based Step‑Up and Online Evaluation
  - Propose a risk‑based step‑up policy keyed to the model's risk score (e.g., <10% auto‑pass, 10–50% 2FA, >50% auto‑block/quarantine). Design an online evaluation plan that limits collateral damage to legitimate users while gathering the labels you need.
- Monitoring, Retraining, and Red‑Team
  - Define concept‑drift monitoring (metrics and alerting), retraining cadence and promotion strategy, and a red‑team simulation program to uncover adaptive fraud.
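As a starting point for the thresholding and step‑up tasks, the following is a minimal sketch rather than a reference answer. It assumes the model outputs a calibrated fraud probability. The function names, the cost values (1 per false positive, 9 per false negative), and the action labels are hypothetical; the 10% and 50% cut‑offs are the example tiers quoted in the task.

```python
def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which intervening has lower expected cost.

    Intervene when p * cost_fn > (1 - p) * cost_fp,
    which rearranges to p > cost_fp / (cost_fp + cost_fn).
    Assumes p is a calibrated fraud probability.
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def step_up_action(p: float, low: float = 0.10, high: float = 0.50) -> str:
    """Map a risk score to one of the example tiers from the task.

    The tier labels are hypothetical; the default bounds mirror the
    '<10% / 10-50% / >50%' example and are not tuned values.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if p < low:
        return "auto_pass"
    if p <= high:
        return "two_factor"
    return "quarantine"


if __name__ == "__main__":
    # Hypothetical costs: a missed fraud costs 9x a wrongly challenged user.
    t = cost_threshold(cost_fp=1.0, cost_fn=9.0)
    print(f"intervene when p > {t:.2f}")
    for score in (0.05, 0.30, 0.80):
        print(score, step_up_action(score))
```

Under these assumed costs the break‑even probability is 1 / (1 + 9) = 0.10, so it coincides with the lower tier boundary; different cost ratios would move it. A full answer would still need to locate this operating point on the ROC and PR curves and check calibration before relying on it.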