System Design: Identify and Suppress Bad Sellers in a Commerce Marketplace
Context
You are designing an ML-driven risk system for a large-scale marketplace with millions of buyers and sellers. The goal is to detect and suppress "bad sellers" (e.g., fraud, counterfeit, never-ship, review manipulation), minimize harm to buyers and the platform, and avoid unnecessary friction for legitimate sellers.
Task
Design the end-to-end system, from labels and features to modeling, evaluation, ranking integration, and human-in-the-loop operations.
Requirements
(a) Labels and triage
- Define label types:
  - Hard labels from confirmed abuse (e.g., chargebacks adjudicated against the seller, policy enforcement/ban decisions, law-enforcement confirmations).
  - Soft labels from complaints, disputes, cancellations, refunds, and buyer reports.
- Propose triage policies mapping evidence to actions (e.g., auto-takedown, velocity caps, payment holds, manual review) and describe how to de-noise soft labels; a minimal sketch follows this list.
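For illustration, a candidate answer might sketch soft-label de-noising and tiered triage along these lines (Python; the thresholds, prior strength, and field names such as `soft_label_rate` are hypothetical placeholders, not recommended production values):

```python
# Hypothetical sketch: empirical-Bayes de-noising of soft labels plus a tiered
# triage policy. All constants and signal names are illustrative assumptions.
from dataclasses import dataclass

def denoised_soft_rate(complaints: int, orders: int,
                       alpha: float = 1.0, beta: float = 50.0) -> float:
    """Shrink a noisy complaint rate toward the marketplace prior (Beta prior)."""
    return (complaints + alpha) / (orders + alpha + beta)

@dataclass
class SellerSignals:
    risk_score: float        # calibrated probability of abuse from the model
    hard_label_hits: int     # confirmed abuse events (lost chargebacks, bans)
    soft_label_rate: float   # de-noised complaint/refund rate, trailing window
    account_age_days: int

def triage(s: SellerSignals) -> str:
    """Map signals to an action; stricter actions require stronger evidence."""
    if s.hard_label_hits > 0 or s.risk_score > 0.98:
        return "auto_takedown"
    if s.risk_score > 0.90:
        return "payment_hold_and_manual_review"
    if s.risk_score > 0.75 or (s.soft_label_rate > 0.10 and s.account_age_days < 30):
        return "velocity_cap"
    if s.risk_score > 0.50:
        return "manual_review_queue"
    return "no_action"

# Example: a new seller with elevated complaints but no confirmed abuse.
rate = denoised_soft_rate(complaints=6, orders=40)
print(triage(SellerSignals(risk_score=0.82, hard_label_hits=0,
                           soft_label_rate=rate, account_age_days=12)))
```

The Beta-prior shrinkage keeps a handful of complaints on a low-volume seller from reading as an extreme complaint rate, while the strongest actions are reserved for hard evidence or very high calibrated scores.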
(b) Feature families and leakage control
- Enumerate feature families, including:
  - Graph/linkage: device/IP/payment/shipping overlaps, connected components, label propagation.
  - Temporal behavior: burstiness, inter-arrival times, cancellations/refunds, ship-late patterns.
  - Content/pricing anomalies: image/text similarity to known brands, price outliers.
  - Buyer feedback: ratings distribution, review text signals.
  - Evasion: account resets, editing sprees, re-registration patterns.
- Identify leakage-prone fields and explain prevention via point-in-time (time-based) joins and seller-level splits; see the sketch after this list.
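As a concrete reference point for leakage control, here is a minimal sketch assuming pandas and scikit-learn; the column names (`seller_id`, `ts`, `refund_rate`, `label`) and the toy data are illustrative:

```python
# Minimal leakage-control sketch: point-in-time feature joins plus seller-level
# cross-validation splits. Column names and data are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import GroupKFold

# Label events (e.g., adjudicated abuse), timestamped when the label became known.
labels = pd.DataFrame({
    "seller_id": [1, 2, 3],
    "ts": pd.to_datetime(["2024-06-01", "2024-05-15", "2024-06-10"]),
    "label": [1, 0, 1],
})

# Feature snapshots; each row is valid as of its snapshot time.
features = pd.DataFrame({
    "seller_id": [1, 1, 2, 3],
    "ts": pd.to_datetime(["2024-02-15", "2024-05-20", "2024-04-01", "2024-05-30"]),
    "refund_rate": [0.01, 0.08, 0.02, 0.12],
})

# Point-in-time join: for each label event, take the latest snapshot at or before
# the label timestamp, so no post-outcome information leaks into training rows.
train = pd.merge_asof(
    labels.sort_values("ts"),
    features.sort_values("ts"),
    on="ts", by="seller_id", direction="backward",
)

# Seller-level splits: all rows for a seller stay on one side of each fold, so the
# model cannot memorize seller identity across train/validation.
gkf = GroupKFold(n_splits=2)
for train_idx, val_idx in gkf.split(train, train["label"], groups=train["seller_id"]):
    pass  # fit and evaluate per fold here
```

`merge_asof` with `direction="backward"` guarantees each training row only sees feature values available at label time, and `GroupKFold` prevents the same seller from appearing in both train and validation.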
(c) Modeling and training
- Choose and justify a modeling approach (e.g., gradient boosting with engineered/graph features vs. a GNN).
- Specify a cost-sensitive scheme (class weights, focal loss, or a custom loss) to reflect asymmetric costs and label noise; a sketch of one option follows.
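One way to express the asymmetric costs is a custom weighted log-loss objective; the sketch below assumes XGBoost and treats a missed bad seller as ten times as costly as friction on a good one (the 10:1 ratio is an assumption, not a platform figure):

```python
# Hedged sketch of a cost-sensitive custom objective for gradient boosting.
import numpy as np
import xgboost as xgb

COST_FN = 10.0  # cost of missing a bad seller (false negative); assumed value
COST_FP = 1.0   # cost of friction on a legitimate seller (false positive)

def cost_weighted_logloss(predt: np.ndarray, dtrain: xgb.DMatrix):
    """Gradient/Hessian of a binary cross-entropy where positives (bad sellers)
    are weighted by COST_FN and negatives by COST_FP."""
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-predt))          # raw margin -> probability
    w = COST_FN * y + COST_FP * (1.0 - y)     # per-example asymmetric weight
    grad = w * (p - y)
    hess = w * p * (1.0 - p)
    return grad, hess

# Toy usage with random data; real training would use the point-in-time features.
X, y = np.random.rand(200, 8), np.random.randint(0, 2, 200)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 4, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=cost_weighted_logloss)

# With a custom objective, predict() returns raw margins; apply a sigmoid before
# calibration or thresholding.
scores = 1.0 / (1.0 + np.exp(-booster.predict(dtrain)))
```

A focal-loss variant would additionally down-weight easy examples; plain class weighting via `scale_pos_weight` is the simpler drop-in when a custom objective is not warranted.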
(d) Evaluation
- Define the primary metric(s) (e.g., PR-AUC), the calibration approach, and cost-based thresholding; a sketch follows this list.
- Include fairness slices (e.g., new sellers, categories, regions) and tests for stability under adversarial drift.
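A minimal evaluation sketch, assuming scikit-learn; `y_val` and `scores_val` stand for held-out labels and raw model scores, and the 1:10 FP:FN cost ratio is again illustrative:

```python
# Sketch: PR-AUC as the primary metric, isotonic calibration on held-out data,
# and a threshold chosen to minimize expected enforcement cost.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import average_precision_score

def evaluate_and_threshold(y_val: np.ndarray, scores_val: np.ndarray,
                           cost_fp: float = 1.0, cost_fn: float = 10.0):
    # Primary metric: PR-AUC (average precision), robust under heavy imbalance.
    pr_auc = average_precision_score(y_val, scores_val)

    # Calibrate raw scores to probabilities with isotonic regression.
    iso = IsotonicRegression(out_of_bounds="clip")
    probs = iso.fit_transform(scores_val, y_val)

    # Cost-based threshold: pick the cutoff minimizing expected cost on validation.
    thresholds = np.linspace(0.01, 0.99, 99)
    costs = []
    for t in thresholds:
        pred = probs >= t
        fp = np.sum(pred & (y_val == 0))
        fn = np.sum(~pred & (y_val == 1))
        costs.append(cost_fp * fp + cost_fn * fn)
    best_t = thresholds[int(np.argmin(costs))]
    return pr_auc, iso, best_t

# Toy usage with synthetic labels/scores.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5000)
s = np.clip(y * 0.3 + rng.normal(0.3, 0.2, 5000), 0, 1)
pr_auc, iso_model, threshold = evaluate_and_threshold(y, s)
print(pr_auc, threshold)
```

The same routine should be re-run per fairness slice (new sellers, category, region) to check that calibration and the chosen threshold hold up across segments and under adversarial drift.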
(e) Integration into ranking
- Describe how to combine a risk score with a relevance score for search/feed/ads without creating feedback loops; one illustrative scheme is sketched below.
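One illustrative blending scheme (all constants are assumptions): demote by risk rather than hard-filtering below the takedown threshold, and reserve a small exploration slice so suppressed sellers still generate labels that are not conditioned on the model's own decisions:

```python
# Illustrative risk-aware ranking blend; thresholds and rates are assumptions.
import random

def ranking_score(relevance: float, risk: float,
                  takedown_threshold: float = 0.98,
                  max_demotion: float = 0.9) -> float:
    """Blend relevance with a risk-based demotion factor in [1 - max_demotion, 1]."""
    if risk >= takedown_threshold:
        return 0.0                      # handled by enforcement, not by ranking
    return relevance * (1.0 - max_demotion * risk)

def final_score(relevance: float, risk: float, explore_rate: float = 0.01) -> float:
    """With small probability, skip the risk demotion so downstream labels are not
    purely a function of the model's own suppression (a feedback-loop guard)."""
    if random.random() < explore_rate:
        return relevance
    return ranking_score(relevance, risk)

print(final_score(relevance=0.8, risk=0.6))
```

Logging which impressions came from the exploration slice is what makes the resulting labels usable for less biased retraining.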
(f) Operations
- Outline human-in-the-loop review, active learning focused on hard negatives, drift detection/alerting, and safe rollback procedures; a drift-monitoring sketch follows.
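For drift detection, a population stability index (PSI) monitor over the live score distribution is a common starting point; the sketch below assumes numpy, and the 10-bin / 0.2-alert convention is a rule of thumb rather than a requirement:

```python
# Drift-monitoring sketch via the population stability index (PSI).
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               n_bins: int = 10) -> float:
    """PSI between a reference score distribution (e.g., training week) and a
    live window; values above ~0.2 are conventionally treated as notable drift."""
    # Quantile bin edges from the reference distribution (interior cut points).
    cuts = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]
    exp_pct = np.bincount(np.searchsorted(cuts, expected), minlength=n_bins) / len(expected)
    act_pct = np.bincount(np.searchsorted(cuts, actual), minlength=n_bins) / len(actual)
    exp_pct = np.clip(exp_pct, 1e-6, None)   # guard against empty bins / log(0)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

# Example: stand-in reference vs. live score distributions.
ref = np.random.beta(2, 8, 10_000)
live = np.random.beta(2, 5, 10_000)
if population_stability_index(ref, live) > 0.2:
    print("drift alert: pause automated takedowns, route to manual review, investigate")
```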