System Design: Identify and Suppress Bad Sellers in a Commerce Marketplace
Context
You are designing an ML-driven risk system for a large-scale marketplace with millions of buyers and sellers. The goal is to detect and suppress "bad sellers" (e.g., fraud, counterfeit, never-ship, review manipulation), minimize harm to buyers and the platform, and avoid unnecessary friction for legitimate sellers.
Task
Design the end-to-end system, from labels and features to modeling, evaluation, ranking integration, and human-in-the-loop operations.
Requirements
(a) Labels and triage
- Define label types:
  - Hard labels from confirmed abuse (e.g., chargebacks adjudicated against the seller, policy enforcement/ban decisions, law-enforcement confirmations).
  - Soft labels from complaints, disputes, cancellations, refunds, and buyer reports.
- Propose triage policies mapping evidence to actions (e.g., auto-takedown, velocity caps, payment holds, manual review) and describe how to de-noise soft labels; a minimal sketch follows this list.
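For illustration, a candidate answer might sketch soft-label de-noising and tiered triage along these lines (Python; the thresholds, prior strength, and field names such as `soft_label_rate` are hypothetical placeholders, not recommended production values):

```python
# Hypothetical sketch: empirical-Bayes de-noising of soft labels plus a tiered
# triage policy. All constants and signal names are illustrative assumptions.
from dataclasses import dataclass

def denoised_soft_rate(complaints: int, orders: int,
                       alpha: float = 1.0, beta: float = 50.0) -> float:
    """Shrink a noisy complaint rate toward the marketplace prior (Beta prior)."""
    return (complaints + alpha) / (orders + alpha + beta)

@dataclass
class SellerSignals:
    risk_score: float        # calibrated probability of abuse from the model
    hard_label_hits: int     # confirmed abuse events (lost chargebacks, bans)
    soft_label_rate: float   # de-noised complaint/refund rate, trailing window
    account_age_days: int

def triage(s: SellerSignals) -> str:
    """Map signals to an action; stricter actions require stronger evidence."""
    if s.hard_label_hits > 0 or s.risk_score > 0.98:
        return "auto_takedown"
    if s.risk_score > 0.90:
        return "payment_hold_and_manual_review"
    if s.risk_score > 0.75 or (s.soft_label_rate > 0.10 and s.account_age_days < 30):
        return "velocity_cap"
    if s.risk_score > 0.50:
        return "manual_review_queue"
    return "no_action"

# Example: a new seller with elevated complaints but no confirmed abuse.
rate = denoised_soft_rate(complaints=6, orders=40)
print(triage(SellerSignals(risk_score=0.82, hard_label_hits=0,
                           soft_label_rate=rate, account_age_days=12)))
```

The Beta-prior shrinkage keeps a handful of complaints on a low-volume seller from reading as an extreme complaint rate, while the strongest actions are reserved for hard evidence or very high calibrated scores.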
(b) Feature families and leakage control
- Enumerate feature families, including:
  - Graph/linkage: device/IP/payment/shipping overlaps, connected components, label propagation.
  - Temporal behavior: burstiness, inter-arrival times, cancellations/refunds, ship-late patterns.
  - Content/pricing anomalies: image/text similarity to known brands, price outliers.
  - Buyer feedback: ratings distribution, review text signals.
  - Evasion: account resets, editing sprees, re-registration patterns.
- Identify leakage-prone fields and explain prevention via point-in-time (time-based) joins and seller-level splits; see the sketch after this list.
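As a concrete reference point for leakage control, here is a minimal sketch assuming pandas and scikit-learn; the column names (`seller_id`, `ts`, `refund_rate`, `label`) and the toy data are illustrative:

```python
# Minimal leakage-control sketch: point-in-time feature joins plus seller-level
# cross-validation splits. Column names and data are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import GroupKFold

# Label events (e.g., adjudicated abuse), timestamped when the label became known.
labels = pd.DataFrame({
    "seller_id": [1, 2, 3],
    "ts": pd.to_datetime(["2024-06-01", "2024-05-15", "2024-06-10"]),
    "label": [1, 0, 1],
})

# Feature snapshots; each row is valid as of its snapshot time.
features = pd.DataFrame({
    "seller_id": [1, 1, 2, 3],
    "ts": pd.to_datetime(["2024-02-15", "2024-05-20", "2024-04-01", "2024-05-30"]),
    "refund_rate": [0.01, 0.08, 0.02, 0.12],
})

# Point-in-time join: for each label event, take the latest snapshot at or before
# the label timestamp, so no post-outcome information leaks into training rows.
train = pd.merge_asof(
    labels.sort_values("ts"),
    features.sort_values("ts"),
    on="ts", by="seller_id", direction="backward",
)

# Seller-level splits: all rows for a seller stay on one side of each fold, so the
# model cannot memorize seller identity across train/validation.
gkf = GroupKFold(n_splits=2)
for train_idx, val_idx in gkf.split(train, train["label"], groups=train["seller_id"]):
    pass  # fit and evaluate per fold here
```

`merge_asof` with `direction="backward"` guarantees each training row only sees feature values available at label time, and `GroupKFold` prevents the same seller from appearing in both train and validation.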
(c) Modeling and training
- Choose and justify a modeling approach (e.g., gradient boosting with engineered/graph features vs. a GNN).
- Specify a cost-sensitive scheme (class weights, focal loss, or a custom loss) to reflect asymmetric costs and label noise; a sketch of one option follows.
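One way to express the asymmetric costs is a custom weighted log-loss objective; the sketch below assumes XGBoost and treats a missed bad seller as ten times as costly as friction on a good one (the 10:1 ratio is an assumption, not a platform figure):

```python
# Hedged sketch of a cost-sensitive custom objective for gradient boosting.
import numpy as np
import xgboost as xgb

COST_FN = 10.0  # cost of missing a bad seller (false negative); assumed value
COST_FP = 1.0   # cost of friction on a legitimate seller (false positive)

def cost_weighted_logloss(predt: np.ndarray, dtrain: xgb.DMatrix):
    """Gradient/Hessian of a binary cross-entropy where positives (bad sellers)
    are weighted by COST_FN and negatives by COST_FP."""
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-predt))          # raw margin -> probability
    w = COST_FN * y + COST_FP * (1.0 - y)     # per-example asymmetric weight
    grad = w * (p - y)
    hess = w * p * (1.0 - p)
    return grad, hess

# Toy usage with random data; real training would use the point-in-time features.
X, y = np.random.rand(200, 8), np.random.randint(0, 2, 200)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 4, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=cost_weighted_logloss)

# With a custom objective, predict() returns raw margins; apply a sigmoid before
# calibration or thresholding.
scores = 1.0 / (1.0 + np.exp(-booster.predict(dtrain)))
```

A focal-loss variant would additionally down-weight easy examples; plain class weighting via `scale_pos_weight` is the simpler drop-in when a custom objective is not warranted.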
(d) Evaluation
- Define the primary metric(s) (e.g., PR-AUC), the calibration approach, and cost-based thresholding; a sketch follows this list.
- Include fairness slices (e.g., new sellers, categories, regions) and tests for stability under adversarial drift.
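A minimal evaluation sketch, assuming scikit-learn; `y_val` and `scores_val` stand for held-out labels and raw model scores, and the 1:10 FP:FN cost ratio is again illustrative:

```python
# Sketch: PR-AUC as the primary metric, isotonic calibration on held-out data,
# and a threshold chosen to minimize expected enforcement cost.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import average_precision_score

def evaluate_and_threshold(y_val: np.ndarray, scores_val: np.ndarray,
                           cost_fp: float = 1.0, cost_fn: float = 10.0):
    # Primary metric: PR-AUC (average precision), robust under heavy imbalance.
    pr_auc = average_precision_score(y_val, scores_val)

    # Calibrate raw scores to probabilities with isotonic regression.
    iso = IsotonicRegression(out_of_bounds="clip")
    probs = iso.fit_transform(scores_val, y_val)

    # Cost-based threshold: pick the cutoff minimizing expected cost on validation.
    thresholds = np.linspace(0.01, 0.99, 99)
    costs = []
    for t in thresholds:
        pred = probs >= t
        fp = np.sum(pred & (y_val == 0))
        fn = np.sum(~pred & (y_val == 1))
        costs.append(cost_fp * fp + cost_fn * fn)
    best_t = thresholds[int(np.argmin(costs))]
    return pr_auc, iso, best_t

# Toy usage with synthetic labels/scores.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5000)
s = np.clip(y * 0.3 + rng.normal(0.3, 0.2, 5000), 0, 1)
pr_auc, iso_model, threshold = evaluate_and_threshold(y, s)
print(pr_auc, threshold)
```

The same routine should be re-run per fairness slice (new sellers, category, region) to check that calibration and the chosen threshold hold up across segments and under adversarial drift.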
(e) Integration into ranking
- Describe how to combine a risk score with a relevance score for search/feed/ads without creating feedback loops; one illustrative scheme is sketched below.
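One illustrative blending scheme (all constants are assumptions): demote by risk rather than hard-filtering below the takedown threshold, and reserve a small exploration slice so suppressed sellers still generate labels that are not conditioned on the model's own decisions:

```python
# Illustrative risk-aware ranking blend; thresholds and rates are assumptions.
import random

def ranking_score(relevance: float, risk: float,
                  takedown_threshold: float = 0.98,
                  max_demotion: float = 0.9) -> float:
    """Blend relevance with a risk-based demotion factor in [1 - max_demotion, 1]."""
    if risk >= takedown_threshold:
        return 0.0                      # handled by enforcement, not by ranking
    return relevance * (1.0 - max_demotion * risk)

def final_score(relevance: float, risk: float, explore_rate: float = 0.01) -> float:
    """With small probability, skip the risk demotion so downstream labels are not
    purely a function of the model's own suppression (a feedback-loop guard)."""
    if random.random() < explore_rate:
        return relevance
    return ranking_score(relevance, risk)

print(final_score(relevance=0.8, risk=0.6))
```

Logging which impressions came from the exploration slice is what makes the resulting labels usable for less biased retraining.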
(f) Operations
- Outline human-in-the-loop review, active learning focused on hard negatives, drift detection/alerting, and safe rollback procedures; a drift-monitoring sketch follows.
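For drift detection, a population stability index (PSI) monitor over the live score distribution is a common starting point; the sketch below assumes numpy, and the 10-bin / 0.2-alert convention is a rule of thumb rather than a requirement:

```python
# Drift-monitoring sketch via the population stability index (PSI).
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               n_bins: int = 10) -> float:
    """PSI between a reference score distribution (e.g., training week) and a
    live window; values above ~0.2 are conventionally treated as notable drift."""
    # Quantile bin edges from the reference distribution (interior cut points).
    cuts = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]
    exp_pct = np.bincount(np.searchsorted(cuts, expected), minlength=n_bins) / len(expected)
    act_pct = np.bincount(np.searchsorted(cuts, actual), minlength=n_bins) / len(actual)
    exp_pct = np.clip(exp_pct, 1e-6, None)   # guard against empty bins / log(0)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

# Example: stand-in reference vs. live score distributions.
ref = np.random.beta(2, 8, 10_000)
live = np.random.beta(2, 5, 10_000)
if population_stability_index(ref, live) > 0.2:
    print("drift alert: pause automated takedowns, route to manual review, investigate")
```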