Detect and suppress bad sellers robustly
Company: TikTok
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
Design a system to identify and suppress bad sellers.
(a) Propose label definitions and triage policies (hard labels from confirmed abuse, soft labels from complaints/chargebacks) and explain how to de-noise them.
(b) Enumerate feature families, including graph/linkage (device/IP/payment overlap), temporal behavior (burstiness, cancellations, late shipments), content/pricing anomalies, buyer feedback, and evasion signals; identify leakage-prone fields and explain how to prevent leakage with time-based joins and seller-level splits.
(c) Choose and justify a modeling approach (e.g., gradient boosting + graph features vs. a GNN) and a cost-sensitive training scheme (class weights, focal loss, or a custom loss).
(d) Define the evaluation: primary metric (PR-AUC), calibration, cost-based thresholding, fairness slices (new sellers, categories, regions), and stability under adversarial drift.
(e) Integrate into ranking: how do you combine a risk score with a relevance score without creating feedback loops?
(f) Outline human-in-the-loop review, active learning for hard negatives, drift detection, and safe rollback.
Quick Answer: This question evaluates a candidate's competence in designing end-to-end machine learning risk systems, including label strategy and triage, feature engineering and leakage control, modeling with cost-sensitive losses, evaluation and calibration, ranking integration, and human-in-the-loop operations.
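For part (b), a minimal sketch of the two leakage controls, assuming a hypothetical `Event` record (seller id, timestamp, event kind) and a 30-day window chosen only for illustration. Features are aggregated only from events strictly before the snapshot time `as_of`, and the train/test assignment is a deterministic hash of the seller id, so every row for a given seller lands in the same split. All names and event kinds are illustrative, not a prescribed schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    seller_id: str
    ts: datetime
    kind: str  # hypothetical kinds: "order", "cancel", "late_ship", "complaint"


def split_for(seller_id: str, test_pct: int = 20) -> str:
    """Deterministic seller-level split: every row for a seller gets the same split."""
    bucket = int(hashlib.sha256(seller_id.encode("utf-8")).hexdigest(), 16) % 100
    return "test" if bucket < test_pct else "train"


def point_in_time_features(events, seller_id, as_of, window=timedelta(days=30)):
    """Aggregate only events in [as_of - window, as_of).

    Anything at or after the snapshot time (the enforcement action itself,
    later chargebacks or complaints) is excluded, so it cannot leak into features.
    """
    start = as_of - window
    counts = {"order": 0, "cancel": 0, "late_ship": 0, "complaint": 0}
    for e in events:
        if e.seller_id == seller_id and start <= e.ts < as_of and e.kind in counts:
            counts[e.kind] += 1
    orders = counts["order"]
    return {
        "orders": orders,
        "cancel_rate": counts["cancel"] / orders if orders else 0.0,
        "late_ship_rate": counts["late_ship"] / orders if orders else 0.0,
        "complaints": counts["complaint"],
    }


events = [
    Event("s1", datetime(2024, 1, 5), "order"),
    Event("s1", datetime(2024, 1, 6), "order"),
    Event("s1", datetime(2024, 1, 7), "cancel"),
    Event("s1", datetime(2024, 2, 2), "complaint"),  # after the snapshot: excluded
]
features = point_in_time_features(events, "s1", as_of=datetime(2024, 2, 1))
print(features, split_for("s1"))
```

Hashing a linkage-cluster id (sellers connected through a shared device, IP, or payment instrument) instead of `seller_id` would be stricter, since linked accounts placed in different splits can still leak information.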
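For part (c), one cost-sensitive option is binary focal loss, which scales cross-entropy by `(1 - p_t) ** gamma` so that sellers the model already classifies confidently contribute little and rare, hard cases dominate the gradient. The sketch below is a per-example reference implementation in plain Python; the defaults `alpha=0.25` and `gamma=2.0` are illustrative starting points that would need tuning, not recommended values.

```python
import math


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Binary focal loss for a single example.

    p: predicted probability that the seller is bad; y: 1 = bad, 0 = good.
    alpha weights the positive class (1 - alpha the negative class);
    gamma > 0 shrinks the loss on examples the model already gets right.
    """
    p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Average focal loss over a batch."""
    losses = [focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# An easy positive (p=0.95) is down-weighted far more than a hard one (p=0.3).
print(focal_loss(0.95, 1), focal_loss(0.3, 1))
```

With `gamma=0` and `alpha=0.5` this reduces to half the ordinary log loss, which is a quick sanity check. Using it inside a gradient-boosting library would mean supplying its gradient and Hessian as a custom objective; plain class weighting (for example XGBoost's `scale_pos_weight`) is the simpler baseline to compare against. Neither focal loss nor class weighting yields calibrated probabilities directly, so scores would need recalibration on an unweighted validation set before cost-based thresholding.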
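For part (d), a sketch of two evaluation pieces in plain Python: average precision as the step-wise estimate of PR-AUC, and a threshold search that minimizes `cost_fp * FP + cost_fn * FN` on a validation set. The costs are hypothetical inputs (for example, the business cost of wrongly suppressing a good seller versus letting a bad one through), and tied scores are not specially handled.

```python
def average_precision(scores, labels):
    """Step-wise PR-AUC: mean of precision@k over the ranks k of the positives."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / n_pos


def best_threshold(probs, labels, cost_fp, cost_fn):
    """Threshold t (flag as bad when p >= t) minimizing cost_fp*FP + cost_fn*FN."""
    best_t, best_cost = None, float("inf")
    for t in sorted(set(probs)) + [float("inf")]:  # inf = flag nobody
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


def bayes_threshold(cost_fp, cost_fn):
    """Closed-form threshold for calibrated p: flag when p * cost_fn > (1 - p) * cost_fp."""
    return cost_fp / (cost_fp + cost_fn)


probs = [0.1, 0.4, 0.6, 0.9]
labels = [0, 1, 0, 1]
print(average_precision(probs, labels), best_threshold(probs, labels, cost_fp=1, cost_fn=5))
```

With a 5:1 cost ratio the search flags every seller scoring at least 0.4 (one false positive, no misses), and the closed-form threshold of 1/6 flags the same three sellers on this toy data; the two rules agree in general only to the extent the scores are calibrated. Running the same functions per slice (new sellers, category, region) is a simple way to check whether one global threshold is acceptable everywhere.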
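For part (e), a minimal re-ranking sketch, assuming the risk model outputs a calibrated probability and the relevance model a score in [0, 1]. Sellers above a hard threshold are removed as an enforcement action; the rest are demoted multiplicatively by `(1 - risk) ** demotion_strength`. To limit feedback loops (demoted sellers receive fewer orders, hence fewer complaints, and start to look clean), a small hashed holdout of sellers skips the soft demotion, and every decision is logged with its multiplier. The thresholds, holdout fraction, and multiplicative form are all assumptions to be tuned, and `Candidate`, `in_holdout`, and `rerank` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    seller_id: str
    relevance: float  # relevance-model score, assumed in [0, 1]
    risk: float       # calibrated P(bad seller), assumed in [0, 1]


def in_holdout(seller_id: str, holdout_pct: float) -> bool:
    """Stable hashed holdout: the same sellers are exempt from soft demotion every time."""
    bucket = int(hashlib.sha256(seller_id.encode("utf-8")).hexdigest(), 16) % 10_000
    return bucket < holdout_pct * 10_000


def rerank(cands, hard_threshold=0.9, demotion_strength=2.0, holdout_pct=0.02):
    """Return (ranked candidates, decision log).

    risk >= hard_threshold -> removed (an enforcement decision, applied to everyone).
    otherwise -> score = relevance * (1 - risk) ** demotion_strength,
    except holdout sellers, who keep their raw relevance.
    """
    scored, log = [], []
    for c in cands:
        if c.risk >= hard_threshold:
            log.append({"item": c.item_id, "action": "suppress", "multiplier": 0.0})
            continue
        holdout = in_holdout(c.seller_id, holdout_pct)
        mult = 1.0 if holdout else (1.0 - c.risk) ** demotion_strength
        scored.append((c.relevance * mult, c))
        log.append({"item": c.item_id, "action": "holdout" if holdout else "demote", "multiplier": mult})
    scored.sort(key=lambda t: t[0], reverse=True)
    return [c for _, c in scored], log


cands = [
    Candidate("A", "s_a", relevance=0.9, risk=0.5),
    Candidate("B", "s_b", relevance=0.6, risk=0.0),
    Candidate("C", "s_c", relevance=1.0, risk=0.95),
]
ranked, log = rerank(cands, holdout_pct=0.0)
print([c.item_id for c in ranked])  # ['B', 'A']: C suppressed, A scores 0.9 * 0.25 = 0.225
```

Because the holdout is keyed on seller id rather than on the request, its sellers stay exempt across sessions, which gives a comparison group for measuring how demotion changes complaint rates; keeping it small bounds the buyer exposure to risky sellers. The logged multiplier can serve as an exposure weight when this traffic is later reused as training data.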