PracHub

Detect and suppress bad sellers robustly

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competence in designing end-to-end machine learning risk systems, including label strategy and triage, feature engineering and leakage control, modeling with cost-sensitive losses, evaluation and calibration, ranking integration, and human-in-the-loop operations.


Company: TikTok

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

Design a system to identify and suppress bad sellers. (a) Propose label definitions and triage policies (hard labels from confirmed abuse, soft labels from complaints/chargebacks) and how to de-noise them. (b) Enumerate feature families including graph/linkage (device/IP/payment overlap), temporal behavior (burstiness, cancellations, ship-late), content/pricing anomalies, buyer feedback, and evasion signals; specify leakage-prone fields and how to prevent it with time-based joins and seller-level splits. (c) Choose and justify a modeling approach (e.g., gradient boosting + graph features vs. GNN) and a cost-sensitive training scheme (class weights, focal loss, or custom loss). (d) Define evaluation: primary metric (PR-AUC), calibration, cost-based thresholding, fairness slices (new sellers, categories, regions), and stability under adversarial drift. (e) Integrate into ranking: how do you combine a risk score with a relevance score without creating feedback loops? (f) Outline human-in-the-loop review, active learning for hard negatives, drift detection, and safe rollback.

Oct 13, 2025, 9:49 PM

System Design: Identify and Suppress Bad Sellers in a Commerce Marketplace

Context

You are designing an ML-driven risk system for a large-scale marketplace with millions of buyers and sellers. The goal is to detect and suppress "bad sellers" (e.g., fraud, counterfeit, never-ship, review manipulation), minimize harm to buyers and the platform, and avoid unnecessary friction for legitimate sellers.

Task

Design the end-to-end system, from labels and features to modeling, evaluation, ranking integration, and human-in-the-loop operations.

Requirements

(a) Labels and triage

  • Define label types:
    • Hard labels from confirmed abuse (e.g., chargebacks adjudicated against the seller, policy-enforcement bans, law-enforcement confirmations).
    • Soft labels from complaints, disputes, cancellations, refunds, and buyer reports.
  • Propose triage policies for actions (e.g., auto-takedown, velocity caps, payment holds, manual review) and describe how to de-noise soft labels.
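One possible way to approach part (a) is sketched below. It aggregates soft signals with a noisy-OR, weighting each signal by the reliability of the buyer who raised it, and then maps the resulting probability to an action tier. The names `SoftSignal`, `KIND_PRIOR`, `soft_label`, and `triage`, and every prior and threshold, are hypothetical values chosen for illustration; they are not taken from the question.

```python
from dataclasses import dataclass


@dataclass
class SoftSignal:
    kind: str              # "complaint", "refund", or "dispute"
    reporter_trust: float  # assumed 0..1 reliability of the reporting buyer


# Assumed prior that a single signal of each kind reflects real abuse.
KIND_PRIOR = {"complaint": 0.10, "refund": 0.05, "dispute": 0.30}


def soft_label(signals, confirmed_abuse=False):
    """Noisy-OR aggregation: P(bad) = 1 - prod(1 - prior_k * trust_k).

    A confirmed (hard) label always overrides the soft estimate.
    """
    if confirmed_abuse:
        return 1.0
    p_clean = 1.0
    for s in signals:
        p_clean *= 1.0 - KIND_PRIOR[s.kind] * s.reporter_trust
    return 1.0 - p_clean


def triage(p_bad, exposure_usd):
    """Map probability and money at risk to an action tier (illustrative cutoffs)."""
    if p_bad >= 0.95:
        return "auto_takedown"
    if p_bad >= 0.60:
        return "payment_hold" if exposure_usd >= 1000 else "manual_review"
    if p_bad >= 0.30:
        return "velocity_cap"
    return "no_action"


signals = [SoftSignal("dispute", 1.0), SoftSignal("dispute", 1.0)]
print(triage(soft_label(signals), exposure_usd=200))
```

Down-weighting low-trust reporters is one simple de-noising choice; it limits the effect of retaliatory or coordinated complaints without discarding them outright.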

(b) Feature families and leakage control

  • Enumerate feature families, including:
    1. Graph/linkage: device/IP/payment/shipping overlaps, connected components, label propagation.
    2. Temporal behavior: burstiness, inter-arrival times, cancellations/refunds, ship-late patterns.
    3. Content/pricing anomalies: image/text similarity to known brands, price outliers.
    4. Buyer feedback: ratings distribution, review text signals.
    5. Evasion: account resets, editing sprees, re-registration patterns.
  • Identify leakage-prone fields and explain prevention via point-in-time (time-based) joins and seller-level splits.
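The two leakage controls named above can be sketched in a few lines, assuming feature snapshots are stored as timestamped values per seller. `point_in_time_value` returns only what was known strictly before the prediction time, and `seller_split` assigns every row of a seller to the same fold by hashing the seller ID. Both function names and the snapshot layout are assumptions made for this example.

```python
import bisect
import hashlib


def point_in_time_value(snapshots, as_of):
    """Return the latest snapshot value with timestamp strictly before as_of.

    snapshots: list of (timestamp, value) pairs sorted by timestamp.
    Returns None when nothing was known before as_of.
    """
    times = [t for t, _ in snapshots]
    i = bisect.bisect_left(times, as_of)  # first index with timestamp >= as_of
    return snapshots[i - 1][1] if i > 0 else None


def seller_split(seller_id, test_fraction=0.2):
    """Deterministically place a seller in train or test by hashing its ID."""
    digest = hashlib.sha256(seller_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    return "test" if bucket < test_fraction else "train"


refund_rate = [(1, 0.0), (5, 0.2), (9, 0.9)]  # (day, refund rate) snapshots
print(point_in_time_value(refund_rate, as_of=6))
print(seller_split("seller_123"))
```

The day-9 value is never visible to a prediction made on day 6, which is the property that keeps post-outcome fields (for example, refunds issued after a ban) out of training rows. If sellers are linked through shared devices or payment instruments, hashing a cluster ID instead of the seller ID would keep linked accounts in the same fold.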

(c) Modeling and training

  • Choose and justify a modeling approach (e.g., gradient boosting with engineered/graph features vs. GNN).
  • Specify a cost-sensitive scheme (class weights, focal loss, or custom loss) to reflect asymmetric costs and label noise.
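As an illustration of the cost-sensitive options listed above, the following is a minimal sketch of binary focal loss with a class weight, written in plain Python for a single example. The default `alpha` and `gamma` values are assumptions, not recommendations from the question; in practice they would be tuned against the platform's actual cost ratio.

```python
import math


def focal_loss(p, y, alpha=0.75, gamma=2.0, eps=1e-12):
    """Binary focal loss for one example.

    p: predicted probability that the seller is bad.
    y: 1 for a bad seller, 0 otherwise.
    alpha: weight on the positive (bad) class; 1 - alpha goes to negatives.
    gamma: focusing parameter; larger values down-weight easy examples.
    """
    p = min(max(p, eps), 1.0 - eps)         # avoid log(0)
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, **kwargs):
    """Average focal loss over a batch of (probability, label) pairs."""
    return sum(focal_loss(p, y, **kwargs) for p, y in zip(probs, labels)) / len(probs)


# A missed bad seller is penalized more than an equally wrong flag on a good one.
print(focal_loss(0.1, 1), focal_loss(0.9, 0))
```

With `gamma=0` the expression reduces to class-weighted cross-entropy, so the same function covers the plain class-weight option as a special case.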

(d) Evaluation

  • Define primary metric(s) (e.g., PR-AUC), calibration approach, and cost-based thresholding.
  • Include fairness slices (e.g., new sellers, categories, regions) and tests for stability under adversarial drift.
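A small sketch of two of the evaluation pieces above: a step-wise PR-AUC (average precision) and a threshold search that minimizes total cost. The 10:1 cost ratio between a missed bad seller and a wrongly flagged good seller is an assumed value for illustration, and tied scores are ordered arbitrarily in this simplified version.

```python
def average_precision(y_true, scores):
    """Step-wise PR-AUC: mean precision at each rank where a positive appears."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def best_threshold(y_true, scores, cost_fn=10.0, cost_fp=1.0):
    """Return (threshold, cost) minimizing cost_fn * FN + cost_fp * FP.

    A seller is flagged when score >= threshold. The candidate set is the
    observed scores plus +inf (flag nobody).
    """
    best_t, best_cost = None, float("inf")
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


y = [1, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.1]
print(average_precision(y, s))
print(best_threshold(y, s))
```

When scores are well calibrated, the cost-minimizing threshold should sit near cost_fp / (cost_fp + cost_fn), so a large gap between that value and the empirical optimum is itself a calibration warning. Running both functions separately on each slice (new sellers, categories, regions) gives the per-slice comparison the requirement asks for.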

(e) Integration into ranking

  • Describe how to combine a risk score with a relevance score for search/feed/ads without creating feedback loops.
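One way to sketch this, under stated assumptions: discount relevance by the probability that the seller is good, hard-block only very high-risk sellers, and serve a small random holdout of requests without the discount so that outcome labels keep arriving for sellers the model would otherwise bury. The multiplicative form, the `penalty` exponent, the `block_at` cutoff, and the 1% holdout rate are all illustrative choices, not a prescribed design.

```python
import random


def ranking_score(relevance, p_bad, penalty=2.0, block_at=0.95):
    """Discount relevance by seller risk; block outright above block_at."""
    if p_bad >= block_at:
        return float("-inf")
    return relevance * (1.0 - p_bad) ** penalty


def rank(items, holdout_rate=0.01, rng=None):
    """Order items for one request.

    items: list of (item_id, relevance, p_bad) tuples.
    Returns (ordered item ids, in_holdout). Holdout requests skip the risk
    discount (penalty 0) but still apply the hard block.
    """
    rng = rng or random.Random()
    in_holdout = rng.random() < holdout_rate
    penalty = 0.0 if in_holdout else 2.0
    scored = [(ranking_score(rel, p, penalty), item_id) for item_id, rel, p in items]
    kept = [(score, item_id) for score, item_id in scored if score != float("-inf")]
    kept.sort(key=lambda pair: -pair[0])
    return [item_id for _, item_id in kept], in_holdout


items = [("a", 0.9, 0.5), ("b", 0.6, 0.0), ("c", 1.0, 0.99)]
print(rank(items, holdout_rate=0.0))
```

Logging the `in_holdout` flag with each impression lets later training either restrict itself to holdout traffic or reweight by exposure, which is what breaks the loop in which suppressed sellers generate fewer orders, fewer complaints, and therefore look cleaner over time.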

(f) Operations

  • Outline human-in-the-loop review, active learning focusing on hard negatives, drift detection/alerting, and safe rollback procedures.
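A minimal sketch of three operational pieces, assuming model scores lie in [0, 1]: a population stability index (PSI) for score drift, uncertainty sampling to fill a limited review budget, and extraction of hard negatives (sellers who scored high but were cleared by reviewers). Uncertainty sampling is used here as one simple active-learning strategy; the function names and cutoffs are hypothetical.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between two score samples in [0, 1]."""
    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int(x * bins), bins - 1)] += 1
        # Floor empty bins at eps so the log ratio stays finite.
        return [max(c / len(xs), eps) for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def review_queue(scores, budget, threshold=0.5):
    """Uncertainty sampling: pick the sellers whose score is closest to threshold."""
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1] - threshold))
    return [seller_id for seller_id, _ in ranked[:budget]]


def hard_negatives(scores, cleared_by_review, min_score=0.8):
    """Sellers the model scored high but human reviewers cleared."""
    return sorted(s for s in cleared_by_review if scores.get(s, 0.0) >= min_score)


scores = {"a": 0.9, "b": 0.52, "c": 0.1, "d": 0.45}
print(review_queue(scores, budget=2))
print(hard_negatives(scores, cleared_by_review={"a", "c"}))
```

A PSI above roughly 0.25 is a common rule-of-thumb alert level rather than a fixed standard; an alert of this kind could gate an automatic fallback to the previous model version while the shift is investigated.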
