Context
You are designing and evaluating a machine learning system to detect automated (bot) comment activity on a large-scale social platform. Bots are rare (e.g., <0.5% of comments) and adversarial. Your solution should balance safety (blocking bots) and user experience (minimizing false positives on humans), and it must work both offline (batch scoring and evaluation) and online (real-time enforcement).
Tasks
- Labeling strategy with minimal ground truth
  - Propose weak-supervision heuristics (a labeling-function sketch follows this list).
  - Define a manual review sampling plan (see the stratified-sampling sketch below).
  - Explain how to de-bias labels given extreme class imbalance (<0.5% bots); a prior-correction sketch follows the list.
- Features across time scales
  - Specify features: session burstiness, inter-comment intervals, entropy/diversity of targets, language signals, graph-based reciprocity, and account/device/network signals (see the feature sketch below).
  - Indicate which features must be computed in real time vs. in batch.
- Model choice and calibration
  - Compare linear models, tree ensembles, and sequence models.
  - Describe how to calibrate posteriors (Platt scaling, isotonic regression) and how to monitor calibration drift (see the calibration sketch below).
- Thresholding by cost
  - Define costs for a false positive (blocking a human) vs. a false negative (missing a bot).
  - Choose an operating point using precision–recall curves (see the cost-based thresholding sketch below).
  - Compute expected blocked-human-minutes at a chosen threshold given example rates (a worked example follows the list).
- Adversarial robustness
  - Identify the least gameable features, propose canaries, and design drift detection (see the PSI sketch below).
- Online safety net
  - Outline shadow mode, backfill re-scoring, and human review queues (see the shadow-mode sketch below).
- Evaluation
  - Offline: PR-AUC, recall at high precision, slice analysis (see the metrics sketch below).
  - Online guardrails: abuse reports, creator retention, comment latency.
- If the false-positive rate becomes high
  - Trace the root cause with error analysis and propose a rollback/ramp strategy.
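Worked sketches
The sketches below are illustrative Python, not a reference implementation; feature names, thresholds, rates, and interfaces are assumptions unless stated otherwise.

For the weak-supervision heuristics: a minimal sketch in which each labeling function votes bot, human, or abstains over a per-comment feature dict, combined by majority vote. The feature keys and thresholds are hypothetical.

```python
# Weak-supervision sketch: labeling functions vote BOT (1), HUMAN (0),
# or ABSTAIN (-1); a majority over non-abstaining votes yields a weak
# label. Feature keys and thresholds are illustrative assumptions.
BOT, HUMAN, ABSTAIN = 1, 0, -1

def lf_superhuman_rate(x):
    # Sustained sub-second posting gaps are hard for a human to produce.
    return BOT if x["median_inter_comment_s"] < 1.0 else ABSTAIN

def lf_duplicate_text(x):
    # Near-identical text across many targets suggests scripting.
    return BOT if x["duplicate_text_ratio"] > 0.8 else ABSTAIN

def lf_aged_verified(x):
    # Old, phone-verified accounts are weak evidence of a human.
    return HUMAN if x["account_age_days"] > 365 and x["phone_verified"] else ABSTAIN

LABELING_FUNCTIONS = [lf_superhuman_rate, lf_duplicate_text, lf_aged_verified]

def weak_label(x):
    """Majority vote over non-abstaining labeling functions."""
    votes = [v for v in (lf(x) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return BOT if sum(votes) > len(votes) / 2 else HUMAN
```

In practice a label model in the Snorkel style, which weights functions by their estimated accuracies and correlations, is preferable to a plain majority vote.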
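For the manual review sampling plan, a sketch of score-stratified sampling: oversample the uncertain and high-score bands where labels are most informative, keep a uniform floor so base-rate estimates stay unbiased, and record inclusion probabilities so estimates can be inverse-propensity weighted. Band edges and allocations are assumptions.

```python
import numpy as np

def review_sample(scores, n_total=1000, rng=None):
    """Score-stratified manual-review sample. Record each band's
    inclusion probability alongside the sample so downstream estimates
    can be inverse-propensity weighted. Bands are illustrative.
    """
    rng = rng or np.random.default_rng(0)
    bands = [
        (scores < 0.2,                      0.10),  # mostly humans
        ((0.2 <= scores) & (scores < 0.8),  0.40),  # uncertain middle
        (scores >= 0.8,                     0.40),  # likely bots
        (np.ones_like(scores, dtype=bool),  0.10),  # uniform floor
    ]
    picks = []
    for mask, frac in bands:
        idx = np.flatnonzero(mask)
        k = min(int(n_total * frac), len(idx))
        if k:
            picks.append(rng.choice(idx, size=k, replace=False))
    return np.unique(np.concatenate(picks))
```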
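For de-biasing labels under the ~0.5% base rate: if reviewers label a bot-enriched sample, posteriors learned at the enriched base rate must be rescaled to the deployment base rate. A standard prior-shift correction follows; the specific rates in the example are made up.

```python
def correct_prior(p_train, pi_train, pi_deploy):
    """Adjust a posterior estimated under a training-set base rate
    (pi_train, e.g. 0.30 after oversampling bots for review) to the
    deployment base rate (pi_deploy, e.g. 0.005): rescale the odds by
    the ratio of class priors, then renormalize."""
    num = p_train * pi_deploy / pi_train
    den = num + (1.0 - p_train) * (1.0 - pi_deploy) / (1.0 - pi_train)
    return num / den

# A score of 0.90 on a 30%-bot review sample corresponds to a much
# lower probability at a 0.5% deployment base rate.
print(correct_prior(0.90, 0.30, 0.005))  # ~0.095
```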
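For the time-scale features, minimal implementations of inter-comment-interval statistics, the Goh-Barabási burstiness coefficient, and target entropy; the input shapes are assumptions.

```python
import math
import statistics
from collections import Counter

def interval_features(timestamps):
    """Inter-comment interval statistics for one account's session;
    `timestamps` is an ascending list of posting times in seconds."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        return {"median_gap_s": None, "burstiness": None}
    mean = statistics.fmean(gaps)
    std = statistics.pstdev(gaps)
    # Burstiness coefficient (Goh & Barabasi): (sigma - mu) / (sigma + mu).
    # -1 = perfectly periodic (scheduler-like), 0 = Poisson, -> 1 = bursty.
    burstiness = (std - mean) / (std + mean) if (std + mean) > 0 else 0.0
    return {"median_gap_s": statistics.median(gaps), "burstiness": burstiness}

def target_entropy(target_ids):
    """Shannon entropy (bits) of the targets an account comments on;
    low entropy at high volume suggests focused spam, very high entropy
    suggests broadcast automation."""
    if not target_ids:
        return 0.0
    counts = Counter(target_ids)
    n = len(target_ids)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```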
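For calibration and its monitoring, a sketch using scikit-learn's LogisticRegression (Platt scaling on the raw score) and IsotonicRegression, plus an expected-calibration-error monitor. The data here is a synthetic stand-in; the calibration split must be disjoint from the model's training data.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for held-out (raw score, label) pairs.
rng = np.random.default_rng(0)
labels = (rng.random(5000) < 0.05).astype(int)
scores = labels * 0.4 + rng.random(5000) * 0.6

# Platt scaling: a logistic fit on the raw score.
platt = LogisticRegression().fit(scores.reshape(-1, 1), labels)
p_platt = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic regression: non-parametric and monotone; needs more data.
iso = IsotonicRegression(out_of_bounds="clip").fit(scores, labels)
p_iso = iso.predict(scores)

def expected_calibration_error(p, y, n_bins=10):
    """ECE: |mean predicted - observed positive rate| per score bin,
    weighted by bin mass. Tracking this on fresh labeled windows is
    one way to monitor calibration drift."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return ece

print(expected_calibration_error(p_iso, labels))
```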
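For choosing the operating point: with the costs below (pure assumptions), the expected-cost-minimizing threshold can be read off the precision–recall curve. With perfectly calibrated probabilities the closed form is to block iff p > C_FP / (C_FP + C_FN), here about 0.98; the empirical search is a hedge against miscalibration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Illustrative costs: wrongly blocking a human is assumed 50x worse
# than letting one bot comment through.
C_FP, C_FN = 50.0, 1.0

def cost_optimal_threshold(y_true, p_cal):
    """Threshold minimizing expected cost on a labeled validation set
    drawn from the deployment distribution (calibrated scores p_cal)."""
    precision, recall, thresholds = precision_recall_curve(y_true, p_cal)
    n_pos = y_true.sum()
    best_t, best_cost = 1.0, float("inf")
    # precision/recall carry one extra trailing point; align with thresholds.
    for prec, rec, t in zip(precision[:-1], recall[:-1], thresholds):
        tp = rec * n_pos
        fp = tp * (1.0 - prec) / prec if prec > 0 else float("inf")
        fn = n_pos - tp
        cost = C_FP * fp + C_FN * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```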
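For the expected blocked-human-minutes computation, a worked example; every rate here is an assumed example value plugged in for illustration.

```python
# Worked example; all numbers are assumed example rates.
human_comments_per_day = 10_000_000   # daily human comment volume
fpr_at_threshold = 1e-4               # human comments wrongly flagged
minutes_per_wrongful_block = 30.0     # avg minutes lost per blocked human
                                      # (lockout + appeal; an assumption)

blocked_humans_per_day = human_comments_per_day * fpr_at_threshold  # 1,000
blocked_human_minutes = blocked_humans_per_day * minutes_per_wrongful_block
print(blocked_human_minutes)  # 30,000 minutes/day, i.e. 500 human-hours/day
```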
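For drift detection, a Population Stability Index sketch comparing a fresh production window of a feature against its training-time baseline; the 0.1/0.25 alert levels are the common rule of thumb. One reading of the canary idea is planted known-bot traffic whose scores should stay high, and the same machinery can monitor those score distributions.

```python
import numpy as np

def psi(baseline, current, n_bins=10):
    """Population Stability Index of one feature between a baseline
    sample and a fresh window. Rule of thumb: < 0.1 stable,
    0.1-0.25 moderate shift, > 0.25 investigate/retrain."""
    # Interior cut points from baseline quantiles; searchsorted then
    # bins values outside the baseline range into the edge bins.
    edges = np.quantile(baseline, np.linspace(0, 1, n_bins + 1))[1:-1]
    b_frac = np.bincount(np.searchsorted(edges, baseline), minlength=n_bins) / len(baseline)
    c_frac = np.bincount(np.searchsorted(edges, current), minlength=n_bins) / len(current)
    eps = 1e-6  # avoid log(0) for empty bins
    b_frac, c_frac = b_frac + eps, c_frac + eps
    return float(np.sum((c_frac - b_frac) * np.log(c_frac / b_frac)))
```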
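For the online safety net, a shadow-mode sketch: always score and log, act only once enforcement is ramped on. The model interface, comment schema, and thresholds (reused from the cost analysis) are assumptions.

```python
import logging

logger = logging.getLogger("botmodel.shadow")

def handle_comment(comment, model, enforce=False):
    """Shadow mode: score and log every comment; block only when
    enforce=True (ramped from 0% to 100% of traffic). Persisted shadow
    scores also support backfill re-scoring when the model or threshold
    changes. All names here are illustrative."""
    p = model.predict(comment)  # assumed model interface
    logger.info("shadow_score comment=%s p=%.4f enforce=%s",
                comment["id"], p, enforce)
    if p > 0.98:                # threshold from the cost analysis above
        # In shadow mode the would-be block is logged, not executed.
        return "block" if enforce else "would_block_logged"
    if p > 0.80:
        return "enqueue_for_human_review"  # borderline band to reviewers
    return "allow"
```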
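For offline evaluation: PR-AUC (average precision) is the sensible headline metric at a ~0.5% base rate, since ROC-AUC is dominated by the huge negative class. Recall at a precision floor and per-slice average precision are sketched below; the slice keys are assumptions.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

def recall_at_precision(y_true, scores, min_precision=0.99):
    """Maximum recall achievable at or above a precision floor; a
    natural headline metric when blocking humans is expensive."""
    precision, recall, _ = precision_recall_curve(y_true, scores)
    ok = precision >= min_precision
    return float(recall[ok].max()) if ok.any() else 0.0

def slice_report(y_true, scores, slice_ids):
    """Per-slice PR-AUC (average precision), e.g. by language or
    account-age bucket, to surface slices the headline number hides."""
    report = {}
    for s in np.unique(slice_ids):
        m = slice_ids == s
        if y_true[m].sum() > 0:  # AP is undefined without positives
            report[s] = average_precision_score(y_true[m], scores[m])
    return report
```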