Explain Harassment Surge
Company: Airwallex
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: medium
Interview Round: Onsite
Suppose your monthly moderation analysis shows that `Harassment` increased sharply in the most recent month. Assume a post is classified as violating when `probability_violating > 0.5`, and that the monthly distribution is computed over the distinct posts viewed in each month.
As a Data Scientist working on content integrity, answer the following:
1. What are the most plausible explanations for the apparent surge in `Harassment`? Consider both real changes in harmful behavior and measurement artifacts.
2. How would you investigate whether this is a true increase in harassment versus an artifact caused by traffic mix, recommendation changes, policy changes, model retraining, thresholding, or data quality issues?
3. What metrics, segmentations, and statistical checks would you use? Your answer should explicitly address denominator effects, Simpson's paradox, selection bias, label drift, and model calibration.
4. If the increase is real, what product, policy, operations, or modeling interventions would you propose?
5. If you wanted to test a mitigation, how would you design an experiment or quasi-experiment, and what primary metrics and guardrail metrics would you use?
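For the denominator and traffic-mix concerns in part 3, one possible starting point is to split the month-over-month change in the overall `Harassment` rate into a within-segment rate effect and a mix effect. The sketch below is illustrative only: the segment names (`feed`, `search`) and all counts are made up, and it assumes every segment appears in both months.

```python
# Hypothetical, made-up counts per traffic segment:
# segment -> (distinct posts viewed, posts with probability_violating > 0.5 for Harassment)
prev = {"feed": (9000, 90), "search": (1000, 50)}
curr = {"feed": (6000, 60), "search": (4000, 200)}


def overall_rate(segments):
    """Overall flagged rate: total flagged posts over total distinct posts viewed."""
    posts = sum(n for n, _ in segments.values())
    flagged = sum(v for _, v in segments.values())
    return flagged / posts


def decompose(prev, curr):
    """Split the change in overall rate into (rate_effect, mix_effect).

    rate_effect: change explained by within-segment rates moving, at the current mix.
    mix_effect:  change explained by traffic shares moving, at the previous rates.
    The two terms sum exactly to overall_rate(curr) - overall_rate(prev).
    """
    n_prev = sum(n for n, _ in prev.values())
    n_curr = sum(n for n, _ in curr.values())
    rate_effect = 0.0
    mix_effect = 0.0
    for seg in prev:  # assumes the same segments exist in both months
        n0, v0 = prev[seg]
        n1, v1 = curr[seg]
        share0, share1 = n0 / n_prev, n1 / n_curr
        r0, r1 = v0 / n0, v1 / n1
        rate_effect += share1 * (r1 - r0)
        mix_effect += (share1 - share0) * r0
    return rate_effect, mix_effect


rate_effect, mix_effect = decompose(prev, curr)
print(f"overall: {overall_rate(prev):.4f} -> {overall_rate(curr):.4f}")
print(f"rate effect: {rate_effect:.4f}, mix effect: {mix_effect:.4f}")
```

In this made-up example the overall rate rises from 1.4% to 2.6% even though the rate inside each segment is unchanged, so the entire increase is attributed to the mix effect. A pattern like this would point toward a traffic or recommendation shift rather than a change in user behavior, and it is the same mechanism that can produce Simpson's paradox when segment-level and aggregate trends disagree.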
Quick Answer: This question evaluates a Data Scientist's competency in diagnostic analytics, statistical reasoning, causal inference, model evaluation (including label drift and calibration), and experiment design for content integrity and moderation monitoring.
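The calibration check named above can be sketched as a reliability table: bin posts by `probability_violating`, then compare the mean score in each bin with the violation rate observed in human-reviewed labels. This is a minimal sketch under assumptions: the audited sample (`scores`, `labels`) is invented for illustration, and equal-width bins are one choice among several.

```python
def reliability_table(scores, labels, n_bins=5):
    """Return one row per non-empty bin:
    (bin_low, bin_high, count, mean_score, observed_rate)."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        bins[idx].append((s, y))
    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue
        count = len(items)
        mean_score = sum(s for s, _ in items) / count
        observed_rate = sum(y for _, y in items) / count
        rows.append((i / n_bins, (i + 1) / n_bins, count, mean_score, observed_rate))
    return rows


def expected_calibration_error(rows):
    """Count-weighted average of |mean_score - observed_rate| across bins."""
    total = sum(r[2] for r in rows)
    return sum(r[2] / total * abs(r[3] - r[4]) for r in rows)


# Hypothetical audited sample: model scores and human labels (1 = harassment).
scores = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]

rows = reliability_table(scores, labels)
for low, high, count, mean_score, observed_rate in rows:
    print(f"[{low:.1f}, {high:.1f}) n={count} mean_score={mean_score:.2f} observed={observed_rate:.2f}")
print(f"ECE: {expected_calibration_error(rows):.3f}")
```

Running the same table on audited samples from the month before and the month of the surge, and comparing the bins around the 0.5 threshold, is one way to tell whether a retrained or drifting model is pushing more borderline posts over the cutoff without any change in true prevalence.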