Investigate Harassment Surge and Mitigation
Company: Airwallex
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: medium
Interview Round: Onsite
Using the same moderation setting, suppose your monthly violation analysis shows that `Harassment` violations increased sharply in the most recent month.
Discuss the following:
1. What are the most plausible explanations for the surge?
Consider both real-world and measurement-related causes, such as:
- a true increase in abusive behavior
- traffic mix changes across surfaces, regions, languages, or creator cohorts
- seasonality or external events
- coordinated attacks or repeat offenders
- policy-definition changes
- model-threshold changes or model-version changes
- calibration drift in the classifier
- data-quality, logging, or backfill issues
2. How would you investigate whether the surge is real versus an artifact?
Be specific about:
- which prevalence metrics you would use
- the denominators and time windows you would compare
- which segments you would break the data into
- what additional datasets you would request, such as human-review labels, user reports, enforcement logs, or model-version metadata
- how you would account for confounding, selection bias, and Simpson's paradox
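One way to check for a Simpson's-paradox style artifact is to split the change in the overall rate into a mix effect (traffic shifting toward higher-rate segments) and a within-segment rate effect. The sketch below is illustrative only: the `Segment` fields, the segment names, and the counts are hypothetical, the denominator is assumed to be impressions, and every segment is assumed to have nonzero impressions in both months.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical per-segment counts for the previous and current month.
    name: str
    impressions_prev: int
    violations_prev: int
    impressions_curr: int
    violations_curr: int


def decompose_rate_change(segments):
    """Split the change in overall violation rate into mix and rate effects.

    mix_effect:  change explained by traffic share moving between segments,
                 holding each segment's rate at its previous-month value.
    rate_effect: change explained by within-segment rates moving,
                 weighted by current-month traffic share.
    The two effects sum to the total change in the overall rate.
    """
    total_prev = sum(s.impressions_prev for s in segments)
    total_curr = sum(s.impressions_curr for s in segments)
    mix_effect = 0.0
    rate_effect = 0.0
    for s in segments:
        share_prev = s.impressions_prev / total_prev
        share_curr = s.impressions_curr / total_curr
        rate_prev = s.violations_prev / s.impressions_prev
        rate_curr = s.violations_curr / s.impressions_curr
        mix_effect += (share_curr - share_prev) * rate_prev
        rate_effect += share_curr * (rate_curr - rate_prev)
    overall_prev = sum(s.violations_prev for s in segments) / total_prev
    overall_curr = sum(s.violations_curr for s in segments) / total_curr
    return {
        "overall_prev": overall_prev,
        "overall_curr": overall_curr,
        "mix_effect": mix_effect,
        "rate_effect": rate_effect,
    }


# Made-up example: neither segment's rate changes, but traffic shifts
# toward the higher-rate segment.
example_segments = [
    Segment("surface_a", 900_000, 900, 500_000, 500),
    Segment("surface_b", 100_000, 1_000, 500_000, 5_000),
]
result = decompose_rate_change(example_segments)
print(result)
```

In this made-up example the overall rate rises from 0.19% to 0.55% even though each segment's own rate is unchanged, so the whole increase is attributed to the mix effect. A large rate effect concentrated in a few segments would point instead toward a real or model-driven change within those segments.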
3. If the surge is real, what product, policy, ranking, operational, and ML solutions would you propose?
Include both short-term containment actions and longer-term fixes.
4. How would you evaluate those solutions?
Include primary metrics, guardrail metrics, experiment or rollout design, and the main tradeoffs involving false positives, fairness, and user experience.
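As a rough illustration of the primary-metric comparison in an experiment, the sketch below runs a pooled two-proportion z-test on human-labeled harassment prevalence in control versus treatment. It is a simplification under stated assumptions: the counts are invented, sampled items are treated as independent, and network effects or user-level clustering (which often matter for harassment) are ignored. The function name is hypothetical, not from any library.

```python
import math


def two_proportion_z_test(x_control, n_control, x_treat, n_treat):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    x_* are counts of sampled items labeled as harassment by human review,
    n_* are the number of sampled items in each arm.
    """
    p_control = x_control / n_control
    p_treat = x_treat / n_treat
    p_pool = (x_control + x_treat) / (n_control + n_treat)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treat))
    z = (p_treat - p_control) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 120 of 10,000 sampled control items are violating,
# versus 90 of 10,000 sampled treatment items.
z_stat, p_val = two_proportion_z_test(120, 10_000, 90, 10_000)
print(f"z = {z_stat:.2f}, p = {p_val:.3f}")
```

The same comparison would be repeated for guardrail metrics such as the false-positive rate on reviewed removals or appeal-overturn rates, ideally broken out by language or region to surface fairness regressions that an aggregate number can hide.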
Your answer should distinguish a volume increase from a rate increase and should explain how to validate model-driven metrics using human-reviewed labels.
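To make the human-label validation concrete, one common approach is to audit a random sample of model-flagged items and a random sample of unflagged items each month, then combine the two strata into a calibrated prevalence estimate. The sketch below assumes such samples exist; the function name, argument names, and all counts are hypothetical.

```python
def estimate_prevalence(n_flagged, n_unflagged,
                        flagged_sample_pos, flagged_sample_n,
                        unflagged_sample_pos, unflagged_sample_n):
    """Estimate violating volume and rate from stratified human review.

    precision: share of reviewed flagged items that humans confirm.
    miss_rate: share of reviewed unflagged items that humans mark violating.
    Returns (estimated_violations, estimated_rate).
    """
    precision = flagged_sample_pos / flagged_sample_n
    miss_rate = unflagged_sample_pos / unflagged_sample_n
    est_violations = n_flagged * precision + n_unflagged * miss_rate
    est_rate = est_violations / (n_flagged + n_unflagged)
    return est_violations, est_rate


# Invented numbers: model flags double month over month, but audited
# precision on flagged items falls from 80% to 40%.
prev_volume, prev_rate = estimate_prevalence(10_000, 990_000, 160, 200, 2, 1_000)
curr_volume, curr_rate = estimate_prevalence(20_000, 980_000, 80, 200, 2, 1_000)
print(prev_volume, prev_rate)
print(curr_volume, curr_rate)
```

With these invented inputs, flagged volume doubles from 10,000 to 20,000 while the human-calibrated estimate stays near 1.0% (about 9,980 versus 9,960 violating items), which would suggest a threshold, model-version, or calibration artifact rather than a real surge. Reporting the estimated volume and the estimated rate side by side also separates growth in total traffic from growth in prevalence. In practice the sampled precision and miss rate carry sampling error, so confidence intervals on both strata would be needed before drawing a conclusion.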
Quick Answer: This question evaluates a data scientist's skills in diagnostic analytics, causal reasoning, validation of model-driven metrics using human-reviewed labels, and designing product/ML mitigations within the Analytics & Experimentation domain.