Design Harmful Content Detection
Company: Databricks
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: medium
Interview Round: Onsite
Design an end-to-end machine learning system to detect harmful user-generated content on a large online platform. Assume the platform accepts text and images, processes millions of submissions per day, and needs both low-latency online decisions and higher-quality offline review.
Your design should cover:
- content taxonomy such as hate speech, threats, sexual content, self-harm, violent content, and spam,
- model inputs and labeling strategy,
- online inference and moderation workflows,
- confidence thresholds and human review,
- evaluation metrics,
- monitoring, drift detection, and abuse resistance.
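The confidence-threshold and human-review bullet above can be illustrated with a minimal routing sketch. This is a hedged example, not a prescribed answer: the threshold values and category names are illustrative assumptions, and a real system would tune per-category thresholds against precision/recall targets.

```python
# Illustrative sketch: route per-category harm scores to a moderation action.
# Threshold values below are hypothetical, not production recommendations.

REMOVE_THRESHOLD = 0.95   # high confidence: auto-remove content
REVIEW_THRESHOLD = 0.70   # medium confidence: queue for human review

def route(scores: dict) -> str:
    """Map per-category harm scores (0.0-1.0) to 'remove', 'review', or 'allow'."""
    top_score = max(scores.values())
    if top_score >= REMOVE_THRESHOLD:
        return "remove"
    if top_score >= REVIEW_THRESHOLD:
        return "review"
    return "allow"

print(route({"hate_speech": 0.97, "spam": 0.10}))  # -> remove
print(route({"self_harm": 0.75}))                  # -> review
```

A strong candidate would extend this with per-category thresholds (e.g. stricter for self-harm than spam) and discuss how review-queue capacity constrains where the review threshold can sit.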
Quick Answer: This question tests whether a candidate can design a scalable, robust system for multimodal content moderation, spanning system architecture, data labeling and governance, model evaluation, low-latency online inference, and monitoring.