Design Harmful Content Detection and OOM Prediction
Company: Databricks
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: medium
Interview Round: Onsite
Design two machine learning systems:
1. **Harmful content detection for LLM applications**: Build a system that detects harmful user inputs or model outputs, such as hate speech, self-harm, sexual content, harassment, or dangerous instructions. Describe the problem definition, label taxonomy, data collection, modeling approach, online serving flow, policy decisions, evaluation metrics, and how you would handle adversarial behavior and concept drift.
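One part of such a design that is easy to make concrete is the policy-decision layer: an upstream classifier emits per-category harm scores, and a separate component maps them to an action. The sketch below assumes a taxonomy matching the categories in the prompt; the threshold values, function names, and the "most severe action wins" rule are illustrative assumptions, not a prescribed implementation.

```python
# Hypothetical policy layer: maps per-category harm scores (0.0-1.0)
# from an upstream classifier to a moderation action. Thresholds are
# illustrative; severity-sensitive categories get lower cutoffs.
from typing import Dict

# Per-category (flag_for_review, block) score cutoffs.
THRESHOLDS = {
    "hate_speech": (0.5, 0.9),
    "self_harm": (0.3, 0.7),            # higher-severity: lower cutoffs
    "sexual_content": (0.5, 0.9),
    "harassment": (0.5, 0.9),
    "dangerous_instructions": (0.4, 0.8),
}

def decide(scores: Dict[str, float]) -> str:
    """Return 'block', 'flag', or 'allow' given per-category scores.

    Any category over its block cutoff is terminal; otherwise the
    most severe triggered action across categories wins.
    """
    action = "allow"
    for category, score in scores.items():
        flag_cut, block_cut = THRESHOLDS.get(category, (0.5, 0.9))
        if score >= block_cut:
            return "block"
        if score >= flag_cut:
            action = "flag"             # queue for human review
    return action
```

Separating thresholds from the model keeps policy tunable per category (and per tenant) without retraining, which is also where per-category precision/recall trade-offs from evaluation would be encoded.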
2. **Out-of-memory prediction for notebook sessions**: For an interactive notebook platform similar to Colab, design a system that predicts whether a running notebook session is likely to hit an out-of-memory failure soon. Specify the prediction target, available signals, feature design, labeling strategy, modeling approach, serving architecture, interventions after prediction, and offline and online success metrics.
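For the OOM predictor, the feature design over a sliding window of memory telemetry can be sketched directly. The function names, window assumptions (fixed sampling interval), and the risk rule below are illustrative placeholders for what would in practice feed a trained model, not a definitive design.

```python
# Hypothetical feature extraction for OOM risk from a sliding window of
# memory-usage samples (bytes used, sampled at a fixed interval).
# Window size, horizon, and the risk rule are illustrative assumptions.
from typing import List

def oom_features(samples: List[float], limit: float,
                 interval_s: float = 10.0) -> dict:
    """Summarize a window of memory-usage samples for an OOM model."""
    current = samples[-1]
    # Average growth in bytes/second across the window.
    slope = (samples[-1] - samples[0]) / (interval_s * (len(samples) - 1))
    headroom = limit - current
    # Seconds until the limit at the current growth rate (inf if flat/shrinking).
    tte = headroom / slope if slope > 0 else float("inf")
    return {
        "utilization": current / limit,
        "growth_bytes_per_s": slope,
        "time_to_exhaustion_s": tte,
    }

def oom_risk(samples: List[float], limit: float,
             horizon_s: float = 300.0) -> bool:
    """Flag a session if usage is near the limit or extrapolated to hit it
    within horizon_s (mirrors a 'fails within N minutes' label)."""
    f = oom_features(samples, limit)
    return f["utilization"] > 0.95 or f["time_to_exhaustion_s"] <= horizon_s
```

The same windowed features double as the labeling join key offline: sessions whose actual OOM occurred within `horizon_s` of a window become positive examples for that window.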
Quick Answer: This question evaluates machine learning system-design competencies, including safety-oriented content classification, taxonomy and labeling strategy, data collection, feature engineering, real-time prediction and serving architecture, monitoring for concept drift and adversarial behavior, and operational metrics for reliability.