{"blocks": [{"key": "64555de3", "text": "Scenario", "type": "header-two", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}, {"key": "4d8fb60e", "text": "A new machine-learning model flags harmful posts; leadership wants evidence that it outperforms the old system.", "type": "unstyled", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}, {"key": "d81510d3", "text": "Question", "type": "header-two", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}, {"key": "8413f80d", "text": "How would you evaluate the performance of the new harmful-content detection model versus the existing model or no model at all? Describe both offline evaluation (confusion-matrix metrics) and online A/B testing approaches, addressing precision-recall trade-offs.", "type": "unstyled", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}, {"key": "848ce842", "text": "Hints", "type": "header-two", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}, {"key": "c08a89f1", "text": "Mention metrics (precision, recall, F1, ROC-AUC), calibration, business KPIs, guardrail metrics, and experiment design.", "type": "unstyled", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}], "entityMap": {}}