Scenario
Trust & Safety data science: You are asked to design metrics for two situations: (1) a content‑moderation A/B test where harmful‑content prevalence is low, and (2) evaluation of a customer‑service chatbot’s knowledge base.
Task
- Content moderation A/B test (low prevalence): Which short‑term, user‑centric metrics would you track to detect impact quickly, and why? Describe how you would set up the experiment to ensure sensitivity and guardrails.
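Because harmful‑content prevalence is low, sensitivity is the central design constraint: rare outcomes need very large samples before a realistic lift is detectable. A minimal sketch of the required sample size per arm, using the standard normal approximation for a difference in proportions (the base rate and lift below are illustrative assumptions, not figures from the scenario):

```python
import math

def sample_size_per_arm(p_base, rel_lift, ):
    """Per-arm sample size to detect a relative lift in a rare rate,
    at alpha=0.05 (two-sided) and 80% power, via the normal approximation."""
    z_alpha = 1.959963984540054   # z for alpha/2 = 0.025
    z_beta = 0.8416212335729143   # z for power = 0.80
    p_treat = p_base * (1 + rel_lift)
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    delta = p_treat - p_base
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Hypothetical: detecting a 20% relative change in a 0.2% report rate.
print(sample_size_per_arm(0.002, 0.20))
```

The takeaway is that a 20% relative movement in a 0.2% metric demands on the order of hundreds of thousands of users per arm, which is why triggering (analyzing only exposed users) and higher‑frequency proxy metrics matter so much here.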
- Chatbot knowledge base: How would you design an experiment and choose evaluation metrics to measure the quality and usefulness of the chatbot’s knowledge base? Cover both offline and online evaluation, and discuss trade‑offs.
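For the offline side, a common starting point is retrieval quality against a labeled set of queries: for each query, which knowledge‑base articles actually resolve it. A minimal sketch of precision@k and recall for one query (the article ids and the helper name are hypothetical, for illustration only):

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Precision@k and recall for one query.
    `retrieved` is the ranked list of KB article ids the bot surfaced;
    `relevant` is the labeled set of articles that resolve the query."""
    top_k = retrieved[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: bot surfaced a1, a7, a3; labelers marked a1, a3, a9.
p, r = precision_recall_at_k(["a1", "a7", "a3"], {"a1", "a3", "a9"}, k=3)
print(p, r)
```

Averaging these over a held‑out query set gives a cheap, repeatable offline signal; the trade‑off is that it measures retrieval, not whether the answer actually deflected the ticket, which is what the online metrics capture.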
Hints
- Consider immediate user actions (e.g., report rates, dismissals, session exits, latency).
- For the chatbot, consider answer precision/recall, deflection/containment rate, CSAT, and time to resolution.
- Discuss experiment design (randomization unit, triggering, guardrails) and trade‑offs.
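Once the experiment runs, the per‑arm rate metrics above (report rate, deflection rate, guardrail rates) reduce to comparing two proportions. A minimal sketch of that comparison using a pooled two‑proportion z‑test; the counts are invented for illustration:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for a difference in rates between control (arm 1)
    and treatment (arm 2), e.g. reports per exposed user."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical: 400 reports in 200k control users vs 470 in 200k treatment.
z, p = two_proportion_z(400, 200_000, 470, 200_000)
print(z, p)
```

This is also where the randomization unit matters: if users are randomized but the metric is per‑session, the independence assumption behind this test is violated and the variance needs a clustered (e.g. delta‑method or bootstrap) correction.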