PracHub

Design a harmful content detection system

Last updated: Jun 24, 2026

Quick Overview

This question evaluates proficiency in ML system design for harmful-content detection, assessing competencies in multimodal modeling, taxonomy and labeling strategies, data strategy, privacy-aware architecture, real-time inference, human-in-the-loop decisioning, and operational reliability for global consumer platforms.

  • hard
  • Snapchat
  • ML System Design
  • Machine Learning Engineer

Company: Snapchat

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Onsite

Design an end-to-end harmful content detection system. Define the taxonomy (e.g., hate, self-harm, sexual content, violence), labeling guidelines and quality controls, and multilingual/multimodal scope (text, image, audio, video). Propose model choices (keyword baselines, classical ML, transformers, multimodal encoders) and training data strategy (collection, active learning, long-tail sampling, debiasing). Specify inference architecture (streaming vs. batch), thresholds and severity tiers, human-in-the-loop review, appeals/override flows, and explainability requirements. Address adversarial behavior (evasion, prompt injection), privacy and safety constraints, fairness and error costs (precision/recall trade-offs by class and region), monitoring and drift detection, A/B rollout, and feedback loops for continuous improvement.

Related Interview Questions

  • Design a Family-Friendly Listing Classifier - Snapchat (medium)
  • Design User Embedding Semantic Search - Snapchat (medium)
  • Design a video recommendation system - Snapchat (medium)
  • Design an ads ranking ML system - Snapchat (medium)
  • Design short-video retrieval with sparse text - Snapchat (medium)

Snapchat
Sep 6, 2025, 12:00 AM

Design a Harmful Content Detection System (Multilingual, Multimodal)

Problem Statement

You are designing a trust-and-safety system for a large, mobile-first, ephemeral consumer social platform with a global user base that includes teens. Users share multimodal content across many surfaces: 1:1 and group chat, short-form video (Spotlight/Stories), AR lenses/effects, and live audio/video. The platform must detect and act on harmful content in near real-time while respecting privacy, regional policies, and user trust.

Design an end-to-end harmful content detection system. The problem is broken into four parts below; address each part, then deliver a cohesive design that integrates the components into a single operational system and explains your key assumptions and trade-offs.

Constraints & Assumptions

State your own assumptions where the prompt is silent, but anchor them to these:

  • Scale: tens of thousands of pieces of content per second at peak, across multiple modalities; hundreds of millions of users globally.
  • Latency by surface: chat and uploads should be moderated before delivery (interactive, sub-second budgets); live streams are moderated continuously over a sliding window; some content can be re-scanned in batch.
  • Modalities: text (including OCR text-in-image/video), image, audio (ASR transcripts + acoustic cues), and video (visual frames + motion), plus metadata/context (surface, locale, age band, account history).
  • Ephemerality & privacy: content is short-lived by design; data minimization, encryption, regional legal constraints, and age-appropriate design for minors are hard requirements.
  • Severity is not binary: harmful categories range from "reduce reach" to "block + escalate to law enforcement" (e.g., CSAM), so the system must produce graded, policy-driven actions, not a single block/allow bit.
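The live-stream constraint above (continuous moderation over a sliding window) can be made concrete with a small aggregator. This is a minimal sketch, assuming each audio/video segment has already been scored by upstream models; the class name, window length, and both thresholds are hypothetical placeholders, not tuned values. It actions a stream either on sustained harm (the windowed mean) or on one very confident segment (the peak).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SlidingWindowScorer:
    """Hypothetical live-stream aggregator over per-segment harm scores."""

    window_s: float = 10.0        # assumed window length in seconds (must be > 0)
    mean_threshold: float = 0.6   # sustained harm across the window
    peak_threshold: float = 0.95  # a single very-high-confidence segment
    _buf: deque = field(default_factory=deque, init=False)  # (timestamp, score) pairs

    def add(self, ts: float, score: float) -> bool:
        """Add one segment score; return True if the stream should be actioned now."""
        self._buf.append((ts, score))
        # Evict segments that fell out of the window (ts - window_s, ts].
        while self._buf and self._buf[0][0] <= ts - self.window_s:
            self._buf.popleft()
        mean = sum(s for _, s in self._buf) / len(self._buf)
        return mean >= self.mean_threshold or score >= self.peak_threshold
```

Keeping both rules is a design choice: the mean tolerates one noisy frame, while the peak rule avoids waiting a full window for an unambiguous violation.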

Clarifying Questions to Ask

A candidate should scope the whole problem before diving in. Good questions include:

  • Which harm categories are in scope and what are the legal/zero-tolerance ones (e.g., CSAM, terrorism) versus policy-discretion ones (e.g., bullying)?
  • What are the per-surface latency and action budgets — must we block chat/uploads pre-delivery, or is post-hoc takedown acceptable for some surfaces?
  • What is the relative cost of false positives vs. false negatives per category, and does it differ by region or by minor vs. adult accounts?
  • What languages and regions must we support at launch, including low-resource languages and locale-specific policy?
  • What labeled data, enforcement logs, and industry hash-sharing programs (e.g., NCMEC/GIFCT) do we have access to, and what are the retention/access constraints on training data?
  • Is there an on-device budget (model size, battery, NPU) for pre-upload checks, and what privacy guarantees must on-device inference satisfy?

Part 1 — Taxonomy, Labeling & Multimodal Scope

Define a practical, multi-label taxonomy of harm categories (e.g., hate/harassment, self-harm/suicide, sexual content, violence/gore, illegal/regulated) with subcategories and severity tiers. Specify the labeling guidelines and quality controls for annotation, and define the multilingual and multimodal scope plus context rules (multi-turn chat, text-in-image, satire/condemnation/educational context).
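One way to encode a hierarchical, multi-label taxonomy with ordinal severity tiers is sketched below, assuming Python 3.9+. The tier names, category keys, and default tiers are hypothetical examples that mirror the categories named above; in a real system they would come from the policy team. Content can carry several labels at once, and the most severe label drives the action.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Ordinal tiers (hypothetical names); a higher value means more severe."""
    NONE = 0
    REDUCE_REACH = 1  # e.g. downrank
    RESTRICT = 2      # e.g. age-gate or interstitial
    REMOVE = 3        # block or take down
    ESCALATE = 4      # block + escalate (zero-tolerance)


@dataclass(frozen=True)
class Label:
    category: str     # top-level node, e.g. "self_harm"
    subcategory: str  # leaf node, e.g. "promotion"
    severity: Severity


# Illustrative slice of the hierarchy: category -> subcategory -> default tier.
TAXONOMY: dict[str, dict[str, Severity]] = {
    "hate_harassment": {"insult": Severity.REDUCE_REACH, "dehumanization": Severity.REMOVE},
    "self_harm": {"recovery_discussion": Severity.NONE, "promotion": Severity.REMOVE},
    "sexual": {"suggestive": Severity.RESTRICT, "explicit": Severity.REMOVE},
    "violence_gore": {"graphic_gore": Severity.RESTRICT, "threat": Severity.REMOVE},
    "child_safety": {"csam_hash_match": Severity.ESCALATE},
}


def make_labels(hits: list[tuple[str, str]]) -> list[Label]:
    """Turn multi-label (category, subcategory) hits into graded labels."""
    return [Label(cat, sub, TAXONOMY[cat][sub]) for cat, sub in hits]


def overall_severity(labels: list[Label]) -> Severity:
    """The most severe label on an item determines its tier."""
    return max((lab.severity for lab in labels), default=Severity.NONE)
```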

What This Part Should Cover

  • A coherent, hierarchical multi-label taxonomy with subcategories AND ordinal severity tiers (not a flat list of classes).
  • Concrete annotation quality controls: gold seeding, inter-annotator agreement (IAA) targets, adjudication, and SME tiers for the most sensitive classes.
  • Multimodal coverage mapped to the taxonomy (text/OCR, image, audio/ASR, video) and explicit context rules for multi-turn and intent.
  • Awareness of sensitive-class handling (e.g., never collecting/annotating real CSAM directly; using hashes/redacted signals).
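Two of the quality controls above, IAA targets and gold seeding, reduce to simple computations. The sketch below assumes one categorical label per item: Cohen's kappa measures chance-corrected agreement between two annotators, and a gold check flags annotators whose accuracy on seeded items falls below a bar. The function names and the 0.9 default are hypothetical placeholders, not recommended targets.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)  # agreement expected by chance
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def flag_annotators(gold: dict[str, str],
                    answers: dict[str, dict[str, str]],
                    min_acc: float = 0.9) -> list[str]:
    """Flag annotators whose accuracy on seeded gold items is below min_acc."""
    flagged = []
    for annotator, resp in answers.items():
        scored = [item for item in gold if item in resp]
        if not scored:
            continue  # this annotator saw no gold items yet
        acc = sum(resp[i] == gold[i] for i in scored) / len(scored)
        if acc < min_acc:
            flagged.append(annotator)
    return flagged
```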

Part 2 — Modeling & Training Data Strategy

Propose a modeling stack spanning keyword/heuristic baselines, classical ML, modern transformers, and multimodal encoders, and justify which tool fits which job. Then describe the training-data strategy: data collection, active learning, long-tail sampling, debiasing, and handling sensitive classes.
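The cost-vs-accuracy cascade can be sketched as a loop that stops at the first stage confident enough to decide. This is a minimal illustration, assuming each stage returns a calibrated probability and has its own uncertain band (`low`, `high`); the `Stage` type and the stand-in scorers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    score: Callable[[str], float]  # returns a calibrated P(harmful)
    low: float                     # below this: confidently benign, stop
    high: float                    # above this: confidently harmful, stop


def run_cascade(text: str, stages: list[Stage]) -> tuple[str, float]:
    """Run cheap stages first; only items in a stage's uncertain band escalate."""
    name, p = "none", 0.0
    for stage in stages:
        name, p = stage.name, stage.score(text)
        if p < stage.low or p > stage.high:
            break  # confident enough; skip the costlier stages
    return name, p


# Stand-in scorers (hypothetical); real stages would be a blocklist,
# a linear model on n-grams, and a transformer classifier.
demo_stages = [
    Stage("keyword", lambda t: 0.99 if "badword" in t else 0.5, low=0.1, high=0.95),
    Stage("linear", lambda t: 0.03 if "hello" in t else 0.5, low=0.1, high=0.9),
    Stage("transformer", lambda t: 0.7, low=0.0, high=1.0),
]
```

Because a keyword hit is strong evidence but a miss is not, the stand-in keyword stage returns a neutral 0.5 on a miss, so only hits terminate there; everything else escalates to costlier models.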

What This Part Should Cover

  • A justified progression from baselines to advanced models, with the right model per modality and a clear cascade/escalation rationale (cost vs. accuracy).
  • Score calibration (e.g., Platt scaling or isotonic regression, fitted per class and locale) and routing of out-of-distribution or low-confidence items.
  • A realistic data pipeline: sources (enforcement logs, public datasets, hash-sharing, synthetic data), active learning, long-tail oversampling, and hard-negative mining.
  • Concrete debiasing and sensitive-class practices, including how CSAM/terror data is handled without direct collection.
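Isotonic calibration, one of the two options named above, can be fitted with the pool-adjacent-violators (PAV) algorithm. The sketch below is a plain-Python illustration, assuming binary labels and one calibrator per (class, locale) segment; `fit_per_segment` and the segment keys are hypothetical. In practice a library implementation such as scikit-learn's `IsotonicRegression` or `CalibratedClassifierCV(method="isotonic")` would normally be used instead.

```python
from bisect import bisect_left

Model = tuple[list[float], list[float]]  # (block upper edges, calibrated values)


def fit_isotonic(scores: list[float], labels: list[int]) -> Model:
    """PAV: fit a non-decreasing step map from raw score to P(harmful)."""
    agg: dict[float, list[float]] = {}
    for s, y in zip(scores, labels):  # pool tied raw scores first
        a = agg.setdefault(s, [0.0, 0.0])
        a[0] += y
        a[1] += 1
    blocks: list[list[float]] = []    # each block: [sum_labels, count, upper_score]
    for s in sorted(agg):
        blocks.append([agg[s][0], agg[s][1], s])
        # Merge backwards while the previous block's mean exceeds the newest one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            tot, cnt, upper = blocks.pop()
            blocks[-1][0] += tot
            blocks[-1][1] += cnt
            blocks[-1][2] = upper
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(raw: float, model: Model) -> float:
    """Value of the first block whose upper edge is >= raw (clamped at the top)."""
    uppers, values = model
    return values[min(bisect_left(uppers, raw), len(values) - 1)]


def fit_per_segment(rows: list[tuple[str, str, float, int]]) -> dict[tuple[str, str], Model]:
    """rows = (class, locale, raw_score, label); one calibrator per (class, locale)."""
    grouped: dict[tuple[str, str], tuple[list[float], list[int]]] = {}
    for cls, locale, raw, y in rows:
        xs, ys = grouped.setdefault((cls, locale), ([], []))
        xs.append(raw)
        ys.append(y)
    return {key: fit_isotonic(xs, ys) for key, (xs, ys) in grouped.items()}
```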

Part 3 — Inference Architecture & Decisioning

Specify the inference architecture (streaming vs. batch, on-device vs. server, latency targets by surface), the decisioning layer (thresholds, severity tiers → policy actions such as block, age-gate, interstitial, downrank, quarantine), the human-in-the-loop flows (triage queues, escalation, appeals/overrides), and the explainability requirements for moderators and for user-facing transparency.
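Cost-sensitive thresholds follow directly from calibrated scores. If acting on an item that is actually benign costs C_FP and missing an item that is actually harmful costs C_FN, then with calibrated probability p the expected cost of acting is (1 - p) * C_FP and of not acting is p * C_FN, so acting is cheaper exactly when p > C_FP / (C_FP + C_FN). With graded actions, the same idea picks the action with the lowest expected cost. The function names and the cost table below are hypothetical; real values would be set per class and region by policy.

```python
def bayes_threshold(c_fp: float, c_fn: float) -> float:
    """Act iff calibrated p exceeds this value (expected cost of acting is lower)."""
    return c_fp / (c_fp + c_fn)


def best_action(p: float, costs: dict[str, tuple[float, float]]) -> str:
    """Pick the graded action with minimum expected cost.

    costs[action] = (cost if content is actually benign, cost if actually harmful).
    """
    return min(costs, key=lambda a: (1.0 - p) * costs[a][0] + p * costs[a][1])


# Hypothetical cost table for one (class, region) pair.
BULLYING_COSTS = {
    "allow": (0.0, 100.0),    # missing real bullying is expensive
    "downrank": (1.0, 40.0),  # mild cost if benign, partial mitigation if harmful
    "block": (10.0, 0.0),     # over-blocking benign content is costly
}
```

With these made-up costs, p = 0.01 yields allow, p = 0.05 yields downrank, and p = 0.5 yields block, which is the graded reduce-reach-before-blocking behavior the decision engine needs.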

What This Part Should Cover

  • A concrete serving topology (per-modality microservices + fusion, autoscaling, model registry/versioning, canary) with per-surface latency budgets and streaming-vs-batch placement.
  • A decision engine that maps calibrated scores + severity + context to graded actions, with cost-sensitive (not arbitrary) thresholds.
  • A complete human-in-the-loop design: triage/escalation, crisis routing for self-harm, appeals/overrides, audit logging.
  • Explainability for both moderators (evidence, saliency, exemplars) and users (reason codes without revealing exact rules).
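A decision engine that combines score, severity, and context with human-in-the-loop routing might look like the following. This is a minimal sketch, assuming one calibrated score per category; the category keys, queue names, reason codes, thresholds, and the 0.8 tightening factor for minor accounts are all hypothetical. It illustrates a human-review floor for zero-tolerance classes, crisis routing for self-harm, user-facing reason codes that do not expose thresholds, and an audit record on every decision.

```python
from dataclasses import dataclass, field
from typing import Optional

ZERO_TOLERANCE = {"csam", "terrorism"}  # hypothetical category keys


@dataclass
class Decision:
    action: str           # "allow" | "downrank" | "interstitial" | "block"
    queue: Optional[str]  # human-review queue name, or None
    reason_code: str      # user-facing code; never exposes raw scores or thresholds
    audit: dict = field(default_factory=dict)


def decide(category: str, p: float, is_minor: bool, model_version: str) -> Decision:
    """Map one calibrated score plus context to a graded action and a review route."""
    # Hypothetical (review_at, act_at); zero-tolerance classes trigger far lower.
    review_at, act_at = (0.05, 0.5) if category in ZERO_TOLERANCE else (0.5, 0.85)
    if is_minor:  # stricter enforcement for minor accounts
        review_at, act_at = review_at * 0.8, act_at * 0.8
    audit = {"category": category, "p": p, "minor": is_minor, "model": model_version}
    if p < review_at:
        return Decision("allow", None, "none", audit)
    if category in ZERO_TOLERANCE:
        # Human-review floor: block now and always send to a specialist queue.
        return Decision("block", "sme_escalation", "severe_policy", audit)
    if category == "self_harm":
        # Crisis routing: show support resources instead of a silent takedown.
        return Decision("interstitial", "crisis_response", "wellbeing", audit)
    if p >= act_at:
        return Decision("block", "appeals", f"{category}_violation", audit)
    return Decision("downrank", "standard_review", f"{category}_borderline", audit)
```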

Part 4 — Risk, Privacy, Fairness & Reliability

Address the operational-risk dimensions: adversarial behavior (evasion via obfuscation/text-in-image/coded language; prompt injection for any LLM components; model hardening), privacy & safety constraints (data minimization, retention, encryption, age-appropriate design), fairness & error costs (precision/recall trade-offs by class and region; group fairness), and monitoring/drift/A/B rollout/feedback loops for continuous improvement.
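A first line of defense against obfuscation is to normalize text before keyword matching and classification. The sketch below uses only the Python standard library (`unicodedata`, `re`); the function name, the leetspeak table, and the separator rules are illustrative assumptions, and the normalized copy is meant for matching only, never for display.

```python
import re
import unicodedata

# Zero-width characters often inserted to split flagged terms; mapped to None (deleted).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
# Illustrative leetspeak folds (hypothetical table; real ones are locale-specific).
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})


def normalize_for_matching(text: str) -> str:
    """Return a canonical copy of `text` for keyword matching and model input."""
    t = unicodedata.normalize("NFKC", text)  # fold full-width/compatibility forms
    t = t.translate(ZERO_WIDTH)              # drop zero-width characters
    t = "".join(c for c in unicodedata.normalize("NFD", t)
                if unicodedata.category(c) != "Mn")  # strip combining marks (accents)
    t = t.lower().translate(LEET)            # undo common character substitutions
    t = re.sub(r"[.\-_*]+(?=\w)", "", t)     # remove separators wedged inside words
    t = re.sub(r"(.)\1{2,}", r"\1\1", t)     # squash runs of 3+ repeats down to 2
    return t
```

NFKC folds full-width and compatibility characters but does not map cross-script look-alikes (for example Cyrillic "а" for Latin "a"); that needs a confusables table such as the one in Unicode Technical Standard #39. Stripping combining marks is also language-dependent and can damage some non-Latin scripts, so a multilingual deployment would apply it per script or locale.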

What This Part Should Cover

  • A credible adversarial threat model and concrete defenses, including prompt-injection handling if any LLM is used.
  • Privacy-by-design: data minimization, short retention for ephemeral content, encryption, least-privilege reviewer access, stricter rules for minors.
  • A fairness framework with per-group/region metrics and counterfactual evaluation, tied to per-class error costs.
  • A reliability loop: monitoring/SLOs, data/concept/embedding drift detection, staged rollout (shadow → canary → A/B) with a kill-switch, and feedback loops that avoid self-reinforcement.
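For score-distribution drift, one common statistic is the Population Stability Index (PSI), computed between a reference window (for example, the launch week) and a live window. The sketch below assumes equal-width bins over the reference range and an epsilon floor to avoid log(0); the bin count and epsilon are assumptions. A frequently cited rule of thumb treats PSI above about 0.25 as a significant shift, but alert thresholds should be tuned per class and locale.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference score sample and a live one."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def hist(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            i = int((x - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values to edge bins
        return [max(c / len(xs), eps) for c in counts]  # eps floor avoids log(0)

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```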

What a Strong Answer Covers

Across all four parts, a strong answer is judged on how well the pieces cohere into one operational system rather than four disconnected essays:

  • End-to-end coherence: the severity tiers from Part 1 are exactly what the decision engine in Part 3 acts on; the calibration in Part 2 is what makes the cost-sensitive thresholds in Part 3 valid; the drift/fairness signals in Part 4 feed the active-learning loop in Part 2.
  • Explicit trade-offs: latency vs. accuracy (cascade design), precision vs. recall by severity, on-device privacy vs. capability, transparency vs. gameability.
  • Safety-first prioritization: zero-tolerance categories are conservative-by-default (human-review floor, low thresholds, hash-matching) while discretionary categories favor reduce-reach over over-blocking.
  • Realism on the hard parts: ephemerality/privacy constraints, low-resource languages, adversarial evasion, and feedback-loop bias are acknowledged, not hand-waved.

Follow-up Questions

  • A new evasion pattern (e.g., a coded slang term) spreads overnight and your models miss it. Walk through your rapid-response path from detection to a deployed fix without retraining from scratch.
  • Your CSAM classifier's precision is excellent, but a regulator demands near-zero false negatives. How does this change your thresholds, human-review staffing, and the noisy-OR fusion across modalities?
  • For minor accounts you want stricter enforcement, but you must also minimize data collected about minors. How do you reconcile stricter thresholds with data minimization?
  • How would you design the A/B test for a new model when you cannot expose users to known-harmful content in the control arm? What is your metric and how do you avoid ethical/legal problems with the holdback?
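The second follow-up refers to noisy-OR fusion across modalities. Assuming per-modality detectors fire independently, the fused probability is 1 - (1 - leak) * Π(1 - p_i), where `leak` is an optional background rate (0 by default here). A useful property for the near-zero-false-negative question is that the fused score is never below the strongest single modality, so adding modalities can only raise recall (and false positives). The function name is hypothetical.

```python
import math


def noisy_or(probs: list[float], leak: float = 0.0) -> float:
    """Fuse per-modality P(harmful) assuming independent detectors (noisy-OR)."""
    # P(no modality fires) = (1 - leak) * prod(1 - p_i); the fused score is its complement.
    return 1.0 - (1.0 - leak) * math.prod(1.0 - p for p in probs)
```

For instance, a text score of 0.3 and an image score of 0.6 fuse to about 0.72.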
