Build and evaluate illegal-video classifier

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in end-to-end Machine Learning system design, including multimodal modeling (vision, audio, text), data engineering for sparse, noisy, and imbalanced labels, robustness and abuse resistance, human-in-the-loop workflows, privacy/retention concerns, and operational metrics.

Company: Google

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

Design an end‑to‑end system to flag illegal YouTube videos.

  • Data: videos with titles/descriptions/captions/thumbnails; sparse, noisy labels; strong class imbalance; evolving policies.
  • Modeling: choose architectures (vision, audio, text; multimodal fusion), pretraining/embeddings, and a strategy for weak supervision and active learning.
  • Evaluation: define offline metrics (AUROC, PR‑AUC, calibration, cost‑weighted utility), thresholding for triage tiers, and how to build a reliable test set that resists leakage, near‑duplicates, and distribution shift.
  • Safety/abuse: adversarial evasion, fairness/false‑positive harms, appeals workflow, and human‑in‑the‑loop review throughput constraints.
  • Online: rollout plan (shadow mode, canary, interleaving with human rules), counterfactual risk via IPS/DR, and experiment design to measure reduction in policy violations without introducing selection bias.

Oct 13, 2025, 9:49 PM

End-to-End ML System Design: Flag Illegal YouTube Videos

You are tasked with designing a production ML system to detect and triage potentially illegal YouTube videos at scale. The system must work across modalities (vision, audio, text); handle sparse/noisy labels, strong class imbalance, and evolving policies; and integrate with human review.

Assumptions (keep them minimal and explicit):

  • "Illegal" follows platform policy (e.g., child safety, terror content, incitement to violence), with versioned policy definitions that evolve over time.
  • Actions include: automatic block, downrank/age-restrict, route to human review, or allow.
  • The system must support multilingual/global content and near-real-time decisions.

Design the system across the following areas:

1) Data

  • Inputs: video frames/thumbnails, audio tracks, ASR captions/transcripts, titles/descriptions/tags, uploader/channel metadata, user flags, policy takedown logs.
  • Constraints: sparse and noisy labels, severe class imbalance, evolving policies.
  • Describe data ingestion, feature storage, deduplication/near-duplicate handling, label pipelines (including policy-version tracking), and privacy/retention considerations.
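
One way to make the near-duplicate handling above concrete is to group videos whose fingerprints are close in Hamming distance, so that a whole group can later be kept on one side of a train/test split. The sketch below is illustrative only: it assumes each video already has a 64-bit perceptual hash (the `hashes` mapping, the video IDs, and the `max_distance` value are hypothetical), and it compares all pairs, which would need to be replaced by an approximate index such as LSH at production scale.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def cluster_near_duplicates(hashes: dict, max_distance: int = 4) -> dict:
    """Map each video ID to a cluster representative using union-find.

    `hashes` maps video ID -> integer fingerprint (assumed precomputed).
    Two videos are linked when their fingerprints differ in at most
    `max_distance` bits; links are transitive.
    """
    parent = {vid: vid for vid in hashes}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # O(n^2) pairwise comparison: acceptable for a sketch, not for billions of videos.
    for a, b in combinations(hashes, 2):
        if hamming(hashes[a], hashes[b]) <= max_distance:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the lexicographically smaller ID as the representative.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return {vid: find(vid) for vid in hashes}


# Hypothetical fingerprints: v1 and v2 differ by one bit, v3 is unrelated.
example_hashes = {"v1": 0xF0, "v2": 0xF1, "v3": 0xFFFF0000}
clusters = cluster_near_duplicates(example_hashes)
```

Assigning every member of a cluster to the same data split is one simple guard against the leakage concern raised in the Evaluation section.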

2) Modeling

  • Choose architectures per modality (vision, audio, text) and a multimodal fusion approach.
  • Pretraining/embeddings strategy (self-supervised/foundation models; multilingual coverage).
  • Strategy for weak supervision (heuristics, user flags, external lists) and active learning to acquire high-value labels.
  • Handling class imbalance, noisy labels, and continual learning under policy drift.
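
The weak-supervision and active-learning bullets above can be illustrated with a small sketch. It assumes each weak source (a heuristic, a user-flag rule, an external list match) emits 1 for "violating", 0 for "not violating", or abstains; the function names and the `ABSTAIN` sentinel are hypothetical. The unweighted vote shown here is a deliberate simplification: a fuller design would typically weight sources by estimated accuracy.

```python
from typing import Dict, List, Optional

ABSTAIN = -1  # sentinel used when a weak source has no opinion (assumption)


def aggregate_weak_labels(votes: List[int]) -> Optional[float]:
    """Return the fraction of non-abstaining sources that voted 'violating'.

    Returns None when every source abstained, so the example stays unlabeled.
    """
    cast = [v for v in votes if v != ABSTAIN]
    if not cast:
        return None
    return sum(cast) / len(cast)


def select_for_labeling(scores: Dict[str, float], budget: int) -> List[str]:
    """Uncertainty sampling: pick the videos whose score is closest to 0.5."""
    ranked = sorted(scores, key=lambda vid: abs(scores[vid] - 0.5))
    return ranked[:budget]


# Three sources vote, one abstains.
soft_label = aggregate_weak_labels([1, ABSTAIN, 0, 1])

# Hypothetical model scores; the two most uncertain videos go to reviewers.
to_label = select_for_labeling({"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.40}, budget=2)
```

Pure uncertainty sampling tends to over-sample one ambiguous region, so in practice it is often mixed with random or diversity-based sampling; that mix is a design choice, not something the question prescribes.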

3) Evaluation

  • Offline metrics: AUROC, PR-AUC (more informative than AUROC under class imbalance), calibration (expected calibration error (ECE), Brier score), and cost-weighted utility.
  • Thresholding for triage tiers (auto-block, send-to-review, allow), grounded in expected utility and reviewer capacity.
  • Build a reliable test set that resists leakage, near-duplicates, and distribution shift; include slice-based evaluation (language, region, topic, channel age).
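
The calibration metrics and tiered thresholds listed above can be sketched as follows. This is a minimal illustration under stated assumptions: scores are treated as calibrated probabilities, ECE uses equal-width bins, the cost values are made up, and overflow beyond reviewer capacity simply falls through to "allow" (a real system might queue the backlog instead). All function and parameter names are hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width bins: weighted mean of |confidence - accuracy|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        confidence = sum(p for p, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += len(b) / len(probs) * abs(confidence - accuracy)
    return ece


def utility_threshold(cost_fp, cost_fn):
    """Score above which acting has lower expected cost than not acting.

    Acting is preferable when p * cost_fn > (1 - p) * cost_fp.
    """
    return cost_fp / (cost_fp + cost_fn)


def assign_tiers(scores, block_threshold, review_threshold, review_capacity):
    """Assign auto_block / review / allow, respecting reviewer capacity."""
    tiers = {}
    remaining = review_capacity
    for vid in sorted(scores, key=scores.get, reverse=True):
        if scores[vid] >= block_threshold:
            tiers[vid] = "auto_block"
        elif scores[vid] >= review_threshold and remaining > 0:
            tiers[vid] = "review"
            remaining -= 1
        else:
            tiers[vid] = "allow"
    return tiers


# Illustrative numbers only.
ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 0, 0, 0])
tiers = assign_tiers({"a": 0.97, "b": 0.7, "c": 0.6, "d": 0.05},
                     block_threshold=utility_threshold(cost_fp=9, cost_fn=1),
                     review_threshold=0.5, review_capacity=1)
```

Deriving the auto-block threshold from explicit costs keeps the triage policy tied to stated harms rather than to an arbitrary score cutoff, which only makes sense if calibration is monitored alongside it.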

4) Safety and Abuse Resistance

  • Anticipate adversarial evasion and propose robustification and monitoring (without revealing evasion recipes).
  • Fairness and false-positive harm mitigation; transparent appeals workflow; reversibility of actions.
  • Human-in-the-loop design: reviewer tooling, quality control, throughput/SLA constraints, and prioritization.
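
For the prioritization point above, one possible sketch is a review queue ordered by expected harm, taken here as model score times a per-policy severity weight. The severity weights, policy names, and video IDs are invented for illustration; a real queue would also need to account for SLA deadlines and reviewer specialization.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ReviewItem:
    # heapq is a min-heap, so we store negated expected harm as the priority.
    priority: float
    video_id: str = field(compare=False)


def build_queue(candidates, severity_weight):
    """Build a heap of ReviewItems from (video_id, policy, score) tuples."""
    heap = []
    for video_id, policy, score in candidates:
        expected_harm = score * severity_weight[policy]
        heapq.heappush(heap, ReviewItem(-expected_harm, video_id))
    return heap


def next_batch(heap, n):
    """Pop up to n of the highest-expected-harm video IDs."""
    return [heapq.heappop(heap).video_id for _ in range(min(n, len(heap)))]


# Hypothetical severity weights per policy area.
weights = {"child_safety": 10.0, "terror": 8.0, "incitement": 3.0}
queue = build_queue(
    [("v1", "child_safety", 0.6), ("v2", "incitement", 0.9), ("v3", "terror", 0.3)],
    weights,
)
batch = next_batch(queue, 2)
```

Note that the lower-scored child-safety item outranks the higher-scored incitement item, which is the intended effect of weighting by severity rather than by score alone.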

5) Online Rollout and Measurement

  • Rollout plan: shadow mode, canary, progressive ramp, and interleaving with existing human/rule systems; kill switches.
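  • Counterfactual risk estimation using inverse propensity scoring (IPS) and doubly robust (DR) estimators to estimate, from logged decisions, the violation risk and action costs a new policy would incur.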
  • Experiment design to measure reduction in policy violations without selection bias; randomized auditing to estimate true prevalence.
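
The IPS and DR estimators mentioned above can be sketched for a deterministic target policy as follows. The log schema (`LoggedDecision`), the toy policy, and the cost model are assumptions made for illustration. Both estimators rely on the logging policy having assigned a known, nonzero probability to every action the target policy would take, which is one reason some randomization in the logged decisions is needed.

```python
from typing import Callable, List, NamedTuple


class LoggedDecision(NamedTuple):
    context: float     # e.g. model score at decision time (hypothetical feature)
    action: str        # action the logging policy actually took
    propensity: float  # probability the logging policy gave to that action
    cost: float        # observed cost of the outcome


def ips_estimate(logs: List[LoggedDecision],
                 target_policy: Callable[[float], str]) -> float:
    """Inverse propensity scoring estimate of the target policy's mean cost."""
    total = 0.0
    for d in logs:
        if target_policy(d.context) == d.action:
            total += d.cost / d.propensity
    return total / len(logs)


def dr_estimate(logs: List[LoggedDecision],
                target_policy: Callable[[float], str],
                cost_model: Callable[[float, str], float]) -> float:
    """Doubly robust estimate: model prediction plus an IPS-weighted correction."""
    total = 0.0
    for d in logs:
        target_action = target_policy(d.context)
        estimate = cost_model(d.context, target_action)
        if target_action == d.action:
            estimate += (d.cost - cost_model(d.context, d.action)) / d.propensity
        total += estimate
    return total / len(logs)


def target(context: float) -> str:
    """Toy target policy: review anything scoring at least 0.5."""
    return "review" if context >= 0.5 else "allow"


def model(context: float, action: str) -> float:
    """Toy cost model: review costs 0.1; allowing a high-score video costs 1.0."""
    if action == "review":
        return 0.1
    return 1.0 if context >= 0.5 else 0.0


logs = [
    LoggedDecision(0.9, "review", 0.5, 0.1),
    LoggedDecision(0.9, "allow", 0.5, 1.0),
    LoggedDecision(0.2, "allow", 0.8, 0.0),
    LoggedDecision(0.2, "review", 0.2, 0.1),
]
ips_value = ips_estimate(logs, target)
dr_value = dr_estimate(logs, target, model)
```

In this toy log the two estimates agree because the cost model is exact; in general DR stays consistent if either the propensities or the cost model is correct, and usually has lower variance than plain IPS.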