PracHub

Build and evaluate illegal-video classifier

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in end-to-end Machine Learning system design, including multimodal modeling (vision, audio, text), data engineering for sparse, noisy, and imbalanced labels, robustness and abuse resistance, human-in-the-loop workflows, privacy/retention concerns, and operational metrics.



Company: Google

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

Design an end-to-end system to flag illegal YouTube videos.

  • Data: videos with titles/descriptions/captions/thumbnails; sparse, noisy labels; strong class imbalance; evolving policies.
  • Modeling: choose architectures (vision, audio, text; multimodal fusion), pretraining/embeddings, and a strategy for weak supervision and active learning.
  • Evaluation: define offline metrics (AUROC, PR-AUC, calibration, cost-weighted utility), thresholding for triage tiers, and how to build a reliable test set that resists leakage, near-duplicates, and distribution shift.
  • Safety/abuse: adversarial evasion, fairness/false-positive harms, appeals workflow, and human-in-the-loop review throughput constraints.
  • Online: rollout plan (shadow mode, canary, interleaving with human rules), counterfactual risk via IPS/DR, and experiment design to measure reduction in policy violations without introducing selection bias.



End-to-End ML System Design: Flag Illegal YouTube Videos

You are tasked with designing a production ML system to detect and triage potentially illegal YouTube videos at scale. The system must work across modalities (vision, audio, text), handle sparse/noisy labels and strong class imbalance, adapt to evolving policies, and integrate with human review.

Assumptions (make minimal, explicit):

  • "Illegal" follows platform policy (e.g., child safety, terror content, incitement to violence), with versioned policy definitions that evolve over time.
  • Actions include: automatic block, downrank/age-restrict, route to human review, or allow.
  • The system must support multilingual/global content and near-real-time decisions.

Design the system across the following areas:

1) Data

  • Inputs: video frames/thumbnails, audio tracks, ASR captions/transcripts, titles/descriptions/tags, uploader/channel metadata, user flags, policy takedown logs.
  • Constraints: sparse and noisy labels, severe class imbalance, evolving policies.
  • Describe data ingestion, feature storage, deduplication/near-duplicate handling, label pipelines (including policy-version tracking), and privacy/retention considerations.
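Near-duplicate handling is a common first line of defense in this pipeline, since flagged content is frequently re-uploaded with trivial edits. A minimal sketch of one standard technique, perceptual average hashing (aHash) on downscaled grayscale thumbnails; the function names and the distance threshold here are illustrative assumptions, not a real ingestion API:

```python
def ahash(gray):
    """64-bit average hash of an 8x8 grayscale grid (list of lists).

    Each pixel contributes one bit: 1 if at or above the grid mean.
    Small visual edits (crops, re-encodes) flip only a few bits.
    """
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_near_duplicate(h1, h2, max_dist=5):
    # Small Hamming distance between perceptual hashes suggests a
    # re-upload of previously actioned content. max_dist is a tunable
    # assumption; production systems calibrate it on labeled pairs.
    return hamming(h1, h2) <= max_dist
```

In practice this would sit alongside exact content hashing and learned-embedding similarity search; aHash alone is cheap but easy to evade, which is why it is only one signal among several.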

2) Modeling

  • Choose architectures per modality (vision, audio, text) and a multimodal fusion approach.
  • Pretraining/embeddings strategy (self-supervised/foundation models; multilingual coverage).
  • Strategy for weak supervision (heuristics, user flags, external lists) and active learning to acquire high-value labels.
  • Handling class imbalance, noisy labels, and continual learning under policy drift.
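To make the weak-supervision idea concrete, here is a deliberately simplified sketch that combines noisy labeling sources (heuristics, user flags, external hash lists) by weighted vote with abstentions. Real systems typically learn source accuracies and correlations with a generative label model (Snorkel-style); the source names and weights below are illustrative assumptions:

```python
def combine_weak_labels(votes, weights):
    """Combine weak-supervision sources into a soft training label.

    votes:   dict source -> label in {+1 violating, -1 benign, 0 abstain}
    weights: dict source -> trust weight (assumed given; in practice
             estimated from source agreement on a small gold set)
    Returns P(violating) in [0, 1].
    """
    voting = {s: v for s, v in votes.items() if v != 0}
    total = sum(weights[s] for s in voting)
    if total == 0:
        return 0.5  # every source abstained: no evidence either way
    score = sum(weights[s] * v for s, v in voting.items())
    # Map the weighted vote from [-1, 1] onto a probability in [0, 1].
    return 0.5 + 0.5 * score / total
```

These soft labels can then train the multimodal model with a noise-aware loss, while active learning routes the most uncertain or highest-impact examples to human annotators for gold labels.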

3) Evaluation

  • Offline metrics: AUROC, PR-AUC (class imbalance), calibration (ECE/Brier), and cost-weighted utility.
  • Thresholding for triage tiers (auto-block, send-to-review, allow), grounded in expected utility and reviewer capacity.
  • Build a reliable test set that resists leakage, near-duplicates, and distribution shift; include slice-based evaluation (language, region, topic, channel age).
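Two of the evaluation pieces above can be sketched directly: expected calibration error (ECE) over equal-width confidence bins, and picking a triage threshold by minimizing expected cost. The cost values are placeholders; in a real system the false-negative/false-positive costs come from policy severity and reviewer-capacity analysis:

```python
def ece(probs, labels, n_bins=10):
    """Expected calibration error: bin-weighted gap between mean
    predicted probability and observed positive rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    err = 0.0
    for b in bins:
        if not b:
            continue
        avg_p = sum(p for p, _ in b) / len(b)
        frac_pos = sum(y for _, y in b) / len(b)
        err += len(b) / total * abs(avg_p - frac_pos)
    return err

def best_threshold(probs, labels, cost_fn=50.0, cost_fp=1.0):
    """Pick the score cutoff minimizing expected cost on labeled data.

    cost_fn/cost_fp are illustrative: missed violations are assumed
    far costlier than wrongful blocks, but the ratio is a policy input.
    """
    def cost(t):
        fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < t)
        fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= t)
        return cost_fn * fn + cost_fp * fp
    candidates = sorted(set(probs)) + [1.01]  # 1.01 = "block nothing"
    return min(candidates, key=cost)
```

The same machinery extends to multiple tiers: sweep two cutoffs (auto-block vs. send-to-review) with a per-review cost term capped by reviewer capacity.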

4) Safety and Abuse Resistance

  • Anticipate adversarial evasion and propose robustification and monitoring (without revealing evasion recipes).
  • Fairness and false-positive harm mitigation; transparent appeals workflow; reversibility of actions.
  • Human-in-the-loop design: reviewer tooling, quality control, throughput/SLA constraints, and prioritization.
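Since reviewer throughput is the binding constraint, prioritization matters: the queue should surface the likeliest, highest-harm cases first. A minimal sketch using a max-heap keyed on model score times a per-vertical severity weight; the verticals and weights are illustrative assumptions, not actual policy values:

```python
import heapq

# Illustrative severity weights per policy vertical (assumed, not real).
SEVERITY = {"child_safety": 10.0, "terror": 8.0, "incitement": 5.0}

class ReviewQueue:
    """Priority queue surfacing highest expected-harm cases first."""

    def __init__(self):
        self._heap = []

    def push(self, video_id, score, vertical):
        # Priority = P(violating) * severity; negate for a max-heap,
        # since heapq implements a min-heap.
        prio = score * SEVERITY.get(vertical, 1.0)
        heapq.heappush(self._heap, (-prio, video_id))

    def pop(self):
        return heapq.heappop(self._heap)[1]
```

A production version would also factor in video reach (views/velocity), queue aging so low-priority items are not starved past SLA, and randomized audit sampling for reviewer quality control.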

5) Online Rollout and Measurement

  • Rollout plan: shadow mode, canary, progressive ramp, and interleaving with existing human/rule systems; kill switches.
  • Counterfactual risk estimation using IPS/DR to estimate violation risks and action costs offline.
  • Experiment design to measure reduction in policy violations without selection bias; randomized auditing to estimate true prevalence.
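The IPS idea can be shown in a few lines: estimate what a candidate policy's average cost would have been, using only logs from the current system, by reweighting logged outcomes by the inverse of the logging policy's action probability. The log schema and deterministic-policy simplification below are assumptions for illustration (a doubly-robust estimator would add a learned cost model as a control variate):

```python
def ips_estimate(logs, new_policy):
    """Inverse-propensity-scoring estimate of a new policy's mean cost.

    logs:       list of (context, action, logging_prob, cost), where
                logging_prob is the probability the deployed system
                assigned to the action it actually took (must be > 0
                for any action the new policy can choose).
    new_policy: context -> action (deterministic, for simplicity).
    """
    total = 0.0
    for context, action, prob, cost in logs:
        # Importance weight: 1/prob if the new policy agrees with the
        # logged action, else 0 (its outcome tells us nothing here).
        w = (1.0 if new_policy(context) == action else 0.0) / prob
        total += w * cost
    return total / len(logs)
```

IPS is unbiased when logging propensities are correct and have full support, but high-variance when the new policy often disagrees with the logger, which is exactly why shadow mode and randomized auditing complement it during rollout.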

