PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches
|Home/Data Manipulation (SQL/Python)/Google

Compute violation rate and flag precision in SQL

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in analytical SQL and data engineering concepts—specifically time-windowed metric computation, user activity definitions, join and attribution logic, deduplication, and handling temporal moderation and auto-flag events in BigQuery.

  • Medium
  • Google
  • Data Manipulation (SQL/Python)
  • Data Scientist

Compute violation rate and flag precision in SQL

Company: Google

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

You are analyzing a Trust & Safety product in BigQuery. Assume 'today' is 2025-09-01 (UTC). Define precise metrics and write SQL to compute them, being explicit about edge cases. Schema (invented): - users(user_id INT, country STRING, signup_time TIMESTAMP) - events(user_id INT, event_time TIMESTAMP, event_type STRING) -- event_type in {'post','comment','message','login'} - content(content_id INT, uploader_user_id INT, created_time TIMESTAMP, country STRING) - moderation_decisions(content_id INT, decision STRING, decision_time TIMESTAMP) -- decision in {'violation','clean','needs_review'}; multiple rows per content over time possible - auto_flags(content_id INT, flag_time TIMESTAMP, rule_id STRING) Sample rows: users +----------+---------+---------------------+ | user_id | country | signup_time | +----------+---------+---------------------+ | 1 | US | 2025-01-05 10:00:00 | | 2 | IN | 2025-07-12 08:00:00 | | 3 | US | 2025-08-20 12:00:00 | | 4 | BR | 2025-06-01 09:30:00 | +----------+---------+---------------------+ events +----------+---------------------+-------------+ | user_id | event_time | event_type | +----------+---------------------+-------------+ | 1 | 2025-08-29 02:00:00 | post | | 1 | 2025-08-30 03:00:00 | login | | 2 | 2025-08-28 11:00:00 | comment | | 3 | 2025-08-27 16:00:00 | message | | 4 | 2025-07-15 18:00:00 | post | +----------+---------------------+-------------+ content +------------+-------------------+---------------------+---------+ | content_id | uploader_user_id | created_time | country | +------------+-------------------+---------------------+---------+ | 100 | 1 | 2025-08-28 05:00:00 | US | | 101 | 2 | 2025-08-26 10:00:00 | IN | | 102 | 3 | 2025-08-31 01:00:00 | US | | 103 | 4 | 2025-07-01 06:00:00 | BR | +------------+-------------------+---------------------+---------+ moderation_decisions +------------+-----------+---------------------+ | content_id | decision | decision_time | +------------+-----------+---------------------+ | 100 | violation | 2025-08-31 07:00:00 | | 100 | clean | 2025-09-02 01:00:00 | | 101 | clean | 2025-08-27 12:00:00 | | 102 | violation | 2025-09-01 10:00:00 | | 103 | violation | 2025-08-10 09:00:00 | +------------+-----------+---------------------+ auto_flags +------------+---------------------+----------+ | content_id | flag_time | rule_id | +------------+---------------------+----------+ | 100 | 2025-08-31 06:30:00 | R1 | | 101 | 2025-08-26 10:30:00 | R2 | | 102 | 2025-09-01 09:00:00 | R1 | | 102 | 2025-09-01 09:05:00 | R1 | +------------+---------------------+----------+ Task A — 7-day Confirmed Violation Rate per 1,000 Active Users by country: - Window: [2025-08-26 00:00:00, 2025-09-02 00:00:00) (the last 7 full days ending at 'today' inclusive via half-open interval). - Active users: distinct users with ≥1 events where event_type IN ('post','comment','message') within the window. - Country attribution: use users.country for the uploader_user_id; if NULL, fallback to content.country. - Violations counted: distinct content_id with at least one decision='violation' whose decision_time falls within the window AND whose content.created_time < window_end (exclude content created after the window). If multiple decisions for the same content exist within the window, use the latest within-window decision only; if both 'clean' and 'violation' occur within the window, treat as violation if any within-window decision is 'violation'. Ignore decisions outside the window when determining within-window status. - Output columns: country, window_start, window_end, active_users, violations, violation_rate_per_1000 (round to 2 decimals). - Write a single BigQuery-standard SQL query to produce this. Task B — 28-day Auto-Flag Precision by rule_id: - Flag cohort: auto_flags where flag_time in [2025-08-05 00:00:00, 2025-09-02 00:00:00). - Deduping: if multiple flags exist for the same content_id and rule_id within the cohort, keep only the earliest flag for that (content_id, rule_id) pair. - Match window: a flag is a true positive if a moderation_decisions row with decision='violation' occurs within (flag_time, flag_time + 48 hours]. If multiple decisions occur in that 48h window, use the latest; 'needs_review' and 'clean' do not count as violations. Decisions after 48h do not count for precision. - Output columns: rule_id, flags, matched_violations, precision = matched_violations / flags. - Write a single BigQuery-standard SQL query to produce this. Edge cases to handle explicitly in your SQL: time zones (assume all timestamps UTC), late-arriving decisions (outside windows are ignored), users with no country (coalesce with content.country), and zero-division (precision should be 0.0 if flags=0).

Quick Answer: This question evaluates proficiency in analytical SQL and data engineering concepts—specifically time-windowed metric computation, user activity definitions, join and attribution logic, deduplication, and handling temporal moderation and auto-flag events in BigQuery.

Related Interview Questions

  • Generate binomial matrix and column-normalize - Google (Medium)
  • Analyze video flags and reviews with SQL - Google (Medium)
  • Write SQL/Python for messy event data - Google (Medium)
  • Add a conditional column in Python - Google (Medium)
  • Find most co‑purchased product pairs in SQL - Google (Medium)
Google logo
Google
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Data Manipulation (SQL/Python)
5
0

You are analyzing a Trust & Safety product in BigQuery. Assume 'today' is 2025-09-01 (UTC). Define precise metrics and write SQL to compute them, being explicit about edge cases.

Schema (invented):

  • users(user_id INT, country STRING, signup_time TIMESTAMP)
  • events(user_id INT, event_time TIMESTAMP, event_type STRING) -- event_type in {'post','comment','message','login'}
  • content(content_id INT, uploader_user_id INT, created_time TIMESTAMP, country STRING)
  • moderation_decisions(content_id INT, decision STRING, decision_time TIMESTAMP) -- decision in {'violation','clean','needs_review'}; multiple rows per content over time possible
  • auto_flags(content_id INT, flag_time TIMESTAMP, rule_id STRING)

Sample rows: users +----------+---------+---------------------+ | user_id | country | signup_time | +----------+---------+---------------------+ | 1 | US | 2025-01-05 10:00:00 | | 2 | IN | 2025-07-12 08:00:00 | | 3 | US | 2025-08-20 12:00:00 | | 4 | BR | 2025-06-01 09:30:00 | +----------+---------+---------------------+

events +----------+---------------------+-------------+ | user_id | event_time | event_type | +----------+---------------------+-------------+ | 1 | 2025-08-29 02:00:00 | post | | 1 | 2025-08-30 03:00:00 | login | | 2 | 2025-08-28 11:00:00 | comment | | 3 | 2025-08-27 16:00:00 | message | | 4 | 2025-07-15 18:00:00 | post | +----------+---------------------+-------------+

content +------------+-------------------+---------------------+---------+ | content_id | uploader_user_id | created_time | country | +------------+-------------------+---------------------+---------+ | 100 | 1 | 2025-08-28 05:00:00 | US | | 101 | 2 | 2025-08-26 10:00:00 | IN | | 102 | 3 | 2025-08-31 01:00:00 | US | | 103 | 4 | 2025-07-01 06:00:00 | BR | +------------+-------------------+---------------------+---------+

moderation_decisions +------------+-----------+---------------------+ | content_id | decision | decision_time | +------------+-----------+---------------------+ | 100 | violation | 2025-08-31 07:00:00 | | 100 | clean | 2025-09-02 01:00:00 | | 101 | clean | 2025-08-27 12:00:00 | | 102 | violation | 2025-09-01 10:00:00 | | 103 | violation | 2025-08-10 09:00:00 | +------------+-----------+---------------------+

auto_flags +------------+---------------------+----------+ | content_id | flag_time | rule_id | +------------+---------------------+----------+ | 100 | 2025-08-31 06:30:00 | R1 | | 101 | 2025-08-26 10:30:00 | R2 | | 102 | 2025-09-01 09:00:00 | R1 | | 102 | 2025-09-01 09:05:00 | R1 | +------------+---------------------+----------+

Task A — 7-day Confirmed Violation Rate per 1,000 Active Users by country:

  • Window: [2025-08-26 00:00:00, 2025-09-02 00:00:00) (the last 7 full days ending at 'today' inclusive via half-open interval).
  • Active users: distinct users with ≥1 events where event_type IN ('post','comment','message') within the window.
  • Country attribution: use users.country for the uploader_user_id; if NULL, fallback to content.country.
  • Violations counted: distinct content_id with at least one decision='violation' whose decision_time falls within the window AND whose content.created_time < window_end (exclude content created after the window). If multiple decisions for the same content exist within the window, use the latest within-window decision only; if both 'clean' and 'violation' occur within the window, treat as violation if any within-window decision is 'violation'. Ignore decisions outside the window when determining within-window status.
  • Output columns: country, window_start, window_end, active_users, violations, violation_rate_per_1000 (round to 2 decimals).
  • Write a single BigQuery-standard SQL query to produce this.

Task B — 28-day Auto-Flag Precision by rule_id:

  • Flag cohort: auto_flags where flag_time in [2025-08-05 00:00:00, 2025-09-02 00:00:00).
  • Deduping: if multiple flags exist for the same content_id and rule_id within the cohort, keep only the earliest flag for that (content_id, rule_id) pair.
  • Match window: a flag is a true positive if a moderation_decisions row with decision='violation' occurs within (flag_time, flag_time + 48 hours]. If multiple decisions occur in that 48h window, use the latest; 'needs_review' and 'clean' do not count as violations. Decisions after 48h do not count for precision.
  • Output columns: rule_id, flags, matched_violations, precision = matched_violations / flags.
  • Write a single BigQuery-standard SQL query to produce this.

Edge cases to handle explicitly in your SQL: time zones (assume all timestamps UTC), late-arriving decisions (outside windows are ignored), users with no country (coalesce with content.country), and zero-division (precision should be 0.0 if flags=0).

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Google•More Data Scientist•Google Data Scientist•Google Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.