You are analyzing a Trust & Safety product in BigQuery. Assume 'today' is 2025-09-01 (UTC). Define precise metrics and write SQL to compute them, being explicit about edge cases. Schema (invented): - users(user_id INT, country STRING, signup_time TIMESTAMP) - events(user_id INT, event_time TIMESTAMP, event_type STRING) -- event_type in {'post','comment','message','login'} - content(content_id INT, uploader_user_id INT, created_time TIMESTAMP, country STRING) - moderation_decisions(content_id INT, decision STRING, decision_time TIMESTAMP) -- decision in {'violation','clean','needs_review'}; multiple rows per content over time possible - auto_flags(content_id INT, flag_time TIMESTAMP, rule_id STRING) Sample rows: users +----------+---------+---------------------+ | user_id | country | signup_time | +----------+---------+---------------------+ | 1 | US | 2025-01-05 10:00:00 | | 2 | IN | 2025-07-12 08:00:00 | | 3 | US | 2025-08-20 12:00:00 | | 4 | BR | 2025-06-01 09:30:00 | +----------+---------+---------------------+ events +----------+---------------------+-------------+ | user_id | event_time | event_type | +----------+---------------------+-------------+ | 1 | 2025-08-29 02:00:00 | post | | 1 | 2025-08-30 03:00:00 | login | | 2 | 2025-08-28 11:00:00 | comment | | 3 | 2025-08-27 16:00:00 | message | | 4 | 2025-07-15 18:00:00 | post | +----------+---------------------+-------------+ content +------------+-------------------+---------------------+---------+ | content_id | uploader_user_id | created_time | country | +------------+-------------------+---------------------+---------+ | 100 | 1 | 2025-08-28 05:00:00 | US | | 101 | 2 | 2025-08-26 10:00:00 | IN | | 102 | 3 | 2025-08-31 01:00:00 | US | | 103 | 4 | 2025-07-01 06:00:00 | BR | +------------+-------------------+---------------------+---------+ moderation_decisions +------------+-----------+---------------------+ | content_id | decision | decision_time | +------------+-----------+---------------------+ | 100 | violation | 2025-08-31 07:00:00 | | 100 | clean | 2025-09-02 01:00:00 | | 101 | clean | 2025-08-27 12:00:00 | | 102 | violation | 2025-09-01 10:00:00 | | 103 | violation | 2025-08-10 09:00:00 | +------------+-----------+---------------------+ auto_flags +------------+---------------------+----------+ | content_id | flag_time | rule_id | +------------+---------------------+----------+ | 100 | 2025-08-31 06:30:00 | R1 | | 101 | 2025-08-26 10:30:00 | R2 | | 102 | 2025-09-01 09:00:00 | R1 | | 102 | 2025-09-01 09:05:00 | R1 | +------------+---------------------+----------+ Task A — 7-day Confirmed Violation Rate per 1,000 Active Users by country: - Window: [2025-08-26 00:00:00, 2025-09-02 00:00:00) (the last 7 full days ending at 'today' inclusive via half-open interval). - Active users: distinct users with ≥1 events where event_type IN ('post','comment','message') within the window. - Country attribution: use users.country for the uploader_user_id; if NULL, fallback to content.country. - Violations counted: distinct content_id with at least one decision='violation' whose decision_time falls within the window AND whose content.created_time < window_end (exclude content created after the window). If multiple decisions for the same content exist within the window, use the latest within-window decision only; if both 'clean' and 'violation' occur within the window, treat as violation if any within-window decision is 'violation'. Ignore decisions outside the window when determining within-window status. - Output columns: country, window_start, window_end, active_users, violations, violation_rate_per_1000 (round to 2 decimals). - Write a single BigQuery-standard SQL query to produce this. Task B — 28-day Auto-Flag Precision by rule_id: - Flag cohort: auto_flags where flag_time in [2025-08-05 00:00:00, 2025-09-02 00:00:00). - Deduping: if multiple flags exist for the same content_id and rule_id within the cohort, keep only the earliest flag for that (content_id, rule_id) pair. - Match window: a flag is a true positive if a moderation_decisions row with decision='violation' occurs within (flag_time, flag_time + 48 hours]. If multiple decisions occur in that 48h window, use the latest; 'needs_review' and 'clean' do not count as violations. Decisions after 48h do not count for precision. - Output columns: rule_id, flags, matched_violations, precision = matched_violations / flags. - Write a single BigQuery-standard SQL query to produce this. Edge cases to handle explicitly in your SQL: time zones (assume all timestamps UTC), late-arriving decisions (outside windows are ignored), users with no country (coalesce with content.country), and zero-division (precision should be 0.0 if flags=0).

This question evaluates proficiency in analytical SQL and data engineering concepts—specifically time-windowed metric computation, user activity definitions, join and attribution logic, deduplication, and handling temporal moderation and auto-flag events in BigQuery.

How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

What difficulty level is this interview question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Onsite rounds at Google.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Google during technical interviews.

Compute violation rate and flag precision in SQL

Q: Compute violation rate and flag precision in SQL

This question evaluates proficiency in analytical SQL and data engineering concepts—specifically time-windowed metric computation, user activity definitions, join and attribution logic, deduplication, and handling temporal moderation and auto-flag events in BigQuery.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Q: What difficulty level is this interview question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Onsite rounds at Google.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Google during technical interviews.

You are analyzing a Trust & Safety product in BigQuery. Assume 'today' is 2025-09-01 (UTC). Define precise metrics and write SQL to compute them, being explicit about edge cases.

Schema (invented):

users(user_id INT, country STRING, signup_time TIMESTAMP)
events(user_id INT, event_time TIMESTAMP, event_type STRING) -- event_type in {'post','comment','message','login'}
content(content_id INT, uploader_user_id INT, created_time TIMESTAMP, country STRING)
moderation_decisions(content_id INT, decision STRING, decision_time TIMESTAMP) -- decision in {'violation','clean','needs_review'}; multiple rows per content over time possible
auto_flags(content_id INT, flag_time TIMESTAMP, rule_id STRING)

Sample rows: users +----------+---------+---------------------+ | user_id | country | signup_time | +----------+---------+---------------------+ | 1 | US | 2025-01-05 10:00:00 | | 2 | IN | 2025-07-12 08:00:00 | | 3 | US | 2025-08-20 12:00:00 | | 4 | BR | 2025-06-01 09:30:00 | +----------+---------+---------------------+

events +----------+---------------------+-------------+ | user_id | event_time | event_type | +----------+---------------------+-------------+ | 1 | 2025-08-29 02:00:00 | post | | 1 | 2025-08-30 03:00:00 | login | | 2 | 2025-08-28 11:00:00 | comment | | 3 | 2025-08-27 16:00:00 | message | | 4 | 2025-07-15 18:00:00 | post | +----------+---------------------+-------------+

content +------------+-------------------+---------------------+---------+ | content_id | uploader_user_id | created_time | country | +------------+-------------------+---------------------+---------+ | 100 | 1 | 2025-08-28 05:00:00 | US | | 101 | 2 | 2025-08-26 10:00:00 | IN | | 102 | 3 | 2025-08-31 01:00:00 | US | | 103 | 4 | 2025-07-01 06:00:00 | BR | +------------+-------------------+---------------------+---------+

moderation_decisions +------------+-----------+---------------------+ | content_id | decision | decision_time | +------------+-----------+---------------------+ | 100 | violation | 2025-08-31 07:00:00 | | 100 | clean | 2025-09-02 01:00:00 | | 101 | clean | 2025-08-27 12:00:00 | | 102 | violation | 2025-09-01 10:00:00 | | 103 | violation | 2025-08-10 09:00:00 | +------------+-----------+---------------------+

auto_flags +------------+---------------------+----------+ | content_id | flag_time | rule_id | +------------+---------------------+----------+ | 100 | 2025-08-31 06:30:00 | R1 | | 101 | 2025-08-26 10:30:00 | R2 | | 102 | 2025-09-01 09:00:00 | R1 | | 102 | 2025-09-01 09:05:00 | R1 | +------------+---------------------+----------+

Task A — 7-day Confirmed Violation Rate per 1,000 Active Users by country:

Window: [2025-08-26 00:00:00, 2025-09-02 00:00:00) (the last 7 full days ending at 'today' inclusive via half-open interval).
Active users: distinct users with ≥1 events where event_type IN ('post','comment','message') within the window.
Country attribution: use users.country for the uploader_user_id; if NULL, fallback to content.country.
Violations counted: distinct content_id with at least one decision='violation' whose decision_time falls within the window AND whose content.created_time < window_end (exclude content created after the window). If multiple decisions for the same content exist within the window, use the latest within-window decision only; if both 'clean' and 'violation' occur within the window, treat as violation if any within-window decision is 'violation'. Ignore decisions outside the window when determining within-window status.
Output columns: country, window_start, window_end, active_users, violations, violation_rate_per_1000 (round to 2 decimals).
Write a single BigQuery-standard SQL query to produce this.

Task B — 28-day Auto-Flag Precision by rule_id:

Flag cohort: auto_flags where flag_time in [2025-08-05 00:00:00, 2025-09-02 00:00:00).
Deduping: if multiple flags exist for the same content_id and rule_id within the cohort, keep only the earliest flag for that (content_id, rule_id) pair.
Match window: a flag is a true positive if a moderation_decisions row with decision='violation' occurs within (flag_time, flag_time + 48 hours]. If multiple decisions occur in that 48h window, use the latest; 'needs_review' and 'clean' do not count as violations. Decisions after 48h do not count for precision.
Output columns: rule_id, flags, matched_violations, precision = matched_violations / flags.
Write a single BigQuery-standard SQL query to produce this.

Edge cases to handle explicitly in your SQL: time zones (assume all timestamps UTC), late-arriving decisions (outside windows are ignored), users with no country (coalesce with content.country), and zero-division (precision should be 0.0 if flags=0).

You are analyzing a Trust & Safety product in BigQuery. Assume 'today' is 2025-09-01 (UTC). Define precise metrics and write SQL to compute them, being explicit about edge cases.

Schema (invented):

users(user_id INT, country STRING, signup_time TIMESTAMP)
events(user_id INT, event_time TIMESTAMP, event_type STRING) -- event_type in {'post','comment','message','login'}
content(content_id INT, uploader_user_id INT, created_time TIMESTAMP, country STRING)
moderation_decisions(content_id INT, decision STRING, decision_time TIMESTAMP) -- decision in {'violation','clean','needs_review'}; multiple rows per content over time possible
auto_flags(content_id INT, flag_time TIMESTAMP, rule_id STRING)

Task A — 7-day Confirmed Violation Rate per 1,000 Active Users by country:

Window: [2025-08-26 00:00:00, 2025-09-02 00:00:00) (the last 7 full days ending at 'today' inclusive via half-open interval).
Active users: distinct users with ≥1 events where event_type IN ('post','comment','message') within the window.
Country attribution: use users.country for the uploader_user_id; if NULL, fallback to content.country.
Violations counted: distinct content_id with at least one decision='violation' whose decision_time falls within the window AND whose content.created_time < window_end (exclude content created after the window). If multiple decisions for the same content exist within the window, use the latest within-window decision only; if both 'clean' and 'violation' occur within the window, treat as violation if any within-window decision is 'violation'. Ignore decisions outside the window when determining within-window status.
Output columns: country, window_start, window_end, active_users, violations, violation_rate_per_1000 (round to 2 decimals).
Write a single BigQuery-standard SQL query to produce this.

Task B — 28-day Auto-Flag Precision by rule_id:

Flag cohort: auto_flags where flag_time in [2025-08-05 00:00:00, 2025-09-02 00:00:00).
Deduping: if multiple flags exist for the same content_id and rule_id within the cohort, keep only the earliest flag for that (content_id, rule_id) pair.
Match window: a flag is a true positive if a moderation_decisions row with decision='violation' occurs within (flag_time, flag_time + 48 hours]. If multiple decisions occur in that 48h window, use the latest; 'needs_review' and 'clean' do not count as violations. Decisions after 48h do not count for precision.
Output columns: rule_id, flags, matched_violations, precision = matched_violations / flags.
Write a single BigQuery-standard SQL query to produce this.

Compute violation rate and flag precision in SQL

Quick Overview

Compute violation rate and flag precision in SQL

Write your answer

Compute violation rate and flag precision in SQL

Quick Overview

Compute violation rate and flag precision in SQL

Write your answer