Compute invalid event percentage by pixel
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Easy
Interview Round: Technical Screen
## Context
You work on an ads pixel instrumentation platform. Each pixel emits events throughout the day; some events are **missing** (not observed) and some are **invalid** (observed but fails validation).
## Tables
### `pixel_event_daily`
One row per `(event_date, pixel_id, signal_type)`.
- `event_date` DATE — calendar date in UTC
- `pixel_id` STRING
- `signal_type` STRING — e.g., `page_view`, `purchase`
- `valid_event_count` BIGINT — number of valid events recorded that day
- `invalid_event_count` BIGINT — number of invalid events recorded that day
- `missing_event_count` BIGINT — number of expected-but-missing events that day
Assumptions:
- Treat `valid_event_count`, `invalid_event_count`, `missing_event_count` as non-negative integers.
- “Yesterday” means the most recent complete UTC day.
## Task
For **yesterday**, compute the **invalid event percentage** for each `pixel_id`.
- Define:
- `total_observed = valid_event_count + invalid_event_count` (exclude missing from the denominator)
- `invalid_pct = invalid_event_count / total_observed`
- If `total_observed = 0`, return `invalid_pct = NULL`.
## Output
Return a result with:
- `pixel_id`
- `invalid_events` (sum over all signal types yesterday)
- `total_observed_events` (sum over all signal types yesterday)
- `invalid_pct` (a decimal between 0 and 1)
Order by `invalid_pct` descending (NULLs last).
Quick Answer: This question evaluates the ability to aggregate time-series event data and compute proportions per entity, specifically summing events across signal types, filtering by a recent date window, and dealing with zero or missing denominators; it is in the Data Manipulation (SQL/Python) category for a Data Scientist role.