This question evaluates the ability to aggregate time-series event data and compute proportions per entity, specifically summing events across signal types, filtering by a recent date window, and dealing with zero or missing denominators; it is in the Data Manipulation (SQL/Python) category for a Data Scientist role.

You work on an ads pixel instrumentation platform. Each pixel emits events throughout the day; some events are missing (not observed) and some are invalid (observed but fails validation).
pixel_event_dailyOne row per (event_date, pixel_id, signal_type).
event_date
DATE — calendar date in UTC
pixel_id
STRING
signal_type
STRING — e.g.,
page_view
,
purchase
valid_event_count
BIGINT — number of valid events recorded that day
invalid_event_count
BIGINT — number of invalid events recorded that day
missing_event_count
BIGINT — number of expected-but-missing events that day
Assumptions:
valid_event_count
,
invalid_event_count
,
missing_event_count
as non-negative integers.
For yesterday, compute the invalid event percentage for each pixel_id.
total_observed = valid_event_count + invalid_event_count
(exclude missing from the denominator)
invalid_pct = invalid_event_count / total_observed
total_observed = 0
, return
invalid_pct = NULL
.
Return a result with:
pixel_id
invalid_events
(sum over all signal types yesterday)
total_observed_events
(sum over all signal types yesterday)
invalid_pct
(a decimal between 0 and 1)
Order by invalid_pct descending (NULLs last).