Evaluates SQL data-manipulation and analytics concepts—aggregation with DISTINCT, NULL handling, grouping, time extraction, deterministic tie-breaking, and window functions for ranking—within the Data Manipulation (SQL/Python) category at an intermediate-to-advanced level.
You are a reporting analyst for a news subscription product. You have a fact table of page/content view events.
Assume the following table in Google BigQuery (Standard SQL):
page_views (one row per view event)
event_id
STRING (unique)
agent_id
STRING (an identifier for the reader/user agent)
device_type
STRING (one of: 'phone', 'tab', 'desktop')
content_id
STRING (nullable; may be NULL if not resolved)
url
STRING
content_type
STRING (e.g., 'article', 'video', 'interactive', etc.; may be NULL)
event_ts
TIMESTAMP (UTC)
Definitions:
content_id
values (NULLs should not be counted as a distinct content).
Write SQL queries to answer each of the following:
device_group
,
unique_contents_viewed
,
total_views
.
EXTRACT(HOUR FROM event_ts)
.
device_group
,
hour_of_day
,
views
,
hour_rank
.
content_id
IS NULL.
unique_nonnull_contents_viewed
,
null_content_view_events
.
agent_id
and calendar date (UTC), find that agent’s first view event of the day (earliest
event_ts
).
content_type
of that first event and compute the overall distribution across all agent-days.
content_type
,
agent_days
,
share_of_agent_days
.
Notes:
event_id
ascending.