This question evaluates proficiency in SQL-based data manipulation and analytics, including aggregation, distinct counting, NULL handling, timestamp/hour extraction, device grouping, and row-level ordering, and is situated in the Data Manipulation (SQL/Python) domain.
You work with page-view event logs and need to compute several engagement/usage summaries.
Assume a single table:
page_views| column | type | description |
|---|---|---|
view_id | STRING/INT | unique event id (PK) |
agent_id | STRING/INT | identifier for the viewer ("agent") |
content_id | STRING/INT | identifier of the content item (nullable) |
url | STRING | URL viewed |
content_type | STRING | e.g., article, video, podcast, etc. |
device_type | STRING | one of phone, tab, desktop |
viewed_at | TIMESTAMP | event timestamp |
Assumptions:
device_type IN ('phone','tab')
, and
desktop
as
device_type = 'desktop'
.
Compute, for each device group (mobile vs desktop):
unique_contents_viewed
: number of distinct
content_id
values viewed (ignore NULLs)
total_views
: total number of page-view events
Return columns: device_group, unique_contents_viewed, total_views.
Additionally (in 1–2 sentences), explain whether unique_contents_viewed and total_views will generally be equal, and why.
For each device type (phone, tab, desktop), find the top 3 hours of day (0–23) with the highest number of views.
Return columns: device_type, hour_of_day, views.
Notes:
viewed_at
.
hour_of_day
first.
Compute:
unique_contents_viewed
: number of distinct non-NULL
content_id
values that were viewed
null_content_events
: number of events where
content_id
IS NULL
Return columns: unique_contents_viewed, null_content_events.
For each agent_id and each calendar date (derived from viewed_at), identify the first page view of that day (earliest viewed_at; if ties, use smallest view_id). Take that event’s content_type.
Then compute the distribution of these first-seen content_type values across all agent-days.
Return columns:
event_date
content_type
agent_days
(count of agent-days whose first view had that
content_type
)
share
(agent_days divided by total agent-days for that date)