Write SQL for content view metrics by device

Q: Write SQL for content view metrics by device

Evaluates SQL data-manipulation and analytics concepts—aggregation with DISTINCT, NULL handling, grouping, time extraction, deterministic tie-breaking, and window functions for ranking—within the Data Manipulation (SQL/Python) category at an intermediate-to-advanced level.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Q: What difficulty level is this interview question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Newyorktimes.

Q: What role is this question designed for?

This question is commonly asked for Data Analyst candidates at Newyorktimes during technical interviews.

Question

You are a reporting analyst for a news subscription product. You have a fact table of page/content view events.

Assume the following table in Google BigQuery (Standard SQL):

page_views (one row per view event)

event_id STRING (unique)
agent_id STRING (an identifier for the reader/user agent)
device_type STRING (one of: 'phone', 'tab', 'desktop')
content_id STRING (nullable; may be NULL if not resolved)
url STRING
content_type STRING (e.g., 'article', 'video', 'interactive', etc.; may be NULL)
event_ts TIMESTAMP (UTC)

Definitions:

“Mobile” = device_type IN ('phone','tab')
“Desktop” = device_type = 'desktop'
A “unique content view” means counting DISTINCT content_id values (NULLs should not be counted as a distinct content).

Write SQL queries to answer each of the following:

Mobile vs Desktop unique contents viewed:
- Return two rows (mobile, desktop) with the number of unique contents viewed.
- Also include total view events for each device group so it’s clear how “# of unique contents viewed” differs from “# of views”.
- Output columns: device_group , unique_contents_viewed , total_views .
Peak usage hours by device group:
- For each device group (mobile, desktop), find the top 3 hours of day (0–23) with the most view events.
- Use EXTRACT(HOUR FROM event_ts) .
- Output columns: device_group , hour_of_day , views , hour_rank .
Content ID coverage:
- Compute (a) how many unique non-NULL contents were viewed and (b) how many view events have content_id IS NULL.
- Output columns: unique_nonnull_contents_viewed , null_content_view_events .
Distribution of first content type per agent per day:
- For each agent_id and calendar date (UTC), find that agent’s first view event of the day (earliest event_ts ).
- Take the content_type of that first event and compute the overall distribution across all agent-days.
- Output columns: content_type , agent_days , share_of_agent_days .

Notes:

If there are ties for the first event timestamp within an agent-day, break ties deterministically using event_id ascending.
You may assume a large dataset; write SQL that is idiomatic for BigQuery.

Write SQL for content view metrics by device

Quick Overview

Comments (0)