##### Question You are given article view events and article metadata. **Table 1: `article_views`** — one row per article view event. | Column | Type | Description | | --- | --- | --- | | `user_id` | INT | The user who viewed the article | | `article_id` | INT | The article that was viewed | | `view_date` | DATE | UTC calendar date of the view | **Table 2: `articles`** — maps each article to its type. | Column | Type | Description | | --- | --- | --- | | `article_id` | INT | Article identifier | | `article_type` | VARCHAR | The category/type of the article | **Relationship:** `article_views.article_id = articles.article_id` **Assumptions** - Each row in `article_views` represents one article view event. - `view_date` is a calendar date; treat dates as UTC and assume no timezone conversion is needed. - If a user views multiple articles of the same `article_type`, that type is counted **only once** for that user (distinct types). Write SQL for the following: 1. **Daily distinct types per user.** For `2019-01-01`, return the number of distinct article types viewed by each user. - Output columns: `user_id, num_article_types` 2. **Histogram across all dates.** Build a histogram of the number of distinct article types viewed per user across the full dataset (all available dates). For each possible value of `num_article_types`, return how many users viewed exactly that many distinct article types. - Output columns: `num_article_types, num_users`

Count Article Types Viewed evaluates SQL or pandas logic, joins, grouping, window functions, null handling, edge cases, and validation in a realistic interview setting. A strong answer states assumptions, handles edge cases, explains trade-offs, and shows how to validate the result clearly.

How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

What difficulty level is this interview question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at LinkedIn.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at LinkedIn during technical interviews.

Count Article Types Viewed | LinkedIn Interview Question

Count Article Types Viewed

You are given article view events and article metadata.

Table 1: article_views — one row per article view event.

Column	Type	Description
`user_id`	INT	The user who viewed the article
`article_id`	INT	The article that was viewed
`view_date`	DATE	UTC calendar date of the view

Table 2: articles — maps each article to its type.

Column	Type	Description
`article_id`	INT	Article identifier
`article_type`	VARCHAR	The category/type of the article

Relationship: article_views.article_id = articles.article_id

Assumptions

Each row in article_views represents one article view event.
view_date is a calendar date; treat dates as UTC and assume no timezone conversion is needed.
If a user views multiple articles of the same article_type , that type is counted only once for that user (distinct types).

Write SQL for the following:

Daily distinct types per user. For 2019-01-01 , return the number of distinct article types viewed by each user.
- Output columns: user_id, num_article_types
Histogram across all dates. Build a histogram of the number of distinct article types viewed per user across the full dataset (all available dates). For each possible value of num_article_types , return how many users viewed exactly that many distinct article types.
- Output columns: num_article_types, num_users

Constraints & Assumptions

Preserve the scope, facts, inputs, and requested outputs from the prompt above.
If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.

Clarifying Questions to Ask

Clarify SQL dialect or Python library versions, date/time semantics, duplicate handling, and null handling.
Define the grain of each intermediate result before aggregating.
State expected output columns and ordering explicitly.

What a Strong Answer Covers

A query or pandas plan that matches the requested output grain.
Correct joins, filters, grouping, window functions, and treatment of NULLs or duplicates.
A brief explanation of why the result is correct and how it handles edge cases.
Performance notes, indexes/partitioning, and validation queries when relevant.

Follow-up Questions

How would you test the query on a tiny hand-built dataset?
What changes if duplicate events or late-arriving data are present?
Which indexes, clustering, or partitions would help at production scale?

Count Article Types Viewed

You are given article view events and article metadata.

Table 1: article_views — one row per article view event.

Column	Type	Description
`user_id`	INT	The user who viewed the article
`article_id`	INT	The article that was viewed
`view_date`	DATE	UTC calendar date of the view

Table 2: articles — maps each article to its type.

Column	Type	Description
`article_id`	INT	Article identifier
`article_type`	VARCHAR	The category/type of the article

Relationship: article_views.article_id = articles.article_id

Assumptions

Each row in article_views represents one article view event.
view_date is a calendar date; treat dates as UTC and assume no timezone conversion is needed.
If a user views multiple articles of the same article_type , that type is counted only once for that user (distinct types).

Write SQL for the following:

Daily distinct types per user. For 2019-01-01 , return the number of distinct article types viewed by each user.
- Output columns: user_id, num_article_types
Histogram across all dates. Build a histogram of the number of distinct article types viewed per user across the full dataset (all available dates). For each possible value of num_article_types , return how many users viewed exactly that many distinct article types.
- Output columns: num_article_types, num_users

Constraints & Assumptions

Preserve the scope, facts, inputs, and requested outputs from the prompt above.
If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.

Clarifying Questions to Ask

Clarify SQL dialect or Python library versions, date/time semantics, duplicate handling, and null handling.
Define the grain of each intermediate result before aggregating.
State expected output columns and ordering explicitly.

What a Strong Answer Covers

A query or pandas plan that matches the requested output grain.
Correct joins, filters, grouping, window functions, and treatment of NULLs or duplicates.
A brief explanation of why the result is correct and how it handles edge cases.
Performance notes, indexes/partitioning, and validation queries when relevant.

Follow-up Questions

How would you test the query on a tiny hand-built dataset?
What changes if duplicate events or late-arriving data are present?
Which indexes, clustering, or partitions would help at production scale?

Count Article Types Viewed

Quick Overview

Count Article Types Viewed

Count Article Types Viewed

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Write your answer

Count Article Types Viewed

Quick Overview

Count Article Types Viewed

Count Article Types Viewed

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Write your answer