Analyze member video posting behavior by country
Company: LinkedIn
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Technical Screen
##### Question
You are given two tables describing LinkedIn members and the videos they upload. Write SQL (and optionally Python where noted) to answer the questions below.
### Tables
`members`
- `member_id` INT (PK)
- `country` VARCHAR
- `join_date` DATE — UTC calendar date when the member joined
`video_posts`
- `post_date` DATE — UTC calendar date of the upload
- `member_id` INT (FK → `members.member_id`)
- `video_length_seconds` INT — length of the video in seconds
### Relationship & assumptions
- `video_posts.member_id = members.member_id`.
- Each row in `video_posts` represents one uploaded video; a member may post multiple videos on different dates.
- Dates are compared as UTC calendar dates.
- A member's **first-ever video date** is the **minimum `post_date`** for that member. If a member uploaded multiple videos on that earliest date, that date is still considered the member's first video date.
### Tasks
1. How many members posted their **first-ever** video on the **same calendar date** they joined the platform?
- Output column: `members_first_video_on_join_date` (a single row).
2. Compare the total number of uploaded videos by US members versus non-US members.
- Define `member_region = 'US'` when `country = 'usa'`, otherwise `member_region = 'Non-US'`.
- Output columns: `member_region`, `total_video_posts`.
3. For each country, count how many **distinct members** have posted **at least one** video longer than 60 seconds.
- Output columns: `country`, `members_with_video_over_60s`.
Quick Answer: A LinkedIn data scientist technical-screen question on member video-posting behavior using two relational tables. It tests joins, first-event derivation with MIN/GROUP BY, conditional grouping for US vs non-US comparison, and distinct-member counts per country with a length filter.
Solution
### Part 1 — Members whose first video was posted on their join date
Compute each member's earliest upload date, then count those whose earliest upload date equals their join date.
```sql
WITH first_video AS (
SELECT
member_id,
MIN(post_date) AS first_video_date
FROM video_posts
GROUP BY member_id
)
SELECT
COUNT(*) AS members_first_video_on_join_date
FROM first_video fv
JOIN members m
ON m.member_id = fv.member_id
WHERE fv.first_video_date = m.join_date;
```
Notes:
- Because we only need the minimum `post_date` per member, no explicit tie-break is required — multiple uploads on the same earliest date collapse to one `first_video_date` via `MIN`/`GROUP BY`.
- A member who has joined but never uploaded a video is correctly excluded, since they have no row in `first_video`. Using a `JOIN` (not a `LEFT JOIN`) keeps the count to members who posted.
### Part 2 — Total videos by US vs Non-US members
Classify each member by region via the country on the `members` table, then sum video uploads per region. Each row in `video_posts` is one upload, so a simple `COUNT(*)` of joined rows gives total uploads.
```sql
SELECT
CASE WHEN m.country = 'usa' THEN 'US' ELSE 'Non-US' END AS member_region,
COUNT(*) AS total_video_posts
FROM video_posts vp
JOIN members m
ON m.member_id = vp.member_id
GROUP BY CASE WHEN m.country = 'usa' THEN 'US' ELSE 'Non-US' END;
```
Notes:
- Join `video_posts` to `members` (rather than the reverse) so each video is counted once and attributed to its poster's region.
- The region label comes from the member's `country`, not from any per-video field, so the `CASE` lives on `m.country`.
- If `country` casing/whitespace is inconsistent in the data, normalize defensively, e.g. `LOWER(TRIM(m.country)) = 'usa'`.
Optional Python (pandas) equivalent:
```python
df = video_posts.merge(members[['member_id', 'country']], on='member_id', how='inner')
df['member_region'] = df['country'].str.lower().str.strip().eq('usa').map({True: 'US', False: 'Non-US'})
result = (df.groupby('member_region')
.size()
.reset_index(name='total_video_posts'))
```
### Part 3 — Distinct members per country with a video longer than 60 seconds
Filter to videos longer than 60 seconds, then count distinct members per country. Count distinct members (not videos) so a member with several long videos is counted once.
```sql
SELECT
m.country,
COUNT(DISTINCT vp.member_id) AS members_with_video_over_60s
FROM video_posts vp
JOIN members m
ON m.member_id = vp.member_id
WHERE vp.video_length_seconds > 60
GROUP BY m.country
ORDER BY members_with_video_over_60s DESC;
```
Notes:
- `> 60` is strict ("longer than 60 seconds"); a video of exactly 60 seconds is excluded. Use `>= 60` only if the intent is "60 seconds or longer."
- `COUNT(DISTINCT vp.member_id)` deduplicates members who posted multiple qualifying videos.
- Countries with no qualifying videos do not appear; if you need a zero row per country, start from `members` and `LEFT JOIN` the filtered videos.
Explanation
All three parts join the same two tables on `member_id`. Part 1 derives each member's first upload via `MIN(post_date)` grouped by member and compares it to `join_date`. Part 2 buckets members into US vs Non-US off `country = 'usa'` and counts video rows (one row = one upload). Part 3 filters to videos strictly longer than 60 seconds and counts distinct members per country. Key rubric points: deduplicate correctly (one first-video date per member; distinct members in part 3), attribute videos to the poster's region in part 2, and use strict `> 60` for the length threshold.