Test whether US members upload more videos
Company: LinkedIn
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: Technical Screen
You want to evaluate the hypothesis:
> “US members upload more videos than non‑US members.”

You have:
### Table: `video_posts`
| column | type |
|---|---|
| post_date | DATE |
| memberid | BIGINT |
| video_length | INT |
### Table: `members`
| column | type |
|---|---|
| memberid | BIGINT |
| country | STRING |
| join_date | DATE |

Assume you are analyzing a fixed window (e.g., **2018-12-01 to 2018-12-31**, inclusive), but your approach should generalize.
**Task:** Propose a rigorous analysis plan to answer the question.
Your plan must include:
1. **Metric definitions** (at least one primary metric and at least two diagnostics/guardrails). For example, decide whether to compare:
   - total uploads,
   - uploads per member,
   - uploads per active member,
   - uploads per member-day (exposure-adjusted), etc.
2. How you will handle key confounders such as:
   - different numbers of members in each group,
   - different member tenure (new vs old accounts),
   - different activity levels and seasonality.
3. A statistical approach to quantify uncertainty (e.g., confidence intervals, hypothesis tests, regression model), and when you would prefer each.
4. At least one **failure mode** (e.g., Simpson’s paradox from tenure differences) and how you’d detect it.
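
The metric options in item 1 can be made concrete with a minimal sketch. It assumes the two tables have been loaded as plain Python tuples; the toy rows, the helper names `exposure_days` and `summarize`, and the choice to treat every non-`"US"` country as non-US are illustrative assumptions, not part of the question. It computes one candidate primary metric (uploads per member-day) alongside diagnostics (total uploads, uploads per member, share of members who uploaded at least once).

```python
from datetime import date

# Toy rows mirroring the two tables; every value here is invented for illustration.
members = [  # (memberid, country, join_date)
    (1, "US", date(2017, 5, 1)),
    (2, "US", date(2018, 12, 22)),  # joined mid-window: 10 days of exposure
    (3, "IN", date(2016, 1, 15)),
    (4, "DE", date(2018, 11, 30)),
]
video_posts = [  # (post_date, memberid, video_length)
    (date(2018, 12, 3), 1, 40),
    (date(2018, 12, 9), 1, 65),
    (date(2018, 12, 25), 2, 30),
    (date(2018, 12, 14), 3, 120),
    (date(2018, 11, 28), 3, 50),  # before the window, so it is excluded
]
WINDOW_START, WINDOW_END = date(2018, 12, 1), date(2018, 12, 31)


def exposure_days(join_date):
    """Days within the window (inclusive) on which the member had an account."""
    first_day = max(join_date, WINDOW_START)
    return max((WINDOW_END - first_day).days + 1, 0)


def summarize(members, video_posts):
    """Return per-group metrics for US vs. non-US members."""
    group = {}
    stats = {g: {"members": 0, "member_days": 0, "uploads": 0, "uploaders": set()}
             for g in ("US", "non-US")}
    for memberid, country, join_date in members:
        days = exposure_days(join_date)
        if days == 0:
            continue  # joined after the window closed: not in the denominator
        group[memberid] = "US" if country == "US" else "non-US"
        stats[group[memberid]]["members"] += 1
        stats[group[memberid]]["member_days"] += days
    for post_date, memberid, _length in video_posts:
        if WINDOW_START <= post_date <= WINDOW_END and memberid in group:
            stats[group[memberid]]["uploads"] += 1
            stats[group[memberid]]["uploaders"].add(memberid)
    # Assumes both groups are non-empty; guard the divisions on real data.
    return {
        g: {
            "total_uploads": s["uploads"],
            "uploads_per_member": s["uploads"] / s["members"],
            "uploads_per_member_day": s["uploads"] / s["member_days"],
            "uploader_share": len(s["uploaders"]) / s["members"],
        }
        for g, s in stats.items()
    }


metrics = summarize(members, video_posts)
for g, m in metrics.items():
    print(g, m)
```

Normalizing by member-days rather than by members keeps a group with many recent joiners from looking artificially inactive, which is one reason to prefer it as the primary metric when tenure differs between groups.
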
You may include example SQL/Python pseudocode to compute the metrics, but the focus is on correct experimental/observational analysis design and interpretation.
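
As one possible illustration of the uncertainty step, the sketch below computes a percentile bootstrap confidence interval for the difference in mean uploads per member. It assumes per-member upload counts (zeros included) have already been extracted for each group; the function name `bootstrap_diff_ci`, its parameters, and the sample counts are hypothetical. A bootstrap is a reasonable choice here because per-member upload counts are typically zero-heavy and skewed; a regression model would be the natural alternative once covariates such as tenure need to be adjusted for.

```python
import random


def bootstrap_diff_ci(us_counts, non_us_counts, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(us_counts) - mean(non_us_counts).

    Members are resampled with replacement within each group, so the
    member (not the individual upload) is the unit of analysis.
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    diffs = []
    for _ in range(n_boot):
        us_sample = rng.choices(us_counts, k=len(us_counts))
        non_us_sample = rng.choices(non_us_counts, k=len(non_us_counts))
        diffs.append(mean(us_sample) - mean(non_us_sample))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    point = mean(us_counts) - mean(non_us_counts)
    return point, lower, upper


# Invented per-member upload counts for the window; most members upload nothing.
us_counts = [0, 0, 1, 0, 3, 0, 0, 2, 0, 0]
non_us_counts = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]

point, lower, upper = bootstrap_diff_ci(us_counts, non_us_counts)
print(f"difference in uploads per member: {point:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

If the interval excludes zero, the data are inconsistent with equal upload rates at the chosen level; with samples as small as the toy ones here, a wide interval is the expected outcome.
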
Quick Answer: This question evaluates a data scientist's competency in observational analytics, metric definition, confounder identification and control, and statistical inference for comparing user-generated content across populations.
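
The Simpson's paradox failure mode named in the task can be checked by comparing the pooled difference with the difference inside each tenure stratum. The sketch below uses invented aggregate counts and a hypothetical two-level tenure split ("new" vs. "established", derived from `join_date` with a cutoff of the analyst's choosing); the names `rate`, `overall_rate`, and `standardized_rate` are illustrative.

```python
# Hypothetical aggregated counts per (group, tenure stratum): (members, uploads).
counts = {
    ("US", "new"): (100, 50),
    ("US", "established"): (900, 1800),
    ("non-US", "new"): (900, 540),
    ("non-US", "established"): (100, 220),
}
STRATA = ("new", "established")


def rate(group, stratum):
    """Uploads per member inside one tenure stratum."""
    n_members, n_uploads = counts[(group, stratum)]
    return n_uploads / n_members


def overall_rate(group):
    """Uploads per member pooled over all tenure strata."""
    n_members = sum(counts[(group, s)][0] for s in STRATA)
    n_uploads = sum(counts[(group, s)][1] for s in STRATA)
    return n_uploads / n_members


def standardized_rate(group):
    """Directly standardize to the pooled tenure mix of both groups."""
    total = sum(n_members for n_members, _ in counts.values())
    return sum(
        rate(group, s) * (counts[("US", s)][0] + counts[("non-US", s)][0]) / total
        for s in STRATA
    )


overall_diff = overall_rate("US") - overall_rate("non-US")
stratum_diffs = {s: rate("US", s) - rate("non-US", s) for s in STRATA}
# Reversal: every stratum-level difference has the opposite sign to the pooled one.
reversal = all((d > 0) != (overall_diff > 0) for d in stratum_diffs.values())
standardized_diff = standardized_rate("US") - standardized_rate("non-US")

print("pooled difference:", round(overall_diff, 3))
print("per-stratum differences:", {s: round(d, 3) for s, d in stratum_diffs.items()})
print("standardized difference:", round(standardized_diff, 3))
print("Simpson's reversal detected:", reversal)
```

In this toy data the US group looks more active overall only because it is dominated by established accounts; within each tenure stratum, and after standardizing to a common tenure mix, non-US members upload more.
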