This question evaluates a data scientist's competency in observational analytics, metric definition, confounder identification and control, and statistical inference for comparing user-generated content across populations.
You want to evaluate the hypothesis:
“US members upload more videos than non‑US members.”
You have:

video_posts

| column | type |
|---|---|
| post_date | DATE |
| memberid | BIGINT |
| video_length | INT |

members

| column | type |
|---|---|
| memberid | BIGINT |
| country | STRING |
| join_date | DATE |
Assume you are analyzing a fixed window (e.g., 2018-12-01 to 2018-12-31, inclusive), but your approach should generalize.
Task: Propose a rigorous analysis plan to answer the question.
Your plan must include:
You may include example SQL/Python pseudocode to compute the metrics, but the focus is on correct experimental/observational analysis design and interpretation.
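
As one possible starting point, the sketch below computes uploads per member over a parameterized window and runs a permutation test on the difference in mean uploads between US and non-US members. It is a minimal illustration under stated assumptions, not a complete answer: the country code `"US"`, the rule that only members who joined on or before the window end are eligible, the helper names `uploads_per_member` and `permutation_test`, and the toy rows are all hypothetical. Members with zero uploads are kept in the denominator so the metric is not biased toward active uploaders.

```python
import random
from collections import Counter
from datetime import date


def uploads_per_member(members, video_posts, start, end):
    """Map memberid -> (is_us, uploads in [start, end]), keeping zero-upload members.

    Assumption: a member is eligible if join_date <= end.
    """
    counts = Counter(
        p["memberid"] for p in video_posts if start <= p["post_date"] <= end
    )
    return {
        m["memberid"]: (m["country"] == "US", counts.get(m["memberid"], 0))
        for m in members
        if m["join_date"] <= end
    }


def permutation_test(us, non_us, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in mean uploads per member."""
    rng = random.Random(seed)
    observed = sum(us) / len(us) - sum(non_us) / len(non_us)
    pooled = list(us) + list(non_us)
    n_us = len(us)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the US / non-US labels
        diff = sum(pooled[:n_us]) / n_us - sum(pooled[n_us:]) / (len(pooled) - n_us)
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Toy rows for illustration only (not real data).
members = [
    {"memberid": 1, "country": "US", "join_date": date(2018, 1, 5)},
    {"memberid": 2, "country": "US", "join_date": date(2018, 11, 20)},
    {"memberid": 3, "country": "CA", "join_date": date(2018, 3, 2)},
    {"memberid": 4, "country": "IN", "join_date": date(2018, 12, 10)},
    {"memberid": 5, "country": "US", "join_date": date(2019, 1, 3)},  # joined after window
]
video_posts = [
    {"post_date": date(2018, 12, 1), "memberid": 1, "video_length": 30},
    {"post_date": date(2018, 12, 31), "memberid": 1, "video_length": 45},
    {"post_date": date(2018, 12, 15), "memberid": 3, "video_length": 10},
    {"post_date": date(2018, 11, 30), "memberid": 2, "video_length": 20},  # before window
    {"post_date": date(2019, 1, 1), "memberid": 3, "video_length": 20},  # after window
]

window_start, window_end = date(2018, 12, 1), date(2018, 12, 31)
per_member = uploads_per_member(members, video_posts, window_start, window_end)
us = [n for is_us, n in per_member.values() if is_us]
non_us = [n for is_us, n in per_member.values() if not is_us]
observed, p_value = permutation_test(us, non_us)
print(f"mean difference (US - non-US): {observed:.2f}, permutation p-value: {p_value:.3f}")
```

This covers only the unadjusted comparison. A fuller plan would repeat it within strata of likely confounders (for example, tenure derived from `join_date`) and would decide how to handle members with a missing `country`.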