Short-Video Platform: View Distribution and Recommendation Overlap
Context
You are analyzing a short-video platform. You have:
- A dataset of per-video view counts over a fixed time window (e.g., last 30 days).
- Two users whose top-10 recommended videos (or top-10 consumed videos) frequently include identical items.
Assume view counts are nonnegative integers and video identity is deduplicated (e.g., by content hash, not just URL) to avoid counting re-uploads separately.
Tasks
- Distribution of video-level views
  - Describe how you would visualize the distribution of views per video.
  - Report the mode, median, mean, and 99th percentile of the distribution.
- Overlap in top-10 videos between two users
  - Statistically evaluate whether frequent overlap in two users' top-10 lists is desirable or a potential problem.
  - Explicitly consider heavy-tail effects (Zipf-like distributions), and discuss trade-offs between diversity and homogeneity.
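For the first task, the requested summary statistics can be sketched as follows. This is a minimal illustration using only the Python standard library, not a prescribed solution. The function names `summarize_views` and `log2_bin_counts` are hypothetical, the 99th percentile uses the nearest-rank definition (other definitions interpolate and can give slightly different values), and ties for the mode are resolved by taking the smallest value. The log-scale bins are one possible basis for a histogram, since a linear axis would be dominated by a handful of very popular videos.

```python
import math
import statistics
from collections import Counter


def summarize_views(views):
    """Return mode, median, mean, and nearest-rank 99th percentile of view counts."""
    if not views:
        raise ValueError("views must be non-empty")
    ordered = sorted(views)
    # Nearest-rank percentile: the smallest value with at least 99% of
    # observations at or below it (an assumed definition; others interpolate).
    rank = math.ceil(0.99 * len(ordered))
    return {
        # If several values tie for most frequent, report the smallest one.
        "mode": min(statistics.multimode(ordered)),
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "p99": ordered[rank - 1],
    }


def log2_bin_counts(views):
    """Count videos per bin b = floor(log2(v + 1)), so zero-view videos land in bin 0."""
    # (v + 1).bit_length() - 1 is an exact integer floor(log2(v + 1)).
    return Counter((v + 1).bit_length() - 1 for v in views)
```

On a heavy-tailed sample the mean usually sits far above the median, which is why the task asks for all four numbers rather than the mean alone. For the toy input `[0, 0, 0, 1, 1, 2, 3, 5, 8, 1000]`, a single outlier is enough to pull the mean two orders of magnitude above the median.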
Hints
- Expect a heavy-tailed, long-tail distribution (often Zipf/Pareto-like).
- Weigh personalization and diversity against the benefits of showing trending, high-quality content.
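One way to make the overlap question statistical is to compare the observed overlap against a popularity-only null model. The sketch below assumes that, absent any personalization, each user's top-10 is a weighted sample without replacement from a catalog whose popularity follows a Zipf law with exponent `s`. That null model, the exponent, the catalog size, and all function names are illustrative assumptions, not part of the problem statement. Under a steep heavy tail, substantial overlap is expected by chance, so only overlap well above this baseline suggests the two users share genuine taste or that the recommender is homogenizing.

```python
import heapq
import random


def zipf_weights(n_items, s=1.0):
    """Unnormalized Zipf popularity weights: rank r gets weight 1 / r**s."""
    return [1.0 / (rank ** s) for rank in range(1, n_items + 1)]


def sample_top_k(weights, k, rng):
    """Weighted sample of k distinct item indices via an exponential race.

    Item i "arrives" at time Exp(1) / w_i; the first k arrivals form a
    weighted sample without replacement.
    """
    arrivals = ((rng.expovariate(1.0) / w, i) for i, w in enumerate(weights))
    return {i for _, i in heapq.nsmallest(k, arrivals)}


def null_overlap_counts(weights, k=10, trials=2000, seed=0):
    """Simulate overlap sizes between two independent popularity-only top-k lists."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        first = sample_top_k(weights, k, rng)
        second = sample_top_k(weights, k, rng)
        counts.append(len(first & second))
    return counts


def empirical_p_value(null_counts, observed):
    """Share of null draws with overlap >= observed, with add-one smoothing."""
    hits = sum(count >= observed for count in null_counts)
    return (hits + 1) / (len(null_counts) + 1)
```

A small p-value from `empirical_p_value` would indicate more overlap than popularity alone explains. A large one suggests the overlap is mostly a by-product of the heavy tail, in which case the diversity question is about how strongly the system favors head content rather than about these two users specifically.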