Hypothesis Test: Are Users’ First Video Uploads Shorter?
Context
You are given event-level data for a video-sharing product. Each record represents a published video with its duration and uploader. We want to determine whether a user's first uploaded video tends to be shorter than their subsequent uploads.
Assume you have a table like:
- video_uploads(user_id, video_id, upload_ts, duration_sec, status, visibility, category, source)
Task
Describe how you would test the hypothesis that users’ initial video posts are shorter than their later uploads.
Specify:
- Data slice (inclusion/exclusion criteria)
- Metrics and aggregation
- Statistical test(s) and assumptions
- High-level SQL or pseudo-code to implement the analysis
Hints:
- Compare first-video length vs. subsequent average per user.
- Control for outliers and posting-date distribution (e.g., cohort effects or time trends).
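As an illustration of the first hint, here is a minimal Python sketch of a per-user paired comparison. It is one possible approach, not a reference solution. The function names `paired_differences` and `sign_test_less` are hypothetical. The sketch assumes rows have already been filtered to the desired slice (for example by `status` and `visibility`, whose valid values the prompt does not specify) and that `upload_ts` values are mutually comparable. It compares the first video against the median of later videos rather than the mean, to limit the influence of outliers, and uses an exact one-sided sign test as a simple distribution-free choice. A Wilcoxon signed-rank test or a paired t-test on log durations would be reasonable alternatives.

```python
from collections import defaultdict
from math import comb
from statistics import median


def paired_differences(rows):
    """Compute first-video duration minus median later duration per user.

    rows: iterable of (user_id, upload_ts, duration_sec) tuples.
    Users with fewer than two uploads are excluded, since they have
    no later video to compare against.
    """
    by_user = defaultdict(list)
    for user_id, upload_ts, duration_sec in rows:
        by_user[user_id].append((upload_ts, duration_sec))

    diffs = {}
    for user_id, uploads in by_user.items():
        if len(uploads) < 2:
            continue
        uploads.sort()  # chronological order by upload_ts
        first_duration = uploads[0][1]
        later_durations = [duration for _, duration in uploads[1:]]
        diffs[user_id] = first_duration - median(later_durations)
    return diffs


def sign_test_less(diffs):
    """Exact one-sided sign test on paired differences.

    H0: a difference is equally likely to be negative or positive.
    H1: negative differences (first video shorter) are more likely.
    Returns (negatives, non_zero_pairs, p_value). Zero differences
    are dropped, which is the usual convention for the sign test.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d < 0)
    # P(X >= k) for X ~ Binomial(n, 0.5)
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p_value
```

To address the second hint, the same two functions could be applied separately within cohorts (for example, users grouped by the month of their first upload), so that a product-wide trend in video length is not mistaken for a first-video effect.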