Data Modeling Design Task: Short‑Video Social App Analytics
You are designing an analytics data model for a short‑video social application. The app supports:
-
Users and creators
-
Videos (short clips) with attributes (title, hashtags, language, category)
-
Events: impression, video_play_start, play_progress, play_complete, like, comment, share, follow, save/bookmark
-
Devices (OS, app version), and geo (country/region)
Assume events arrive in near real time from clients and services and land in a raw events stream with possible duplicates and late arrivals.
Required Metrics
Design for the following core metrics (computed daily unless noted):
-
DAU, WAU, MAU (active = user with ≥1 play_start in window)
-
Total video views (play_starts) and unique viewers per video
-
Watch time: average watch time per view; average watch time per user per day
-
Completion rate per video: play_completes / play_starts
-
Engagement rate per video: (likes + comments + shares) / play_starts
-
D1 and D7 user retention
-
Creator follower count and daily follower growth
-
Top‑K videos by watch time in a region for a day
Task
Propose an analytics‑oriented warehouse schema and describe:
-
Tables you would create, each table’s grain (one row = ?), columns (name, data type, nullability), primary keys, foreign keys
-
Partitioning and indexing/clustering strategy (assume a columnar cloud warehouse)
-
How each metric listed above is computed from the schema (include precise formulas or example SQL)
-
Trade‑offs: star vs. snowflake, denormalization choices, wide event table vs. multiple fact tables, SCD handling
-
Data quality: deduplication, late/out‑of‑order data, time zones, PII, referential integrity, bots/fraud, idempotency
State any additional assumptions you need. Keep the scope to core engagement (ignore ads and live streaming for this exercise).