You need to design the analytical data model behind a KPI dashboard for a large-scale photo-sharing app.
Raw source tables:
-
users(user_id, user_name, joined_at)
-
photos(photo_id, user_id, created_at)
-
follows(follower_user_id, followed_user_id, created_at)
-
user_events(event_id, user_id, event_type, occurred_at)
Assume all timestamps are stored in UTC. A follow row means a follow relationship was created at created_at and remains active afterward. DAU means distinct active users per local calendar day, based on user_events.
The dashboard is initially reported daily and must stay fast even when the raw dataset becomes very large. The required metrics are:
-
new users
-
change versus 7 days earlier
-
DAU
-
change versus 7 days earlier
-
new photos
-
new photos per DAU
-
change versus 7 days earlier
-
total followers
-
% users with 0 followers
-
% users with 1-2 followers
-
% users with 3+ followers
Design the table or tables that should power this dashboard. Explain the grain, keys, what should be precomputed versus derived, and how you would populate the tables incrementally from raw data.
Then discuss the design changes needed for each of these follow-ups:
-
Add a week-1 retention metric.
-
Support weekly rollups instead of daily rollups.
-
Change reporting from
America/Los_Angeles
to
America/New_York
.
Focus on correctness, performance, and maintainability.