This question evaluates data engineering skills in analytical data modeling, incremental ETL, time-series aggregation, and KPI computation for large-scale systems.
You need to design the analytical data model behind a KPI dashboard for a large-scale photo-sharing app.
Raw source tables:
users(user_id, user_name, joined_at)
photos(photo_id, user_id, created_at)
follows(follower_user_id, followed_user_id, created_at)
user_events(event_id, user_id, event_type, occurred_at)
Assume all timestamps are stored in UTC. A follow row means a follow relationship was created at created_at and remains active afterward. DAU means distinct active users per local calendar day, based on user_events.
The dashboard is initially reported daily and must stay fast even when the raw dataset becomes very large. The required metrics are:
Design the table or tables that should power this dashboard. Explain the grain, keys, what should be precomputed versus derived, and how you would populate the tables incrementally from raw data.
Then discuss the design changes needed for each of these follow-ups:
America/Los_Angeles
to
America/New_York
.
Focus on correctness, performance, and maintainability.