This question evaluates proficiency in SQL-based data engineering, focusing on time-zone-aware timestamp handling, aggregation of event and relationship data, cohort retention analysis, and computation of product metrics like DAU, new users, photo activity, and follower distributions.
You are given a simplified schema for a photo-sharing app:
users(user_id, user_name, joined_at)
photos(photo_id, user_id, created_at)
follows(follower_user_id, followed_user_id, created_at)
user_events(event_id, user_id, event_type, occurred_at)
Assume all timestamps are stored in UTC. Unless otherwise stated, reporting should use the America/Los_Angeles timezone. A row in follows represents a follow created at created_at and still active afterward. DAU means the number of distinct users with at least one row in user_events on that local calendar day.
Also assume the dashboard is backed by these aggregate tables:
kpi_daily_base(report_date, new_users, total_users_eod, dau, new_photos, total_follow_edges_eod)
follower_bucket_daily(report_date, bucket, user_count)
where
bucket
is one of
0
,
1_2
, or
3_plus
retention_cohort_daily(cohort_date, cohort_size, retained_w1_users)
Write SQL for the following tasks:
D
, the share who had at least one event on days
D+7
through
D+13
in the reporting timezone.