You are given three tables and asked to build a minimal Streamlit app that helps a PM explore weekly cohort retention and ARPU by country. Assume the SQL dialect is BigQuery Standard SQL and that 'today' is 2025-09-01 UTC.
Schema and tiny sample data:
users
+---------+---------------------+---------+
| user_id | signup_ts           | country |
+---------+---------------------+---------+
| 1       | 2025-07-25 10:05:00 | US      |
| 2       | 2025-07-26 13:10:00 | IN      |
| 3       | 2025-08-01 09:00:00 | US      |
| 4       | 2025-08-15 20:30:00 | DE      |
+---------+---------------------+---------+
events
+----------+---------+---------------------+------------+
| event_id | user_id | event_ts            | event_type |
+----------+---------+---------------------+------------+
| e1       | 1       | 2025-08-01 12:00:00 | open       |
| e2       | 1       | 2025-08-02 09:05:00 | click      |
| e3       | 2       | 2025-08-01 08:00:00 | open       |
| e4       | 3       | 2025-08-08 14:00:00 | open       |
| e5       | 3       | 2025-08-15 15:00:00 | click      |
| e6       | 4       | 2025-08-16 16:00:00 | open       |
+----------+---------+---------------------+------------+
purchases
+----------+---------+---------------------+--------+
| order_id | user_id | order_ts            | amount |
+----------+---------+---------------------+--------+
| o1       | 1       | 2025-08-03 10:00:00 | 19.99  |
| o2       | 3       | 2025-08-16 18:00:00 | 9.99   |
+----------+---------+---------------------+--------+
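For orientation, a quick plain-Python check (illustrative only, not part of the required deliverables) of how the sample rows bucket into Monday-based weeks under the DATE_TRUNC(..., WEEK(MONDAY)) and week_index conventions defined in the tasks below:

```python
from datetime import date, timedelta

def monday_of(d: date) -> date:
    """Truncate a date to the Monday of its week (mirrors DATE_TRUNC(..., WEEK(MONDAY)))."""
    return d - timedelta(days=d.weekday())

signup = {1: date(2025, 7, 25), 2: date(2025, 7, 26),
          3: date(2025, 8, 1), 4: date(2025, 8, 15)}
events = [(1, date(2025, 8, 1)), (1, date(2025, 8, 2)), (2, date(2025, 8, 1)),
          (3, date(2025, 8, 8)), (3, date(2025, 8, 15)), (4, date(2025, 8, 16))]

for user_id, event_day in events:
    signup_week = monday_of(signup[user_id])
    week_index = (monday_of(event_day) - signup_week).days // 7
    print(user_id, signup_week, week_index)

# Expected: user 1 appears twice with week_index 1 (events e1 and e2 fall in the
# same activity week and must be de-duplicated), user 3 is retained in weeks 1
# and 2, and user 4 only in week 0.
```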
Tasks:
- Write a single BigQuery query (it must use window functions with PARTITION BY) that outputs a cohort-retention heatmap table with columns (signup_week, week_index, country, cohort_size, retained_users, retention_rate), where signup_week = DATE_TRUNC(DATE(signup_ts), WEEK(MONDAY)) and week_index = DATE_DIFF(DATE_TRUNC(DATE(event_ts), WEEK(MONDAY)), signup_week, WEEK). A user is 'retained' in week k if they have ≥1 event in that week. Only include rows where signup_week <= 2025-08-25 and event_ts <= 2025-09-01. Explain how your query avoids double-counting users across weeks and how it handles users with late events. (An illustrative sketch of one possible query shape follows this task list.)
- Extend the SQL to compute weekly ARPU by cohort and country: ARPU = SUM(amount) over users in the cohort with orders whose order_ts falls in the corresponding activity week, divided by cohort_size. Ensure that users without orders contribute zero revenue but still count toward cohort_size. Use appropriate JOINs and window frames.
- Implement a Streamlit app that (a high-level structure sketch appears after the closing line below):
  - lets the user filter by country (multi-select) and choose metric = {Retention, ARPU};
  - displays a heatmap (for retention) or a line chart by week_index (for ARPU) using Plotly or Altair;
  - treats missing future weeks as NA (not zero) and masks cohorts with fewer than 50 users; and
  - provides a UTC offset selector to re-bucket events (briefly describe how you would precompute or adjust dates on the fly to honor the timezone without duplicating users across weeks).
- Briefly describe two data-quality checks you would code into the app (e.g., cohort-size monotonicity versus observed weeks, or a guard against clock skew producing a negative week_index).
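For illustration only, here is one possible shape of the first task's retention query, held as the string a Streamlit app might pass to the BigQuery client. The `project.dataset.*` table paths are placeholders, and this sketch does not cover the ARPU extension or the timezone handling; it is not the required answer.

```python
# Sketch of one possible retention query (BigQuery Standard SQL in a Python string).
RETENTION_SQL = """
WITH cohorts AS (
  -- One row per user; cohort_size is attached with a window function so that
  -- users with no events still count toward the denominator.
  SELECT
    user_id,
    country,
    DATE_TRUNC(DATE(signup_ts), WEEK(MONDAY)) AS signup_week,
    COUNT(*) OVER (
      PARTITION BY DATE_TRUNC(DATE(signup_ts), WEEK(MONDAY)), country
    ) AS cohort_size
  FROM `project.dataset.users`
),
user_weeks AS (
  -- SELECT DISTINCT collapses multiple events in the same week (e.g. e1/e2)
  -- into a single user-week row, which is what prevents double-counting.
  SELECT DISTINCT
    c.user_id,
    c.country,
    c.signup_week,
    c.cohort_size,
    DATE_DIFF(
      DATE_TRUNC(DATE(e.event_ts), WEEK(MONDAY)), c.signup_week, WEEK
    ) AS week_index
  FROM cohorts AS c
  JOIN `project.dataset.events` AS e
    ON c.user_id = e.user_id
  WHERE DATE(e.event_ts) <= DATE '2025-09-01'
    AND c.signup_week <= DATE '2025-08-25'
)
SELECT
  signup_week,
  week_index,
  country,
  ANY_VALUE(cohort_size) AS cohort_size,
  COUNT(DISTINCT user_id) AS retained_users,
  SAFE_DIVIDE(COUNT(DISTINCT user_id), ANY_VALUE(cohort_size)) AS retention_rate
FROM user_weeks
GROUP BY signup_week, week_index, country
ORDER BY signup_week, country, week_index
"""
```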
Answer with the SQL, the high-level Streamlit code structure, and the timezone strategy.
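For the Streamlit task, a minimal structural sketch (Plotly variant) is below. It assumes the query result carries the columns from the first two tasks plus an `arpu` column; `load_metrics()` is a hypothetical stub with dummy placeholder rows so the snippet runs standalone, and in a real app it would apply the UTC offset before week truncation and execute the BigQuery query.

```python
import pandas as pd
import plotly.express as px
import streamlit as st

@st.cache_data(ttl=3600)
def load_metrics(utc_offset_hours: int) -> pd.DataFrame:
    # Dummy placeholder rows for layout only; the real loader would shift
    # timestamps by the chosen offset and run the cohort query against BigQuery.
    return pd.DataFrame({
        "signup_week":    ["2025-07-21", "2025-07-21", "2025-07-28"],
        "week_index":     [0, 1, 0],
        "country":        ["US", "US", "US"],
        "cohort_size":    [120, 120, 80],
        "retention_rate": [1.0, 0.45, 1.0],
        "arpu":           [0.00, 3.20, 1.75],
    })

offset = st.sidebar.slider("UTC offset (hours)", -12, 14, 0)
df = load_metrics(offset)

countries = st.sidebar.multiselect("Country", sorted(df["country"].unique()))
metric = st.sidebar.radio("Metric", ["Retention", "ARPU"])
view = df[df["country"].isin(countries)] if countries else df

# Mask small cohorts: keep the rows but blank the metric, so charts show NA.
value_col = "retention_rate" if metric == "Retention" else "arpu"
view = view.assign(**{value_col: view[value_col].mask(view["cohort_size"] < 50)})

if metric == "Retention":
    # Pivot to cohort x week_index (values averaged across the selected
    # countries for display). Weeks that have not happened yet never appear,
    # so the pivot leaves those cells as NaN (NA), never zero.
    heat = view.pivot_table(index="signup_week", columns="week_index",
                            values=value_col, aggfunc="mean")
    st.plotly_chart(px.imshow(heat, aspect="auto",
                              labels=dict(color="retention")))
else:
    st.plotly_chart(px.line(view, x="week_index", y=value_col,
                            color="signup_week", markers=True))
```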