You are given three tables and asked to build a minimal Streamlit app that helps a PM explore weekly cohort retention and ARPU by country. Assume the SQL dialect is BigQuery Standard SQL and that 'today' is 2025-09-01 UTC.
Schema and tiny sample data:
users
+---------+---------------------+---------+
| user_id | signup_ts           | country |
+---------+---------------------+---------+
| 1       | 2025-07-25 10:05:00 | US      |
| 2       | 2025-07-26 13:10:00 | IN      |
| 3       | 2025-08-01 09:00:00 | US      |
| 4       | 2025-08-15 20:30:00 | DE      |
+---------+---------------------+---------+
events
+----------+---------+---------------------+------------+
| event_id | user_id | event_ts            | event_type |
+----------+---------+---------------------+------------+
| e1       | 1       | 2025-08-01 12:00:00 | open       |
| e2       | 1       | 2025-08-02 09:05:00 | click      |
| e3       | 2       | 2025-08-01 08:00:00 | open       |
| e4       | 3       | 2025-08-08 14:00:00 | open       |
| e5       | 3       | 2025-08-15 15:00:00 | click      |
| e6       | 4       | 2025-08-16 16:00:00 | open       |
+----------+---------+---------------------+------------+
purchases
+----------+---------+---------------------+--------+
| order_id | user_id | order_ts            | amount |
+----------+---------+---------------------+--------+
| o1       | 1       | 2025-08-03 10:00:00 | 19.99  |
| o2       | 3       | 2025-08-16 18:00:00 | 9.99   |
+----------+---------+---------------------+--------+
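For orientation, a quick plain-Python check (illustrative only, not part of the required deliverables) of how the sample rows bucket into Monday-based weeks under the DATE_TRUNC(..., WEEK(MONDAY)) and week_index conventions defined in the tasks below:

```python
from datetime import date, timedelta

def monday_of(d: date) -> date:
    """Truncate a date to the Monday of its week (mirrors DATE_TRUNC(..., WEEK(MONDAY)))."""
    return d - timedelta(days=d.weekday())

signup = {1: date(2025, 7, 25), 2: date(2025, 7, 26),
          3: date(2025, 8, 1), 4: date(2025, 8, 15)}
events = [(1, date(2025, 8, 1)), (1, date(2025, 8, 2)), (2, date(2025, 8, 1)),
          (3, date(2025, 8, 8)), (3, date(2025, 8, 15)), (4, date(2025, 8, 16))]

for user_id, event_day in events:
    signup_week = monday_of(signup[user_id])
    week_index = (monday_of(event_day) - signup_week).days // 7
    print(user_id, signup_week, week_index)

# Expected: user 1 appears twice with week_index 1 (events e1 and e2 fall in the
# same activity week and must be de-duplicated), user 3 is retained in weeks 1
# and 2, and user 4 only in week 0.
```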
Tasks:
- Write a single BigQuery query (it must use window functions with PARTITION BY) that outputs a cohort-retention heatmap table with columns (signup_week, week_index, country, cohort_size, retained_users, retention_rate), where signup_week = DATE_TRUNC(DATE(signup_ts), WEEK(MONDAY)) and week_index = DATE_DIFF(DATE_TRUNC(DATE(event_ts), WEEK(MONDAY)), signup_week, WEEK). A user is 'retained' in week k if they have ≥1 event in that week. Only include rows where signup_week <= 2025-08-25 and event_ts <= 2025-09-01. Explain how your query avoids double-counting users across weeks and how it handles users with late events. (An illustrative sketch of one possible query shape follows this task list.)
- Extend the SQL to compute weekly ARPU by cohort and country: ARPU = SUM(amount) over users in the cohort with orders whose order_ts falls in the corresponding activity week, divided by cohort_size. Ensure that users without orders contribute zero revenue but still count toward cohort_size. Use appropriate JOINs and window frames.
- Implement a Streamlit app that (a high-level structure sketch appears after the closing line below):
  - lets the user filter by country (multi-select) and choose metric = {Retention, ARPU};
  - displays a heatmap (for retention) or a line chart by week_index (for ARPU) using Plotly or Altair;
  - treats missing future weeks as NA (not zero) and masks cohorts with fewer than 50 users; and
  - provides a UTC offset selector to re-bucket events (briefly describe how you would precompute or adjust dates on the fly to honor the timezone without duplicating users across weeks).
- Briefly describe two data-quality checks you would code into the app (e.g., cohort-size monotonicity versus observed weeks, or a guard against clock skew producing a negative week_index).
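For illustration only, here is one possible shape of the first task's retention query, held as the string a Streamlit app might pass to the BigQuery client. The `project.dataset.*` table paths are placeholders, and this sketch does not cover the ARPU extension or the timezone handling; it is not the required answer.

```python
# Sketch of one possible retention query (BigQuery Standard SQL in a Python string).
RETENTION_SQL = """
WITH cohorts AS (
  -- One row per user; cohort_size is attached with a window function so that
  -- users with no events still count toward the denominator.
  SELECT
    user_id,
    country,
    DATE_TRUNC(DATE(signup_ts), WEEK(MONDAY)) AS signup_week,
    COUNT(*) OVER (
      PARTITION BY DATE_TRUNC(DATE(signup_ts), WEEK(MONDAY)), country
    ) AS cohort_size
  FROM `project.dataset.users`
),
user_weeks AS (
  -- SELECT DISTINCT collapses multiple events in the same week (e.g. e1/e2)
  -- into a single user-week row, which is what prevents double-counting.
  SELECT DISTINCT
    c.user_id,
    c.country,
    c.signup_week,
    c.cohort_size,
    DATE_DIFF(
      DATE_TRUNC(DATE(e.event_ts), WEEK(MONDAY)), c.signup_week, WEEK
    ) AS week_index
  FROM cohorts AS c
  JOIN `project.dataset.events` AS e
    ON c.user_id = e.user_id
  WHERE DATE(e.event_ts) <= DATE '2025-09-01'
    AND c.signup_week <= DATE '2025-08-25'
)
SELECT
  signup_week,
  week_index,
  country,
  ANY_VALUE(cohort_size) AS cohort_size,
  COUNT(DISTINCT user_id) AS retained_users,
  SAFE_DIVIDE(COUNT(DISTINCT user_id), ANY_VALUE(cohort_size)) AS retention_rate
FROM user_weeks
GROUP BY signup_week, week_index, country
ORDER BY signup_week, country, week_index
"""
```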
Answer with the SQL, the high-level Streamlit code structure, and the timezone strategy.
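For the Streamlit task, a minimal structural sketch (Plotly variant) is below. It assumes the query result carries the columns from the first two tasks plus an `arpu` column; `load_metrics()` is a hypothetical stub with dummy placeholder rows so the snippet runs standalone, and in a real app it would apply the UTC offset before week truncation and execute the BigQuery query.

```python
import pandas as pd
import plotly.express as px
import streamlit as st

@st.cache_data(ttl=3600)
def load_metrics(utc_offset_hours: int) -> pd.DataFrame:
    # Dummy placeholder rows for layout only; the real loader would shift
    # timestamps by the chosen offset and run the cohort query against BigQuery.
    return pd.DataFrame({
        "signup_week":    ["2025-07-21", "2025-07-21", "2025-07-28"],
        "week_index":     [0, 1, 0],
        "country":        ["US", "US", "US"],
        "cohort_size":    [120, 120, 80],
        "retention_rate": [1.0, 0.45, 1.0],
        "arpu":           [0.00, 3.20, 1.75],
    })

offset = st.sidebar.slider("UTC offset (hours)", -12, 14, 0)
df = load_metrics(offset)

countries = st.sidebar.multiselect("Country", sorted(df["country"].unique()))
metric = st.sidebar.radio("Metric", ["Retention", "ARPU"])
view = df[df["country"].isin(countries)] if countries else df

# Mask small cohorts: keep the rows but blank the metric, so charts show NA.
value_col = "retention_rate" if metric == "Retention" else "arpu"
view = view.assign(**{value_col: view[value_col].mask(view["cohort_size"] < 50)})

if metric == "Retention":
    # Pivot to cohort x week_index (values averaged across the selected
    # countries for display). Weeks that have not happened yet never appear,
    # so the pivot leaves those cells as NaN (NA), never zero.
    heat = view.pivot_table(index="signup_week", columns="week_index",
                            values=value_col, aggfunc="mean")
    st.plotly_chart(px.imshow(heat, aspect="auto",
                              labels=dict(color="retention")))
else:
    st.plotly_chart(px.line(view, x="week_index", y=value_col,
                            color="signup_week", markers=True))
```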