Ambiguous product question: Are existing users more active than new users over the last 28 days (ending today = 2025-09-01)? 1) Propose two reasonable, mutually exclusive definitions for existing vs new (e.g., by signup date or by prior activity), and two defensible definitions of active (e.g., DAU, sessions/week). Briefly state pros/cons and pick one pair to implement. 2) Using the schema and sample data below, write SQL that: a) labels users as new or existing; b) computes each cohort's 7-day rolling active rate and average daily events/user over the last 28 days; c) adjusts for partial observation windows for users who signed up within the window; d) produces a final table with date, cohort, active_users, total_users_observed, active_rate_7d, avg_events_per_user. Use window functions (e.g., partition by user, rolling windows) and avoid double-counting users across cohorts. 3) Extend your query to stratify by country and then produce a cohort-level weighted average controlling for country mix. 4) Briefly note two bias risks (e.g., survivorship, seasonality) and one SQL-side mitigation you implemented.
Schema (you may add a small date calendar CTE if needed):
users(user_id INT, signup_date DATE, country STRING)
events(user_id INT, event_date DATE, event_type STRING)
Sample rows:
users
user_id | signup_date | country
1 | 2025-08-15 | US
2 | 2025-06-10 | US
3 | 2025-08-30 | CA
4 | 2025-07-01 | IN
5 | 2025-08-20 | US
events
user_id | event_date | event_type
1 | 2025-08-29 | view
1 | 2025-09-01 | message
2 | 2025-08-25 | like
2 | 2025-08-31 | view
3 | 2025-09-01 | view
4 | 2025-08-28 | view
5 | 2025-08-31 | comment