This question evaluates a candidate's competency in cohort definition, time-series event aggregation, use of window functions, cohort-level weighting and bias-aware analytics using SQL and Python.

Ambiguous product question: Are existing users more active than new users over the last 28 days (ending today = 2025-09-01)? 1) Propose two reasonable, mutually exclusive definitions for existing vs new (e.g., by signup date or by prior activity), and two defensible definitions of active (e.g., DAU, sessions/week). Briefly state pros/cons and pick one pair to implement. 2) Using the schema and sample data below, write SQL that: a) labels users as new or existing; b) computes each cohort's 7-day rolling active rate and average daily events/user over the last 28 days; c) adjusts for partial observation windows for users who signed up within the window; d) produces a final table with date, cohort, active_users, total_users_observed, active_rate_7d, avg_events_per_user. Use window functions (e.g., partition by user, rolling windows) and avoid double-counting users across cohorts. 3) Extend your query to stratify by country and then produce a cohort-level weighted average controlling for country mix. 4) Briefly note two bias risks (e.g., survivorship, seasonality) and one SQL-side mitigation you implemented. Schema (you may add a small date calendar CTE if needed): users(user_id INT, signup_date DATE, country STRING) events(user_id INT, event_date DATE, event_type STRING) Sample rows: users user_id | signup_date | country 1 | 2025-08-15 | US 2 | 2025-06-10 | US 3 | 2025-08-30 | CA 4 | 2025-07-01 | IN 5 | 2025-08-20 | US
events user_id | event_date | event_type 1 | 2025-08-29 | view 1 | 2025-09-01 | message 2 | 2025-08-25 | like 2 | 2025-08-31 | view 3 | 2025-09-01 | view 4 | 2025-08-28 | view 5 | 2025-08-31 | comment