Hypothesis: Users who use the 'social' category are more regularly engaged than users who use the 'game' category. Using data from 2025-08-04 to 2025-09-01, design a rigorous analysis plan to evaluate this claim. Answer all parts:
- Define 'regularly engaged' precisely (e.g., ≥10 active days in 28 days) and choose a primary metric and 2–3 guardrail metrics. Justify each choice and propose an analysis-ready metric definition that is robust to outliers and seasonality.
- Recommend an ideal randomized experiment to test the hypothesis (e.g., onboarding nudge that shifts first-week exposure to social vs game). Describe randomization unit, stratification variables, primary analysis, guardrails, and how you will prevent contamination and novelty effects.
- If randomization is infeasible and you must use observational data, propose a causal inference approach (e.g., propensity score weighting or matching) specifying covariates to control (tenure, device, country, acquisition channel, baseline activity, weekday mix, etc.), diagnostics to validate overlap/balance, and a sensitivity analysis for unobserved confounding.
- Powering: Suppose in 'game_only' users the baseline share that is 'regularly engaged' is p0 = 0.35. You want to detect a +3 percentage point absolute lift (MDE = 0.03) with α = 0.05 (two-sided) and 1−β = 0.80. Compute the required sample size per group for a two-proportion Z-test and state all formulas/assumptions.
- Estimation and inference: Describe the primary estimator and statistical test you will use (e.g., difference in proportions with cluster-robust SEs if randomization is by user; or a logistic regression with covariates and robust SEs). Explain how you would handle multiple comparisons and interim looks.
- Threats to validity: List at least five concrete risks (e.g., reverse causality, where more engaged users choose social; misclassification of category; taxonomy drift; bots/multi-accounts; seasonality; geographic shocks) and how you would detect/mitigate each.
- Decision-making: Define a clear decision rule using the primary metric, practical significance threshold, and guardrails. Include how you would communicate results to product and what follow-up you would run if the effect is heterogeneous across tenure cohorts.
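
For the powering part, the calculation can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes equal group sizes, independent users, the normal approximation, and the unpooled-variance form of the two-proportion formula. The function name `n_per_group` and its parameters are illustrative, not part of the question.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p0, mde, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-proportion Z-test.

    Assumed formula (unpooled variance, equal allocation):
        n = (z_{1-alpha/2} + z_{power})^2 * (p0*(1-p0) + p1*(1-p1)) / mde^2
    where p1 = p0 + mde.
    """
    p1 = p0 + mde
    # Critical values from the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the two Bernoulli variances under the alternative.
    variance_sum = p0 * (1 - p0) + p1 * (1 - p1)
    # Round up, since a fractional user cannot be sampled.
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / mde ** 2)


# Inputs taken from the question: p0 = 0.35, MDE = 0.03, alpha = 0.05, power = 0.80.
required_n = n_per_group(p0=0.35, mde=0.03)
print(required_n)
```

With the stated inputs this comes out at roughly 4,000 users per group. A variant that uses the pooled variance under the null gives a very similar figure, so the choice between the two should be stated as an assumption rather than treated as material.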