##### Question

**Hypothesis:** Among Oculus (Meta Quest) users, those who use *social* features are more regularly engaged than those who use *game* features. Using activity data over an ~8-week window (e.g., 2025-07-01 to 2025-08-31), design a rigorous analysis plan to evaluate this claim. Answer all parts:

1. **Define the outcome.** Define "regularly engaged" precisely (e.g., ≥ 3 active days per week over the 8-week window, or ≥ 10 active days in 28 days). Choose a primary metric and 2–3 guardrail metrics. Justify each choice and propose an analysis-ready metric definition that is robust to outliers and seasonality.
2. **Ideal randomized experiment.** If you can randomize, specify the exact experiment: unit of randomization, treatment (e.g., an onboarding nudge that shifts first-week exposure toward social vs game), primary metric, guardrails, stratification variables, and how you will prevent contamination and novelty effects. State the power/MDE target at α = 0.05.
3. **Observational fallback (causal inference).** If randomization is infeasible, propose an observational design comparing *social-only* vs *game-only* users. Specify inclusion/exclusion criteria (minimum tenure, geography, device), null/alternative hypotheses, and a causal approach (propensity score matching/weighting, IPW, or exact matching on tenure buckets). List the covariates to control (signup date / tenure, baseline engagement, device, country, acquisition channel, content supply, weekday mix) and the diagnostics to validate overlap and balance.
4. **Power / sample size.** Suppose among game-only users the baseline share that is "regularly engaged" is p0 = 0.35. You want to detect a +3 percentage-point absolute lift (MDE = 0.03) with α = 0.05 (two-sided) and power 1 − β = 0.80. Compute the required sample size per group for a two-proportion Z-test and state all formulas and assumptions.
5. **Estimation and inference.** Describe the primary estimator and statistical test (e.g., difference in proportions with cluster-robust SEs if randomization is by user; a two-proportion z-test for the regular-week share; Welch's t-test or Mann–Whitney for mean weekly active days; or a logistic regression with covariates and robust SEs). Explain how you would handle multiple comparisons and interim looks.
6. **Bias and robustness checks.** Detail checks for bias and robustness: pre-trend checks, difference-in-differences on users who switch categories, placebo outcomes, sensitivity analysis for unobserved confounding (e.g., Rosenbaum bounds), and multiple-testing control for secondaries. List at least five concrete threats to validity (reverse causality, where more engaged users self-select into social; category misclassification; taxonomy drift; bots / multi-accounts; seasonality; geographic shocks) and how you would detect/mitigate each.
7. **Decision-making and communication.** Define a clear decision rule combining statistical significance, a practical-significance threshold, and guardrails (e.g., p < 0.05 *and* lift ≥ X% with a CI excluding 0). Explain how you would communicate results, risks, and assumptions to product stakeholders, and what follow-up you would run if the effect is heterogeneous across tenure cohorts.

**1) Defining "regularly engaged" and metrics**

Meta's Oculus / Quest is a VR hardware platform, so "engagement" means recurring headset sessions, not web visits. Define the outcome as a clear, time-windowed binary so that it can be powered and tested as a two-proportion comparison:

- **Primary outcome:** `regularly_engaged` = user has ≥ 3 active days/week in ≥ 6 of the 8 weeks. An alternative definition is ≥ 10 active days in any 28-day sub-window; the two are not equivalent, so fix one before looking at the data. A binary, windowed definition is robust to single heavy-use days and is the cleanest thing to power.
- **Guardrails:** (a) median session length / total minutes (catches a case where "social" inflates day-counts with trivially short sessions), (b) 4-week retention / churn, (c) crash or comfort-related opt-outs (a VR-specific health/safety guardrail).
- **Robustness:** winsorize continuous metrics (e.g., minutes) at the 99th percentile; define active days in the user's local timezone; compare like-for-like calendar weeks to neutralize weekday/holiday seasonality; require a minimum tenure so brand-new users' onboarding spike doesn't dominate.

**2) Ideal randomized experiment**

- **Unit of randomization:** the user (account). User-level randomization keeps each account in a single arm and matches how exposure is delivered. Social features can still spill over between connected users, so check for interference across arms.
- **Treatment:** an onboarding / home-screen nudge that shifts first-week exposure toward social features (treatment) vs toward games (control). This manipulates the *category* of early exposure, which is the lever we can actually pull. (You cannot randomize a user's pre-existing preference, so randomize the nudge, not the trait.)
- **Primary metric:** `regularly_engaged` measured in weeks 2–8 (post-exposure), to avoid mechanically counting the nudged sessions themselves.
- **Stratification:** device generation (Quest 2 vs 3 vs Pro), country/region, and baseline activity tier. Stratified randomization tightens the estimate and guarantees balance on the strongest predictors.
- **Guardrails:** session length, retention, and comfort opt-outs as above.
- **Power / MDE target:** 80% power to detect a +3 percentage-point absolute lift at α = 0.05 (two-sided), as computed in part 4.
- **Contamination / novelty:** randomize at the user level and keep a fixed assignment for the whole window to prevent cross-arm leakage; reserve a hold-out and inspect the time series so a short-lived novelty bump isn't mistaken for durable lift; pre-register the analysis and the stopping rule so that interim peeking does not inflate false positives.

**3) Observational fallback (causal inference)**

If you cannot randomize, compare social-only vs game-only users with explicit confounder control.

- **Inclusion/exclusion:** minimum tenure (e.g., headset owned for ≥ 30 days before the window starts, so onboarding is excluded), supported regions/locales only, exclude flagged bots / shared family accounts, restrict to active devices.
- **Hypotheses:** H0: P(regularly_engaged | social) = P(regularly_engaged | game); H1: P(regularly_engaged | social) > P(regularly_engaged | game) (one-sided if directional).
- **Causal method:** estimate a propensity for "chooses social" from pre-window covariates (signup_date / tenure, baseline activity in a pre-period, device generation, country, acquisition channel, content library size/supply, weekday mix), then apply **propensity score matching (PSM) or inverse probability weighting (IPW)**, optionally with **exact matching on tenure buckets**.
- **Diagnostics:** check **common support / overlap** (trim non-overlapping propensity regions), verify **covariate balance** after weighting/matching (standardized mean differences < 0.1), and report effective sample size after weighting.

**4) Power / sample size (two-proportion Z-test)**

For a two-sided two-proportion test with equal groups, p0 = 0.35, p1 = p0 + MDE = 0.38, α = 0.05, power = 0.80:

```
n_per_group = ( z_{1-α/2}·√(2·p̄(1-p̄)) + z_{1-β}·√(p0(1-p0)+p1(1-p1)) )² / (p1-p0)²
```

with z_{0.975} = 1.95996, z_{0.80} = 0.84162, and p̄ = (p0+p1)/2 = 0.365.

- p̄(1−p̄) = 0.365·0.635 = 0.231775 → √(2·0.231775) = √0.46355 ≈ 0.68085, times 1.95996 ≈ 1.33444.
- p0(1−p0) = 0.35·0.65 = 0.2275; p1(1−p1) = 0.38·0.62 = 0.2356; sum = 0.4631 → √0.4631 ≈ 0.68051, times 0.84162 ≈ 0.57273.
- Numerator = (1.33444 + 0.57273)² = (1.90717)² ≈ 3.63730.
- Denominator = (0.03)² = 0.0009.
- **n_per_group ≈ 3.63730 / 0.0009 ≈ 4,042**, i.e. ~4,050 per arm (~8,100 total).

Assumptions: independent users (one observation each), equal allocation, fixed effect size, no interim peeking; if randomizing by user but analyzing user-weeks, inflate by a design effect for clustering. A simpler pooled approximation, n ≈ (z_{1-α/2}+z_{1-β})²·2·p̄(1−p̄)/MDE², gives ~4,043 and is fine as a sanity check.

**5) Estimation and inference**

- **Primary estimator:** difference in the `regularly_engaged` proportions; test with a two-proportion z-test (or a logistic regression on the treatment indicator). With user-level randomization and one row per user, ordinary SEs suffice; if you analyze repeated user-weeks, use **cluster-robust SEs by user**.
- **Continuous secondaries** (mean weekly active days, minutes): **Welch's t-test** (unequal variances) or **Mann–Whitney** if heavily skewed.
- **Covariate adjustment** (observational or for precision): logistic regression with the propensity cov
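
The sample-size arithmetic in part 4 and the two-proportion z-test in part 5 can be checked with a short Python sketch. This is a minimal illustration using only the standard library, not a production analysis: the function names `n_per_group` and `two_proportion_ztest` are hypothetical helpers, and the counts in the usage section are made-up numbers chosen only to match the 35% vs 38% scenario.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def n_per_group(p0, mde, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided two-proportion z-test, equal allocation."""
    p1 = p0 + mde
    p_bar = (p0 + p1) / 2
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.95996 for alpha = 0.05
    z_beta = STD_NORMAL.inv_cdf(power)           # about 0.84162 for power = 0.80
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return ceil(numerator / mde ** 2)


def two_proportion_ztest(x_t, n_t, x_c, n_c, alpha=0.05):
    """Return (difference, z, two-sided p-value, confidence interval).

    x_t / n_t: regularly engaged users / total users in treatment.
    x_c / n_c: the same counts for control.
    """
    p_t, p_c = x_t / n_t, x_c / n_c
    diff = p_t - p_c
    # Pooled standard error for the test statistic under H0: p_t == p_c.
    p_pool = (x_t + x_c) / (n_t + n_c)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    z = diff / se_pooled
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    # Unpooled standard error for the confidence interval on the difference.
    se_unpooled = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return diff, z, p_value, ci


if __name__ == "__main__":
    n = n_per_group(p0=0.35, mde=0.03)
    print("required users per arm:", n)
    # Hypothetical counts for illustration only (roughly 38% vs 35%).
    diff, z, p_value, ci = two_proportion_ztest(x_t=1536, n_t=n, x_c=1415, n_c=n)
    print("lift:", diff, "z:", z, "p-value:", p_value, "CI:", ci)
```

The test statistic uses the pooled standard error (valid under the null of equal proportions), while the confidence interval uses the unpooled one, which is the usual pairing for a difference in proportions. If the analysis is run on user-weeks rather than one row per user, these formulas understate the variance and a cluster-robust method is needed instead.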

How do I approach Analytics & Experimentation interview questions?

Analytics & Experimentation questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master analytics & experimentation interviews.

What difficulty level is this interview question?

This is a Medium difficulty Analytics & Experimentation question, commonly asked during Onsite rounds at Meta.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Meta during technical interviews.

Design analysis to test social vs game engagement

Q: Design analysis to test social vs game engagement

A Meta (Oculus) data science onsite question on designing a rigorous study to test whether social-feature users are more regularly engaged than game-feature users. It covers metric definition, a randomized experiment, an observational causal-inference fallback (PSM/IPW), a two-proportion power calculation, estimation and inference, robustness/bias checks, and a decision rule.
