PracHub

Design analysis to test social vs game engagement

Last updated: Jun 15, 2026

Quick Overview

A Meta (Oculus) data science onsite question on designing a rigorous study to test whether social-feature users are more regularly engaged than game-feature users. It covers metric definition, a randomized experiment, an observational causal-inference fallback (PSM/IPW), a two-proportion power calculation, estimation and inference, robustness/bias checks, and a decision rule.



Company: Meta

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: Medium

Interview Round: Onsite

##### Question

**Hypothesis:** Among Oculus (Meta Quest) users, those who use *social* features are more regularly engaged than those who use *game* features. Using activity data over an ~8-week window (e.g., 2025-07-01 to 2025-08-31), design a rigorous analysis plan to evaluate this claim. Answer all parts:

1. **Define the outcome.** Define "regularly engaged" precisely (e.g., ≥ 3 active days per week over the 8-week window, or ≥ 10 active days in 28 days). Choose a primary metric and 2–3 guardrail metrics. Justify each choice and propose an analysis-ready metric definition that is robust to outliers and seasonality.
2. **Ideal randomized experiment.** If you can randomize, specify the exact experiment: unit of randomization, treatment (e.g., an onboarding nudge that shifts first-week exposure toward social vs game), primary metric, guardrails, stratification variables, and how you will prevent contamination and novelty effects. State the power/MDE target at α = 0.05.
3. **Observational fallback (causal inference).** If randomization is infeasible, propose an observational design comparing *social-only* vs *game-only* users. Specify inclusion/exclusion criteria (minimum tenure, geography, device), null/alternative hypotheses, and a causal approach (propensity score matching/weighting, IPW, or exact matching on tenure buckets). List the covariates to control (signup date / tenure, baseline engagement, device, country, acquisition channel, content supply, weekday mix) and the diagnostics to validate overlap and balance.
4. **Power / sample size.** Suppose among game-only users the baseline share that is "regularly engaged" is p0 = 0.35. You want to detect a +3 percentage-point absolute lift (MDE = 0.03) with α = 0.05 (two-sided) and power 1 − β = 0.80. Compute the required sample size per group for a two-proportion Z-test and state all formulas and assumptions.
5. **Estimation and inference.** Describe the primary estimator and statistical test (e.g., difference in proportions with cluster-robust SEs if randomization is by user; a two-proportion z-test for the regular-week share; Welch's t-test or Mann–Whitney for mean weekly active days; or a logistic regression with covariates and robust SEs). Explain how you would handle multiple comparisons and interim looks.
6. **Bias and robustness checks.** Detail checks for bias and robustness: pre-trend checks, difference-in-differences on users who switch categories, placebo outcomes, sensitivity analysis for unobserved confounding (e.g., Rosenbaum bounds), and multiple-testing control for secondaries. List at least five concrete threats to validity (reverse causality, where more engaged users self-select into social; category misclassification; taxonomy drift; bots / multi-accounts; seasonality; geographic shocks) and how you would detect/mitigate each.
7. **Decision-making and communication.** Define a clear decision rule combining statistical significance, a practical-significance threshold, and guardrails (e.g., p < 0.05 *and* lift ≥ X% with a CI excluding 0). Explain how you would communicate results, risks, and assumptions to product stakeholders, and what follow-up you would run if the effect is heterogeneous across tenure cohorts.


Solution

**1) Defining "regularly engaged" and metrics**

Meta's Oculus / Quest is a VR hardware platform, so "engagement" means recurring headset sessions, not web visits. Define the outcome as a clear, time-windowed binary so that it can be powered with a two-proportion test:

- **Primary outcome:** `regularly_engaged` = user has ≥ 3 active days/week in ≥ 6 of the 8 weeks. (A simpler alternative, not an equivalent definition, is ≥ 10 active days in a fixed 28-day window.) A binary, windowed definition is robust to single heavy-use days and is the simplest outcome to power.
- **Guardrails:** (a) median session length / total minutes (catches the case where "social" inflates day-counts with trivially short sessions), (b) 4-week retention / churn, (c) crash or comfort-related opt-outs (a VR-specific health/safety guardrail).
- **Robustness:** winsorize continuous metrics (e.g., minutes) at the 99th percentile; define active days in the user's local timezone; compare like-for-like calendar weeks to neutralize weekday/holiday seasonality; require a minimum tenure so that new users' onboarding spike does not dominate.

**2) Ideal randomized experiment**

- **Unit of randomization:** the user (account). User-level randomization keeps each user's experience consistent and matches how exposure is delivered. Because social features involve interaction between users, some cross-user spillover can remain; if it looks material, randomize clusters of connected users instead.
- **Treatment:** an onboarding / home-screen nudge that shifts first-week exposure toward social features (treatment) vs toward games (control). This manipulates the *category* of early exposure, which is the lever we can actually pull. (You cannot randomize a user's pre-existing preference, so randomize the nudge, not the trait.)
- **Primary metric:** `regularly_engaged` measured in weeks 2–8 (post-exposure), to avoid mechanically counting the nudged sessions themselves.
- **Stratification:** device generation (Quest 2 vs 3 vs Pro), country/region, and baseline activity tier. Stratified randomization tightens the estimate and guarantees balance on the strongest predictors.
- **Guardrails:** session length, retention, and comfort opt-outs as above.
- **Power/MDE target:** 80% power to detect a +3 pp absolute lift at α = 0.05 (two-sided), sized in part 4.
- **Contamination / novelty:** keep each user's assignment fixed for the whole window so that no user sees both arms; reserve a hold-out and inspect the time series so a short-lived novelty bump isn't mistaken for durable lift; pre-register the analysis so that metrics and stopping rules are fixed before any data is examined.

**3) Observational fallback: causal inference**

If you cannot randomize, compare social-only vs game-only users with explicit confounder control.

- **Inclusion/exclusion:** minimum tenure (e.g., headset activated ≥ 30 days before the window so onboarding is excluded), supported regions/locales only, exclude flagged bots / shared family accounts, restrict to active devices.
- **Hypotheses:** H0: P(regularly_engaged | social) = P(regularly_engaged | game); H1: P(regularly_engaged | social) > P(regularly_engaged | game) (one-sided if the claim is directional).
- **Causal method:** estimate a propensity for "chooses social" from pre-window covariates (signup_date / tenure, baseline activity in a pre-period, device generation, country, acquisition channel, content library size/supply, weekday mix), then apply **PSM or IPW**, optionally with **exact matching on tenure buckets**.
- **Diagnostics:** check **common support / overlap** (trim non-overlapping propensity regions), verify **covariate balance** after weighting/matching (standardized mean differences < 0.1), and report the effective sample size after weighting.

**4) Power / sample size (two-proportion Z-test)**

For a two-sided two-proportion test with equal groups, p0 = 0.35, p1 = p0 + MDE = 0.38, α = 0.05, power = 0.80:

```
n_per_group = ( z_{1-α/2}·√(2·p̄(1-p̄)) + z_{1-β}·√(p0(1-p0)+p1(1-p1)) )² / (p1-p0)²
```

with z_{1-α/2} = z_{0.975} = 1.95996, z_{1-β} = z_{0.80} = 0.84162, and p̄ = (p0+p1)/2 = 0.365.

- p̄(1−p̄) = 0.365·0.635 = 0.231775 → √(2·0.231775) = √0.46355 ≈ 0.68085, times 1.95996 ≈ 1.33443.
- p0(1−p0) = 0.35·0.65 = 0.2275; p1(1−p1) = 0.38·0.62 = 0.2356; sum = 0.4631 → √0.4631 ≈ 0.68051, times 0.84162 ≈ 0.57274.
- Numerator = (1.33443 + 0.57274)² = (1.90717)² ≈ 3.63729.
- Denominator = (0.03)² = 0.0009.
- **n_per_group ≈ 3.63729 / 0.0009 ≈ 4,041.4, rounded up to 4,042**, i.e. ~4,050 per arm (~8,100 total).

Assumptions: independent users (one observation each), equal allocation, fixed effect size, no interim peeking; if randomizing by user but analyzing user-weeks, inflate by a design effect for clustering. A simpler pooled approximation, n ≈ (z_{1-α/2}+z_{1-β})²·2·p̄(1−p̄)/MDE², gives ~4,043 and is fine as a sanity check.

**5) Estimation and inference**

- **Primary estimator:** difference in the `regularly_engaged` proportions; test with a two-proportion z-test (or a logistic regression on the treatment indicator). With user-level randomization and one row per user, ordinary SEs suffice; if you analyze repeated user-weeks, use **cluster-robust SEs by user**.
- **Continuous secondaries** (mean weekly active days, minutes): **Welch's t-test** (unequal variances) or **Mann–Whitney** if heavily skewed.
- **Covariate adjustment** (observational or for precision): logistic regression with the propensity covariates and robust SEs, or IPW-weighted estimation.
- **Multiple comparisons:** control the family-wise error rate or the FDR (Bonferroni / Benjamini–Hochberg) across guardrails and secondaries.
- **Interim looks:** use alpha-spending (O'Brien–Fleming / Pocock) or sequential testing; do not reuse the fixed-sample α at every look.

**6) Bias and robustness**

- **Pre-trend checks:** confirm social vs game cohorts had parallel engagement *before* the window (a pre-treatment divergence breaks the causal read).
- **Difference-in-differences on switchers:** for users who move between categories, compare their before/after change with the change among non-switchers over the same period, which differences out fixed user traits and common time shocks.
- **Placebo outcomes:** test an outcome that should *not* be affected (e.g., account-settings visits); a "significant" placebo effect signals residual confounding.
- **Sensitivity analysis:** **Rosenbaum bounds**, which quantify how strong an unobserved confounder would have to be to overturn the result.
- **Threats to validity (≥ 5):**
  - (i) Reverse causality: already-engaged users self-select into social, not the reverse. Address with the randomized design or pre-period matching.
  - (ii) Category misclassification / taxonomy drift: audit the social/game labels and freeze the taxonomy for the window.
  - (iii) Bots / multi-accounts / family-shared headsets: filter via device fingerprint and activity heuristics.
  - (iv) Seasonality and content shocks (a hit game launch): align calendar weeks, add time fixed effects, inspect by-week trends.
  - (v) Geographic / supply shocks: stratify by region and check robustness excluding affected markets.
  - (vi) Novelty effects: measure the durable post-exposure window, not the nudge week.

**7) Decision-making and communication**

- **Decision rule:** ship/conclude only if the primary effect is **both statistically significant (p < 0.05, CI excluding 0)** *and* **practically significant** (lift ≥ a pre-set threshold, e.g. +3 pp), **and** no guardrail regresses beyond its bound.
- **Communication:** lead with the estimated lift and its CI in plain terms, state the design (randomized vs observational) and its key assumptions, and be explicit about residual confounding risk if observational.
- **Heterogeneity follow-up:** if the effect varies by tenure cohort (e.g., strong for new users, flat for veterans), report the interaction, avoid over-generalizing the average, and propose a targeted follow-up experiment on the responsive segment.
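The sample-size arithmetic in part 4 can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function names `n_per_group` and `n_per_group_pooled` are illustrative and not part of the original solution. It assumes equal allocation, independent users, and a two-sided test, as stated in the assumptions above.

```python
import math
from statistics import NormalDist


def n_per_group(p0: float, mde: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n for a two-sided two-proportion z-test with equal allocation."""
    p1 = p0 + mde
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # z_{1-beta}
    # Null-hypothesis term uses the pooled variance; the alternative term
    # uses the two arm-specific variances.
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    return math.ceil((null_term + alt_term) ** 2 / mde ** 2)


def n_per_group_pooled(p0: float, mde: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Simpler approximation that uses the pooled variance for both terms."""
    p_bar = p0 + mde / 2
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z_sum ** 2 * 2 * p_bar * (1 - p_bar) / mde ** 2)


if __name__ == "__main__":
    print(n_per_group(0.35, 0.03))         # 4042
    print(n_per_group_pooled(0.35, 0.03))  # 4043
```

Rounding up with `math.ceil` is the conservative choice, since a fractional user cannot be recruited. If the analysis unit is user-weeks rather than users, the result would still need to be multiplied by a design effect, which this sketch does not estimate.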

Explanation

Rubric: a strong answer treats this as a causal question, not a correlational one. It (1) pins "regularly engaged" to a windowed binary that can be powered; (2) proposes randomizing the *exposure nudge* (you can't randomize a pre-existing preference) with user-level assignment and stratification; (3) falls back to PSM/IPW with named confounders, overlap and balance diagnostics; (4) correctly applies the two-proportion sample-size formula (~4,000–4,050/arm here); (5) picks tests that match the data type and controls multiplicity/interim looks; (6) defends against the dominant threat — reverse causality / self-selection — plus misclassification, bots, seasonality, with placebo + sensitivity checks; and (7) sets a decision rule requiring both statistical and practical significance with intact guardrails. Red flags: claiming causation from a raw social-vs-game comparison, ignoring self-selection, or no power math.
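Rubric points (5) and (7) can be made concrete with a small sketch that runs the two-proportion z-test, builds a Wald confidence interval for the lift, and applies the combined decision rule. The function name `two_proportion_decision`, the default threshold `min_lift=0.03`, the `guardrails_ok` flag, and the counts in the usage example are all hypothetical and chosen for illustration; this is one reasonable way to encode the rule, not a prescribed implementation.

```python
import math
from statistics import NormalDist


def two_proportion_decision(x_c: int, n_c: int, x_t: int, n_t: int,
                            min_lift: float = 0.03, alpha: float = 0.05,
                            guardrails_ok: bool = True) -> dict:
    """Two-sided two-proportion z-test plus the ship/no-ship rule.

    x_c, x_t: regularly engaged users in control / treatment.
    n_c, n_t: total users in control / treatment (one row per user).
    """
    p_c, p_t = x_c / n_c, x_t / n_t
    diff = p_t - p_c

    # Test statistic uses the pooled proportion under H0: p_t == p_c.
    p_pool = (x_c + x_t) / (n_c + n_t)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Wald confidence interval uses the unpooled standard error.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Ship only if statistically AND practically significant with intact guardrails.
    ship = (p_value < alpha) and (ci[0] > 0) and (diff >= min_lift) and guardrails_ok
    return {"diff": diff, "p_value": p_value, "ci": ci, "ship": ship}


if __name__ == "__main__":
    # Hypothetical counts: 35% vs 39% regularly engaged, 5,000 users per arm.
    print(two_proportion_decision(1750, 5000, 1950, 5000))
    # Hypothetical counts: a +1 pp lift, below the practical threshold.
    print(two_proportion_decision(1750, 5000, 1800, 5000))
```

Keeping the guardrail outcome as an explicit input makes the rule fail closed: a large, significant lift still returns `ship = False` when any guardrail has regressed beyond its bound.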

Oct 13, 2025, 9:49 PM
