PracHub

Prove friends outperform unconnected; design metrics, observational analysis, and rollout experiment

Last updated: Jun 15, 2026

Quick Overview

A Meta data-scientist technical-screen question on Analytics & Experimentation: using only info_stream_views and post_reactions, prove whether Friend-authored content is 'more social' than Unconnected content. It tests denominator-complete metric design with relationship attribution, observational causal validation (fixed effects + propensity/AIPW), and a network-aware rollout experiment with power, CUPED, interference mitigations, and guardrails — plus quantifying the long-term and discovery value of Unconnected content beyond near-term engagement.


Company: Meta

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: hard

Interview Round: Technical Screen



Oct 13, 2025, 9:49 PM
Question

You are given two event tables, info_stream_views (one row per viewer–post view, with viewer_id, post_id, relationship ∈ {friend, unconnected}, view_duration_ms, event_ts, ds) and post_reactions (one row per reaction, with reactor_id, post_id, reaction type ∈ {like, comment, reshare, follow, hide, report}, event_ts, ds).

You hypothesize that content authored by Friends is "more social" than content from Unconnected sources (i.e., it drives more likes/comments/reshares per view). Using only these two tables, design a rigorous, end-to-end analysis: define metrics, validate the hypothesis observationally, design an experiment for launching/expanding Unconnected content, and quantify the value of Unconnected exposure even when its near-term engagement is lower.

  1. Define and justify metrics (formulas welcome).
    • Precisely define "more social" with measurable, denominator-complete metrics: e.g., social-reactions per DAU, reactions per impression by relationship, reaction-rate per view, comment/reshare rate per 100 (or 1,000) impressions, dwell-time lift (share of views ≥ 60s), and same-day view-to-reaction conversion. Consider a weighted composite (e.g., weight comments/reshares above likes) and justify the weights.
    • State the unit of analysis (viewer–post–day vs. post–day vs. impression), whether to include zero-reaction views, and how the choice affects bias.
    • Specify attribution: join each reaction to a view on reactor_id = viewer_id and post_id (and ds), attribute it to the relationship under which the viewer saw the post, handle multiple views of the same post per viewer per day (e.g., aggregate to MAX duration / one impression), define a lookback window for reactions lacking a same-day matched view, and report the unattributed-reaction rate as a QA metric.
    • Propose normalization (per impression, per unique viewer, per minute viewed) and guardrail metrics: daily active viewers, views/session, creator/topic diversity (unique authors per viewer–day, entropy), and quality guardrails (hide rate, report rate, negative share of reactions).
  2. Observational validation. Outline an analysis to compare Friend vs. Unconnected engagement while mitigating confounding over a fixed window (e.g., a 7-day window such as 2025-08-26..2025-09-01).
    • List key confounders (viewer propensity to engage, author popularity, post age/freshness, content type via proxies like view duration, time-of-day, device, rank position) and control for them.
    • Propose a primary design, e.g., a fixed-effects regression with viewer×day fixed effects (and optionally author fixed effects), and a secondary design (propensity-score matching / inverse-propensity weighting, or a doubly-robust AIPW estimator). State the unit of analysis, covariates, and outcome window (e.g., reaction within 24h of first view).
    • Specify standard-error treatment (cluster by viewer and/or post), multiple-comparison control (one pre-specified primary metric; FDR on secondaries), and diagnostics (overlap/common support, post-adjustment covariate balance, placebo using hide/report outcomes, robustness across post-age buckets).
  3. Experiment design — launching/expanding Unconnected content. Propose a randomized experiment to measure success.
    • Randomization unit (user-level, sticky); treatment variants: either reserve a share of feed slots for Unconnected content (e.g., 0% / 10% / 30%) or scale the relationship ranking weight θ (e.g., θ = 1.0 control vs. θ = 0.8; assuming θ multiplies the Friend score, θ < 1 relatively upweights Unconnected). Define primary outcomes (net social reactions per DAU, reactions per session, time-to-first-friend-interaction), guardrails (retention/D+1 return, session length, hide/report rates, creator follows, friend-ecosystem health, long-term re-engagement), and minimal acceptable lifts.
    • Provide a power/duration check (baseline rate, MDE, α, power, clustering design effect) and a variance-reduction plan (CUPED with pre-exposure per-user baseline; diff-in-diff for long-run panels).
    • Address novelty effects, personalization/learning ramp, supply constraints (log intended vs. achieved Unconnected share; ITT + exposure-on-treated), and network/peer interference (graph-cluster randomization, supply/author holdouts, or interleaved time-split ramps).
    • Include segment / heterogeneity analyses (new vs. power users; friend-graph density; region; consumption-style deciles) with multiple-testing control, and a clear ship/iterate decision framework.
  4. Value of Unconnected content beyond near-term engagement. Even if immediate engagement is lower, define and measure the incremental value of Unconnected exposure using only the given tables (and call out what extra logs you'd request):
    • Discovery value: new viewer–author pair rate, repeat-return-to-author rate, creator breadth and topical entropy per viewer–day.
    • Long-term value: next-day (D+1) and 7-day retention, session depth.
    • Amplification: reshare-driven downstream reach (incremental views following a reshare). Present a trade-off view (per-1,000-impression KPIs) so a decision-maker can weigh near-term engagement against discovery and long-term value.
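
The attribution and denominator rules in item 1 can be made concrete with a small sketch. It is illustrative only: the rows are made-up sample data, the column name `reaction_type` is an assumption (the prompt only says "reaction type"), and the same-day join ignores the lookback window the prompt also asks for. It collapses repeat views to one impression per viewer, post, and day, keeps zero-reaction impressions in the denominator, and reports the unattributed-reaction rate as a QA metric.

```python
from collections import defaultdict

# Hypothetical sample rows: (viewer_id, post_id, relationship, view_duration_ms, ds)
views = [
    ("u1", "p1", "friend", 4000, "2025-08-26"),
    ("u1", "p1", "friend", 9000, "2025-08-26"),       # repeat view, same day
    ("u1", "p2", "unconnected", 2000, "2025-08-26"),  # zero-reaction impression
    ("u2", "p2", "unconnected", 70000, "2025-08-26"),
    ("u2", "p3", "friend", 1000, "2025-08-26"),
]
# Hypothetical sample rows: (reactor_id, post_id, reaction_type, ds)
reactions = [
    ("u1", "p1", "like", "2025-08-26"),
    ("u1", "p1", "comment", "2025-08-26"),
    ("u2", "p2", "reshare", "2025-08-26"),
    ("u2", "p3", "hide", "2025-08-26"),   # matched, but not a social reaction
    ("u3", "p9", "like", "2025-08-26"),   # no matching view: unattributed
]

SOCIAL = {"like", "comment", "reshare"}


def social_rate_by_relationship(views, reactions):
    """Social reactions per 1,000 impressions by relationship, plus QA rate."""
    # One impression per (viewer, post, ds); keep the MAX view duration.
    # If relationship differs across repeat views, the last one seen wins.
    impressions = {}
    for viewer, post, rel, dur, ds in views:
        key = (viewer, post, ds)
        prev = impressions.get(key)
        impressions[key] = (rel, max(dur, prev[1]) if prev else dur)

    impressions_n = defaultdict(int)
    for rel, _ in impressions.values():
        impressions_n[rel] += 1

    social_n = defaultdict(int)
    unattributed = 0
    for reactor, post, rtype, ds in reactions:
        imp = impressions.get((reactor, post, ds))  # reactor_id = viewer_id
        if imp is None:
            unattributed += 1
        elif rtype in SOCIAL:
            social_n[imp[0]] += 1

    rates = {rel: 1000 * social_n[rel] / n for rel, n in impressions_n.items()}
    unattributed_rate = unattributed / len(reactions) if reactions else 0.0
    return rates, unattributed_rate


rates, unattributed_rate = social_rate_by_relationship(views, reactions)
print(rates, unattributed_rate)
```

Dropping the zero-reaction impressions from `impressions_n` would inflate both rates and bias the comparison toward whichever relationship has more passive views, which is why the denominator is built from views rather than from reactions.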

Deliverables: (a) a metric spec with formulas; (b) an observational analysis plan with controls and diagnostics; (c) an experiment-design doc with randomization unit, power inputs, interference mitigations, and stopping rules; (d) KPIs quantifying the incremental value of Unconnected content even when near-term engagement is lower.
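
For deliverable (c), the power inputs and the CUPED step can be sketched as follows. This is a rough sketch under stated assumptions, not a prescribed method: it uses the standard normal-approximation sample size for a two-sided difference in means, treats the clustering design effect as a simple multiplier, and estimates a single pooled CUPED coefficient. The function names and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist, mean, pvariance


def users_per_arm(baseline_sd, mde_abs, alpha=0.05, power=0.8, design_effect=1.0):
    """Approximate users per arm for a two-sided two-sample z-test on a mean.

    baseline_sd: per-user standard deviation of the metric (assumed equal in both arms)
    mde_abs: minimum detectable effect in the metric's own units
    design_effect: variance inflation from clustering (1.0 means none)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * baseline_sd ** 2 / mde_abs ** 2
    return math.ceil(n * design_effect)


def cuped_adjust(y, x):
    """Return y - coef * (x - mean(x)), with coef = cov(x, y) / var(x).

    y: in-experiment outcome per user; x: pre-exposure baseline per user.
    The adjustment keeps the mean of y and shrinks its variance when x and y
    are correlated.
    """
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    coef = cov_xy / var_x
    return [yi - coef * (xi - x_bar) for xi, yi in zip(x, y)]


# Hypothetical inputs: SD of 2.0 reactions per user-day, MDE of 0.1,
# and a 20% variance inflation from graph-cluster randomization.
print(users_per_arm(baseline_sd=2.0, mde_abs=0.1, design_effect=1.2))

# Hypothetical per-user data: pre-period baseline x and in-experiment outcome y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 5.5]
y_adj = cuped_adjust(y, x)
print(pvariance(y), pvariance(y_adj))
```

Dividing the required users per arm by the expected daily eligible users per arm gives a first-pass duration, which should then be rounded up to whole weeks to cover weekly seasonality and to leave room for novelty effects to decay.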
