Validate friends vs unconnected; design rollout experiment
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Technical Screen
You hypothesize that friend-authored content is more "social" (i.e., it drives more likes, comments, and reshares) than content from unconnected authors. Using only the info_stream_views and post_reactions tables, do the following:
1) Define the metrics precisely (formulas welcome): e.g., per-user-per-day social-reaction rate, reactions per post impression by relationship, unique-reactor rate, follow/reshare rates, and hide/report rates as quality guardrails. Use the ds and relationship fields to attribute each reaction to the reacting viewer's relationship to the author on that day.
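One of the part-1 metrics, reactions per post impression by relationship, can be sketched in plain Python. The row schema below (viewer_id, post_id, relationship, ds keys) and the toy data are assumptions for illustration, not the actual table layout:

```python
from collections import defaultdict

# Hypothetical rows mirroring info_stream_views and post_reactions;
# column names and values here are illustrative assumptions.
views = [
    {"viewer_id": 1, "post_id": 10, "relationship": "friend", "ds": "2025-08-26"},
    {"viewer_id": 1, "post_id": 11, "relationship": "unconnected", "ds": "2025-08-26"},
    {"viewer_id": 2, "post_id": 10, "relationship": "friend", "ds": "2025-08-26"},
]
reactions = [
    {"viewer_id": 1, "post_id": 10, "reaction": "like", "ds": "2025-08-26"},
]

def reactions_per_impression(views, reactions):
    """Reactions per post impression, keyed by the viewer's relationship
    to the author on the day of the view (join on viewer_id, post_id, ds)."""
    # Index each view so a reaction can inherit its relationship label.
    view_rel = {(v["viewer_id"], v["post_id"], v["ds"]): v["relationship"]
                for v in views}
    impressions = defaultdict(int)
    reacted = defaultdict(int)
    for v in views:
        impressions[v["relationship"]] += 1
    for r in reactions:
        rel = view_rel.get((r["viewer_id"], r["post_id"], r["ds"]))
        if rel is None:
            continue  # reaction lacking a matched view row: drop or bucket separately
        reacted[rel] += 1
    return {rel: reacted[rel] / impressions[rel] for rel in impressions}

print(reactions_per_impression(views, reactions))
# friend: 1 reaction / 2 impressions = 0.5; unconnected: 0 / 1 = 0.0
```

The same shape works for the other per-relationship rates (reshares, hides, reports) by swapping the reaction event source.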
2) Observational validation: Outline an analysis comparing Friend vs. Unconnected engagement while mitigating confounding (e.g., control for post age, viewer activity, author popularity, and content type, using proxies such as watch duration and time of day where direct signals are unavailable). Propose a design such as a fixed-effects regression or propensity matching at the viewer-post level; specify the unit of analysis, the covariates, and the 7-day window (2025-08-26 to 2025-09-01). Describe how you would handle multiple views per post/viewer/day (e.g., aggregate to the MAX duration) and reactions lacking a matched view row.
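The deduplication step described above (collapsing repeat views to the MAX duration per viewer/post/day) can be sketched as follows; the duration_ms field name and the sample rows are assumptions:

```python
def dedupe_views(view_rows):
    """Collapse multiple views of the same post by the same viewer on the
    same ds into a single row, keeping the MAX watch duration."""
    best = {}
    for row in view_rows:
        key = (row["viewer_id"], row["post_id"], row["ds"])
        if key not in best or row["duration_ms"] > best[key]["duration_ms"]:
            best[key] = row
    return list(best.values())

raw = [
    {"viewer_id": 1, "post_id": 10, "ds": "2025-08-26", "duration_ms": 1200},
    {"viewer_id": 1, "post_id": 10, "ds": "2025-08-26", "duration_ms": 4300},
    {"viewer_id": 1, "post_id": 11, "ds": "2025-08-27", "duration_ms": 800},
]
deduped = dedupe_views(raw)
print(len(deduped))  # 2 rows; the 4300 ms view wins for (1, 10, 2025-08-26)
```

Deduplicating before the join keeps the unit of analysis at the viewer-post-day level, so a single heavy re-watcher does not inflate the impression denominator.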
3) Unconnected content is being launched for the first time: propose an experiment to measure success. Specify the randomization unit (user level), treatment variants (e.g., 0%, 10%, and 30% unconnected slots in the Info stream), primary success metrics (net social reactions per DAU, reactions per session, time to first friend interaction), guardrails (retention, session length, hide/report rates, creator follows, long-term re-engagement), and minimum acceptable lifts. Address novelty effects, the learning/personalization ramp, supply constraints, and network/peer interference. Include a power/duration check, segment analyses (new vs. power users; high friend-graph density), and a diff-in-diff or CUPED adjustment plan using pre-exposure baselines.
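The power/duration check asked for in part 3 can be sketched with a standard two-sided z-test sample-size formula for a proportion metric. The baseline rate and minimum detectable effect below are placeholder assumptions:

```python
from math import ceil
from statistics import NormalDist

def users_per_arm(baseline_rate, mde_rel, alpha=0.05, power=0.8):
    """Per-arm sample size for a two-sided z-test on a proportion metric
    (e.g., share of DAUs with at least one social reaction).
    mde_rel is the relative lift you want to be able to detect."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_rel)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(power)           # critical value for power
    pooled = (p1 + p2) / 2
    n = ((z_a * (2 * pooled * (1 - pooled)) ** 0.5
          + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
         / (p2 - p1) ** 2)
    return ceil(n)

# Assumed inputs: 20% baseline reaction rate, 2% relative MDE.
n = users_per_arm(0.20, 0.02)
print(n)  # per-arm users; divide by daily eligible traffic to get run duration
```

Dividing the per-arm requirement by the daily eligible population gives the minimum run length; in practice you would extend it past the novelty window and full personalization ramp regardless.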
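The CUPED adjustment mentioned above reduces metric variance by regressing out a pre-exposure covariate (e.g., each user's reaction rate in the weeks before the experiment). A minimal sketch, with synthetic data standing in for real pre/post metrics:

```python
import random

def cuped_adjust(y, x):
    """CUPED: y_adj = y - theta * (x - mean(x)), theta = cov(x, y) / var(x).
    x is a pre-exposure covariate; the adjustment preserves the mean of y
    while shrinking its variance when x predicts y."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    theta = cov / var_x
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]

def variance(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

# Synthetic check: when the pre-period strongly predicts the outcome,
# the adjusted metric has much lower variance.
random.seed(0)
x = [random.gauss(5, 2) for _ in range(1000)]       # pre-exposure metric
y = [xi + random.gauss(0, 0.5) for xi in x]         # in-experiment metric
y_adj = cuped_adjust(y, x)
print(variance(y_adj) < variance(y))  # True
```

The variance reduction translates directly into smaller required sample sizes in the power check, which is why CUPED pairs naturally with the pre-exposure baselines called out in the question.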
Quick Answer: This question evaluates a data scientist's ability to define production-ready engagement metrics from event logs, attribute reactions by viewer-author relationship, run observational causal analyses that mitigate confounding, and design randomized rollout experiments with power checks and guardrails.