This question evaluates a data scientist's competency in causal inference, metric specification, observational analysis, and experiment design for content-ranking systems within the Analytics & Experimentation domain.

You are given two platform logs: info_stream_views (every feed view of a post) and post_reactions (likes, comments, reshares). Assume "Friend" means content from a viewer's graph connections and "Unconnected" means content from creators the viewer neither follows nor is directly connected to.
Using these logs, you must:
Define and justify metrics for "more social"; specify the unit of analysis, normalization, and guardrail metrics.
Identify confounders, and describe matching/stratification or inverse-propensity weighting (IPW) to compare Friend vs. Unconnected views while holding confounders constant. Specify how standard errors are clustered and how repeated measures and multiple comparisons are handled.
Design an A/B test that adjusts ranking weights on relationship signals (Friend vs. Unconnected). Define the randomization unit, primary outcomes, guardrails, sample size/minimum detectable effect (MDE), duration, pre-registration, and power-calculation method. Address interference, novelty effects, and saturation. Propose a network-aware variant (e.g., post-level or graph-cluster randomization).
Define and measure discovery value (new creators reached, diversity), long-term retention/session depth, and reshare-driven reach. Propose proxy measures computable from the two tables alone, and call out any additional logs needed.
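One way to sketch the inverse-propensity weighting step, assuming each view has already been reduced to a hypothetical (stratum, is_friend, social_action) record, where the stratum is a bucketed combination of confounders (for example viewer tenure and creator audience size) and social_action is 1 if the view led to a comment or reshare. Here the propensity P(Friend | stratum) is simply the empirical Friend share within each stratum; with many confounders a logistic-regression propensity model is the more usual choice. The record layout, function names, and data are illustrative assumptions, not part of the prompt.

```python
from collections import defaultdict

# Hypothetical reduced view record: (stratum, is_friend, social_action).
# "stratum" stands in for a bucketed combination of confounders;
# social_action is 1 if the view led to a comment or reshare, else 0.
View = tuple[str, bool, int]


def stratum_propensities(views: list[View]) -> dict[str, float]:
    """Estimate P(Friend | stratum) as the empirical Friend share per stratum."""
    friend = defaultdict(int)
    total = defaultdict(int)
    for stratum, is_friend, _ in views:
        total[stratum] += 1
        friend[stratum] += int(is_friend)
    return {s: friend[s] / total[s] for s in total}


def ipw_difference(views: list[View]) -> float:
    """Self-normalized (Hajek) IPW estimate of E[Y | Friend] - E[Y | Unconnected]."""
    e = stratum_propensities(views)
    num = {True: 0.0, False: 0.0}
    den = {True: 0.0, False: 0.0}
    for stratum, is_friend, y in views:
        p = e[stratum]
        if p in (0.0, 1.0):
            # Positivity violation: the stratum contains only one exposure
            # type, so it cannot contribute a comparison; drop it.
            continue
        # Weight each view by the inverse probability of the exposure it received.
        w = 1.0 / p if is_friend else 1.0 / (1.0 - p)
        num[is_friend] += w * y
        den[is_friend] += w
    return num[True] / den[True] - num[False] / den[False]
```

With propensities estimated per stratum like this, the Hajek estimate coincides with direct standardization over strata, which makes it a useful cross-check against a plain stratified comparison. Standard errors should still be clustered by viewer (for example with a viewer-level bootstrap), which this sketch omits.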
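For the power inputs, a minimal sketch of the standard two-proportion sample-size formula for a two-sided z-test, with an optional design effect 1 + (m - 1)ρ for randomizing clusters of average size m with intra-cluster correlation ρ (relevant to the graph-cluster variant). The baseline rate, MDE, ρ, and cluster size used below are placeholder assumptions, not values given in the prompt.

```python
import math
from statistics import NormalDist


def n_per_arm(p_base: float, mde_abs: float, alpha: float = 0.05,
              power: float = 0.8) -> int:
    """Units per arm for a two-sided two-proportion z-test (unpooled variance)."""
    p_treat = p_base + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    var = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * var / mde_abs ** 2)


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from randomizing clusters instead of individual units."""
    return 1 + (mean_cluster_size - 1) * icc
```

For an illustrative 10% baseline social-action rate and a 1-point absolute MDE, `n_per_arm(0.10, 0.01)` gives roughly 14,750 viewers per arm. Under graph-cluster randomization with clusters of about 21 viewers and ρ = 0.05, `design_effect(21, 0.05)` doubles that requirement, and the number of clusters per arm is approximately n × DE / m.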
Deliverables:
a) A metric spec with formulas.
b) An observational analysis plan with controls and diagnostics.
c) An experiment design doc with randomization unit, power inputs, and stopping rules.
d) A KPI set that quantifies the incremental value of Unconnected content even if its near-term engagement is lower.
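A minimal sketch of two discovery proxies computable from the two tables alone, assuming views and reactions can each be reduced to hypothetical (viewer_id, creator_id, day_index) tuples: "new creators engaged" (creators a viewer reacted to on or after a window start but never viewed before it) and the Shannon entropy of a viewer's views over creators as a diversity measure. The tuple layout, window boundary, and choice of entropy are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical reduced row: (viewer_id, creator_id, day_index).
Row = tuple[str, str, int]


def new_creators_engaged(views: list[Row], reactions: list[Row],
                         window_start: int) -> dict[str, int]:
    """Per viewer: count of creators reacted to on/after window_start with no earlier view."""
    seen_before = defaultdict(set)
    for viewer, creator, day in views:
        if day < window_start:
            seen_before[viewer].add(creator)
    new = defaultdict(set)
    for viewer, creator, day in reactions:
        if day >= window_start and creator not in seen_before[viewer]:
            new[viewer].add(creator)
    return {viewer: len(creators) for viewer, creators in new.items()}


def creator_entropy(views: list[Row], viewer: str) -> float:
    """Shannon entropy (bits) of one viewer's views over creators, across the rows passed in."""
    counts = Counter(c for v, c, _ in views if v == viewer)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Comparing these per-viewer values between experiment arms (or, observationally, with the controls described above) gives one proxy for discovery value; telling newly followed creators apart from merely viewed ones would require a follow-graph log, which neither table contains.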