This question evaluates a data scientist's proficiency in product analytics, causal inference, metric engineering, experimentation design, and post-launch diagnostics within the Analytics & Experimentation domain.

Assume today is 2025-09-01. You have only one table:

calls_daily_agg(date, user_id, country, device_tier, one_to_one_calls_started, one_to_one_call_duration_sec, dropped_call_rate, group_intent_signals, concurrent_call_overlaps, inbound_group_invites_proxy, outbound_group_invites_proxy, p50_call_quality_score)

Using only this table:

(a) Propose concrete proxy metrics and a quantitative decision rule to infer unmet demand for a new Group Call feature before building it. Be specific about thresholds (e.g., concurrent_call_overlaps per 1k users) and about how you'd segment users to avoid Simpson's paradox.

(b) List the highest-value additional data sources you would add if allowed (max 5), and the unique bias each would mitigate.

(c) Design an A/B test to evaluate Group Calls post-launch: choose the unit of randomization (user vs chat-group vs geo), justify how you would handle interference, define one primary success metric and 3+ guardrails, and specify a ramp plan, novelty-effect mitigation, cluster-adjusted power assumptions, and stopping rules.

(d) After 9 months of GA (Dec 1, 2024–Sep 1, 2025), define how you'd judge long-term success vs regression to the mean. If the overall impact on the company's North Star is null but certain cohorts benefit or suffer, give a go/holdback/sunset framework with quantitative cutoffs.

(e) Post-launch, we observe metric drops in certain regions/devices: outline a diagnostics plan (measurement, seasonality, cannibalization, QoS constraints) and the sequence of holdouts or rollbacks you would run.
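For calibration only, minimal sketches follow for parts (a), (c), and (d). First, a pandas sketch of the overlap proxy and a segment-level decision rule for (a); the file path, the 28-day window, and the 15-per-1k threshold are illustrative assumptions, not values given in the question.

```python
import pandas as pd

# Hypothetical export of calls_daily_agg; the path is an assumption.
df = pd.read_parquet("calls_daily_agg.parquet")
df["date"] = pd.to_datetime(df["date"])

# Compute the overlap proxy within country x device_tier cells rather than
# pooled, so one large segment cannot mask (or manufacture) demand in the
# others: this is the Simpson's paradox guard the question asks about.
daily = (
    df.groupby(["date", "country", "device_tier"])
      .agg(active_users=("user_id", "nunique"),
           overlaps=("concurrent_call_overlaps", "sum"))
      .reset_index()
)
daily["overlaps_per_1k"] = 1000 * daily["overlaps"] / daily["active_users"]

# Illustrative decision rule: flag a segment as showing unmet group-call
# demand when its trailing-28-day median overlap rate clears the threshold.
recent = daily[daily["date"] >= daily["date"].max() - pd.Timedelta(days=28)]
signal = (
    recent.groupby(["country", "device_tier"])["overlaps_per_1k"]
          .median()
          .reset_index(name="median_overlaps_per_1k")
)
signal["unmet_demand_flag"] = signal["median_overlaps_per_1k"] >= 15.0
print(signal.sort_values("median_overlaps_per_1k", ascending=False))
```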
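For the cluster-adjusted power assumption in (c), a sketch that inflates the standard two-sample z-test sample size by the Kish design effect, DEFF = 1 + (m - 1) * ICC; every number in the example call is a placeholder assumption.

```python
from scipy.stats import norm

def clustered_n_per_arm(delta, sd, icc, cluster_size,
                        alpha=0.05, power=0.80):
    """Sample size per arm for a two-sample z-test on means, inflated by
    the design effect for cluster (e.g., chat-group) randomization.

    delta: minimum detectable effect (absolute units)
    sd: outcome standard deviation
    icc: intracluster correlation within a randomized cluster
    cluster_size: average users per cluster
    """
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    n_srs = 2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2  # simple random sampling
    deff = 1 + (cluster_size - 1) * icc                  # Kish design effect
    return n_srs * deff

# Illustrative numbers only: detect a 0.5 pp absolute lift in a metric with
# sd = 0.30, randomizing chat-groups of ~4 users with ICC = 0.05.
print(round(clustered_n_per_arm(delta=0.005, sd=0.30, icc=0.05, cluster_size=4)))
```

The same design effect argues for the unit-of-randomization choice itself: a larger cluster size or higher ICC rapidly inflates the required sample, which is the trade-off against interference that (c) asks candidates to weigh.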
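For the go/holdback/sunset framework in (d), a sketch of one possible cutoff scheme; the +/-1% cutoffs and the cohort readings are hypothetical and exist only to show the shape of the rule.

```python
from dataclasses import dataclass

@dataclass
class CohortReading:
    cohort: str
    lift: float      # point estimate of North Star lift (relative)
    ci_low: float    # lower bound of the 95% CI
    ci_high: float   # upper bound of the 95% CI

def decide(r: CohortReading,
           ship_floor: float = 0.01,
           harm_ceiling: float = -0.01) -> str:
    """Illustrative rule; the cutoffs are placeholder assumptions a real
    answer would have to justify against the North Star's variance.
    - go: the CI lower bound clears the ship floor (excludes zero)
    - sunset: the CI upper bound falls below the harm ceiling
    - holdback: anything inconclusive stays behind a long-term holdout
    """
    if r.ci_low >= ship_floor:
        return "go"
    if r.ci_high <= harm_ceiling:
        return "sunset"
    return "holdback"

# Hypothetical cohort readings, for illustration only.
readings = [
    CohortReading("low-tier devices, region A", -0.022, -0.031, -0.013),
    CohortReading("high-tier devices, region B", 0.018, 0.012, 0.024),
    CohortReading("overall", 0.001, -0.004, 0.006),
]
for r in readings:
    print(f"{r.cohort}: {decide(r)}")
```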