Assume WhatsApp currently supports only 1:1 audio and video calling and is considering launching group calling. You are the data scientist evaluating whether this is a good idea.
Assume you can access product logs such as:
- users(user_id, country, signup_date)
- calls(call_id, initiated_at, caller_id, recipient_id, call_type, duration_seconds, status)
- sessions(user_id, session_start_ts, session_duration_seconds)
- quality_events(call_id, connect_success, dropped, latency_ms, crash_flag)
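To make "computable from logs" concrete, here is a minimal sketch of how two call-quality rates might be derived by joining rows shaped like calls and quality_events. The sample rows, the `status` values, and the function name `quality_summary` are illustrative assumptions and are not part of the schema above. The choice of denominators (calls with a quality row for connect success, connected calls for drops) is one reasonable convention among several.

```python
# Hypothetical sample rows mirroring the calls and quality_events columns.
# The status values ("completed", "failed") are assumed for illustration only.
calls = [
    {"call_id": 1, "caller_id": "u1", "recipient_id": "u2",
     "call_type": "audio", "duration_seconds": 120, "status": "completed"},
    {"call_id": 2, "caller_id": "u1", "recipient_id": "u3",
     "call_type": "video", "duration_seconds": 15, "status": "completed"},
    {"call_id": 3, "caller_id": "u2", "recipient_id": "u3",
     "call_type": "audio", "duration_seconds": 0, "status": "failed"},
    {"call_id": 4, "caller_id": "u3", "recipient_id": "u1",
     "call_type": "audio", "duration_seconds": 60, "status": "completed"},
]
quality_events = [
    {"call_id": 1, "connect_success": True, "dropped": False,
     "latency_ms": 180, "crash_flag": False},
    {"call_id": 2, "connect_success": True, "dropped": True,
     "latency_ms": 420, "crash_flag": False},
    {"call_id": 3, "connect_success": False, "dropped": False,
     "latency_ms": 0, "crash_flag": False},
    # call_id 4 has no quality row, so it is excluded from both rates.
]


def quality_summary(calls, quality_events):
    """Join calls to quality_events on call_id and compute two rates.

    connect_success_rate = connected calls / calls with a quality row
    drop_rate            = dropped calls / connected calls
    """
    quality_by_call = {q["call_id"]: q for q in quality_events}
    attempted = connected = dropped = 0
    for call in calls:
        q = quality_by_call.get(call["call_id"])
        if q is None:
            continue  # no quality telemetry for this call
        attempted += 1
        if q["connect_success"]:
            connected += 1
            if q["dropped"]:
                dropped += 1
    return {
        "connect_success_rate": connected / attempted if attempted else None,
        "drop_rate": dropped / connected if connected else None,
    }


summary = quality_summary(calls, quality_events)
```

Rates like these would most naturally serve as guardrail or diagnostic metrics, since they are defined for both 1:1 and group calls.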
How would you:
- Clarify the target users, product goal, and key risks of launching group calls on WhatsApp?
- Define a north-star metric, primary experiment metrics, guardrail metrics, and diagnostic metrics that can be computed from logs?
- Explain why '% of users who made a group call' is not a strong primary A/B test metric when only treatment users have access to the feature?
- Design an A/B test for the launch, including the randomization unit in the presence of network effects, the analysis approach, and the drawbacks of cluster randomization?
- Account for cannibalization of 1:1 calls, low adoption, and heterogeneous effects across markets, device quality, or network quality before making a launch decision?
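
As one possible illustration of the cluster-randomization question, the sketch below assigns whole clusters of connected users to an arm with a deterministic hash, then aggregates a per-user outcome to the cluster level before comparing arms. The cluster IDs, the salt string, the user-to-cluster mapping, and the helper names are all hypothetical. The outcome used here (total call seconds per user, summed from calls.duration_seconds) is chosen only because it is defined in both arms, unlike a group-call adoption rate.

```python
import hashlib
from collections import defaultdict
from statistics import mean


def assign_arm(cluster_id, salt="group_calling_v1"):
    """Deterministically map a cluster to an arm with a roughly 50/50 split.

    Hashing the cluster ID (not the user ID) keeps every user in a cluster
    in the same arm, which is the point of cluster randomization.
    """
    digest = hashlib.sha256(f"{salt}:{cluster_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"


def cluster_means(user_cluster, user_outcome):
    """Average a per-user outcome within each cluster.

    Users with no logged outcome are counted as 0 rather than dropped.
    """
    per_cluster = defaultdict(list)
    for user, cluster in user_cluster.items():
        per_cluster[cluster].append(user_outcome.get(user, 0))
    return {cluster: mean(values) for cluster, values in per_cluster.items()}


def arm_summary(cluster_to_mean):
    """Return {arm: (number_of_clusters, mean_of_cluster_means)}.

    The cluster is the unit of analysis, so the effective sample size is
    the number of clusters, not the number of users.
    """
    by_arm = defaultdict(list)
    for cluster, value in cluster_to_mean.items():
        by_arm[assign_arm(cluster)].append(value)
    return {arm: (len(values), mean(values)) for arm, values in by_arm.items()}


# Hypothetical inputs: a user-to-cluster mapping (for example from a
# contact-graph clustering step) and total call seconds per user.
user_cluster = {"u1": "c1", "u2": "c1", "u3": "c2", "u4": "c3"}
user_call_seconds = {"u1": 60, "u2": 120, "u4": 300}

per_cluster = cluster_means(user_cluster, user_call_seconds)
summary_by_arm = arm_summary(per_cluster)
```

Analyzing at the cluster level makes the main drawback visible: the number of independent units shrinks from users to clusters, so power drops and confidence intervals widen compared with user-level randomization.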