You are a Data Scientist at Apple designing a feature that decides whether a user's natural-language query should be routed to Siri or to a GPT-based assistant.
Assume the following product context:
- Siri is strong at device actions, personal assistant tasks, and Apple ecosystem integrations, such as setting alarms, sending messages, controlling apps/settings, and using personal context.
- GPT is strong at open-ended generation, summarization, brainstorming, explanation, and complex question answering.
- Routing mistakes are costly:
  - Sending a device-control request to GPT may hurt task completion, privacy expectations, and reliability.
  - Sending an open-ended reasoning request to Siri may hurt answer quality and user satisfaction.
- The system must balance task success, user satisfaction, latency, privacy, safety, and inference cost.
Design the routing system end to end. In your answer, address:
- The product objective and the main success metrics, including tradeoffs among quality, latency, privacy, and cost.
- How you would define the routing labels or ground truth for training data.
- What features and model architecture you would use (for example: rules, classifier, ranking model, confidence thresholds, reject/clarification option, or a hybrid system).
- How you would handle ambiguous queries, multi-intent queries, follow-up turns, and low-confidence cases.
- How you would evaluate the system offline, including calibration and slice-based error analysis.
- How you would run an online experiment to validate the router and avoid misleading conclusions from selection bias or other confounders.
You may assume queries arrive in English initially, but discuss how your design would generalize to multiple locales and privacy-sensitive contexts.
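
To make the "confidence thresholds" and "reject/clarification option" named above concrete, here is a minimal Python sketch of one possible decision rule. It is only an illustration, not a prescribed answer: the threshold values, the `RoutingDecision` type, and the `route` function are hypothetical, and it assumes an upstream classifier already produces a calibrated probability that the query belongs to Siri.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration. Real values would be
# tuned on calibrated validation data against the costs of each routing mistake.
SIRI_THRESHOLD = 0.80  # route to Siri when P(siri) is at least this
GPT_THRESHOLD = 0.20   # route to GPT when P(siri) is at most this


@dataclass
class RoutingDecision:
    target: str    # "siri", "gpt", or "clarify"
    p_siri: float  # the classifier probability the decision was based on


def route(p_siri: float) -> RoutingDecision:
    """Map a calibrated probability that a query belongs to Siri onto a route."""
    if not 0.0 <= p_siri <= 1.0:
        raise ValueError("p_siri must be a probability in [0, 1]")
    if p_siri >= SIRI_THRESHOLD:
        return RoutingDecision("siri", p_siri)
    if p_siri <= GPT_THRESHOLD:
        return RoutingDecision("gpt", p_siri)
    # Low-confidence band: ask the user to clarify instead of guessing.
    return RoutingDecision("clarify", p_siri)
```

The two thresholds need not be symmetric. If misrouting a device-control request to GPT is judged costlier than the reverse mistake, the GPT threshold could be lowered so that fewer borderline queries are sent there.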