Design Siri-vs-GPT query routing
Company: Apple
Role: Data Scientist
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
You are a Data Scientist at Apple designing a feature that decides whether a user's natural-language query should be routed to **Siri** or to a **GPT-based assistant**.
Assume the following product context:
- **Siri** is strong at device actions, personal assistant tasks, and Apple ecosystem integrations, such as setting alarms, sending messages, controlling apps/settings, and using personal context.
- **GPT** is strong at open-ended generation, summarization, brainstorming, explanation, and complex question answering.
- Routing mistakes are costly:
  - Sending a device-control request to GPT may hurt task completion, privacy expectations, and reliability.
  - Sending an open-ended reasoning request to Siri may hurt answer quality and user satisfaction.
- The system must balance **task success, user satisfaction, latency, privacy, safety, and inference cost**.
Design the routing system end to end. In your answer, address:
1. The product objective and the main success metrics, including tradeoffs among quality, latency, privacy, and cost.
2. How you would define the routing labels or ground truth for training data.
3. What features and model architecture you would use (for example: rules, classifier, ranking model, confidence thresholds, reject/clarification option, or a hybrid system).
4. How you would handle ambiguous queries, multi-intent queries, follow-up turns, and low-confidence cases.
5. How you would evaluate the system offline, including calibration and slice-based error analysis.
6. How you would run an online experiment to validate the router and avoid misleading conclusions from selection bias or other confounders.
You may assume queries arrive in English initially, but discuss how your design would generalize to multiple locales and privacy-sensitive contexts.
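As a starting point for items 3 and 4, here is a minimal sketch of one possible hybrid router: a rule layer for unambiguous device actions, then a classifier score with two thresholds and a clarification fallback in between. Everything named here is hypothetical and for illustration only: the `DEVICE_ACTION` pattern, the `Decision` type, the `route_query` function, and the threshold values `low=0.3` and `high=0.7`. The score `p_gpt` is assumed to come from some upstream classifier that is not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: queries that start with a common device-action verb.
# A real rule set would be far richer and locale-specific.
DEVICE_ACTION = re.compile(
    r"^(set|turn|send|call|open|play|remind)\b", re.IGNORECASE
)


@dataclass
class Decision:
    route: str   # "siri", "gpt", or "clarify"
    reason: str  # which layer made the decision, kept for error analysis


def route_query(query: str, p_gpt: float,
                low: float = 0.3, high: float = 0.7) -> Decision:
    """Route a query given p_gpt, an assumed classifier estimate of
    P(GPT is the better destination). Thresholds are illustrative."""
    # Layer 1: high-precision rules keep device control on the Siri path.
    if DEVICE_ACTION.search(query.strip()):
        return Decision("siri", "rule:device_action")
    # Layer 2: confident classifier decisions in either direction.
    if p_gpt >= high:
        return Decision("gpt", "classifier:high_confidence")
    if p_gpt <= low:
        return Decision("siri", "classifier:high_confidence")
    # Layer 3: the uncertain band falls back to a clarification prompt.
    return Decision("clarify", "classifier:low_confidence")


print(route_query("Set an alarm for 7am", p_gpt=0.9))
print(route_query("Summarize the causes of World War I", p_gpt=0.92))
print(route_query("Tell me about my trip", p_gpt=0.5))
```

The gap between `low` and `high` is the knob that trades clarification frequency against routing errors. Setting the two thresholds asymmetrically is one way to encode that sending a device-control request to GPT is costlier than the reverse.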
Quick Answer: This question tests a data scientist's ability to design an end-to-end system that routes queries between a personal assistant (Siri) and a large language model assistant (GPT), covering objectives, success metrics, ground-truth labeling, feature and model choices, ambiguity and multi-turn handling, and offline and online evaluation.
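For the calibration and slice-based analysis in item 5, the sketch below shows one common check: a binned expected calibration error (ECE) on the router's predicted probability, computed overall and per slice. It assumes binary labels (1 if GPT was the better destination, 0 otherwise) and equal-width bins. The function names, the slice names, and the sample rows are made up for illustration and are not real data.

```python
from collections import defaultdict


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for a binary score: the weighted average gap between
    mean predicted probability and observed positive rate in each bin."""
    bins = defaultdict(list)
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for items in bins.values():
        avg_conf = sum(p for p, _ in items) / len(items)
        frac_pos = sum(y for _, y in items) / len(items)
        ece += len(items) / n * abs(avg_conf - frac_pos)
    return ece


def ece_by_slice(rows, n_bins=10):
    """rows: iterable of (slice_name, prob, label). Returns ECE per slice."""
    by_slice = defaultdict(list)
    for slice_name, p, y in rows:
        by_slice[slice_name].append((p, y))
    return {
        name: expected_calibration_error(
            [p for p, _ in pairs], [y for _, y in pairs], n_bins
        )
        for name, pairs in by_slice.items()
    }


# Illustrative rows only: (slice, predicted P(GPT), label).
rows = [
    ("device_action", 0.0, 0),
    ("device_action", 0.0, 0),
    ("open_ended", 0.9, 1),
    ("open_ended", 0.9, 0),
]
print(ece_by_slice(rows))
```

A slice whose ECE is much higher than the overall figure (for example a locale, a follow-up turn, or a privacy-sensitive category) is a candidate for separate thresholds or more labeled data.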