Decide between two vendors under constraints
Company: Google
Role: Data Scientist
Category: Machine Learning
Difficulty: Medium
Interview Round: Onsite
You have two third‑party search vendors, A and B, plus historical order‑level data with the following fields: lead_time_days, unit_price, on_time_rate, defect_rate, min_order_qty, capacity, distance_km, historical_cancellation_rate, late_penalty_per_order, stockout_cost_per_day, and a flag for whether the SLA was met. Design a decisioning model that chooses A or B for each incoming order to minimize expected total cost while keeping SLA attainment ≥95% and staying within a monthly budget cap.
- Define the objective function and write the expected‑cost decision rule explicitly (include price, expected late penalties, expected quality failure cost, and stockout costs).
- Propose features and a modeling approach (e.g., cost‑sensitive logistic regression predicting SLA miss probability with a cost‑based threshold; pairwise learning‑to‑rank; or a contextual bandit). Justify your choice under class imbalance and non‑stationarity.
- Address selection bias from historical routing (e.g., propensity modeling, inverse propensity weighting, counterfactual risk minimization). Specify how you would estimate propensities and stabilize weights.
- Describe offline evaluation (time‑based cross‑validation, constrained metrics for SLA ≥95%, cost curves) and an online rollout with safety constraints and vendor capacity limits.
- Handle cold start for a new vendor and per‑vendor capacity constraints (e.g., knapsack/assignment layer atop predictions). What diagnostics would you run if A appears cheaper but late penalties rise?
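One possible way to write the objective and decision rule asked for in the first bullet is sketched below. The symbols are assumptions introduced for illustration and are not part of the question: q_i is the order quantity, p_v is unit_price for vendor v, \pi_i(v) is the predicted probability that vendor v delivers order i late (treated here as equivalent to an SLA miss), L_i is late_penalty_per_order, \delta_i(v) is the predicted defect rate, c_def is a hypothetical cost per defective unit, \bar d_i(v) is the expected number of days late given a late delivery, and s_i is stockout_cost_per_day. The multipliers \lambda and \mu are Lagrangian prices on the SLA and budget constraints that would have to be tuned on held-out data.

```latex
\begin{align*}
C_i(v) &= \underbrace{p_v\, q_i}_{\text{price}}
        + \underbrace{\pi_i(v)\, L_i}_{\text{expected late penalty}}
        + \underbrace{\delta_i(v)\, q_i\, c_{\text{def}}}_{\text{expected quality failure cost}}
        + \underbrace{\pi_i(v)\, \bar d_i(v)\, s_i}_{\text{expected stockout cost}},
        \qquad v \in \{A, B\} \\[4pt]
\min_{x_1,\dots,x_N \in \{A,B\}} \; & \sum_{i=1}^{N} C_i(x_i)
        \quad \text{s.t.} \quad
        \frac{1}{N}\sum_{i=1}^{N} \bigl(1 - \pi_i(x_i)\bigr) \ge 0.95,
        \qquad
        \sum_{i=1}^{N} p_{x_i}\, q_i \le \mathrm{Budget}_{\text{month}} \\[4pt]
x_i^{*} &= \arg\min_{v \in \{A,B\}}
        \Bigl[\, C_i(v) + \lambda\, \pi_i(v) + \mu\, p_v\, q_i \,\Bigr],
        \qquad \lambda, \mu \ge 0
\end{align*}
```

With \lambda = \mu = 0 the rule reduces to choosing A whenever C_i(A) < C_i(B). Raising \lambda shifts orders toward the vendor with the lower predicted SLA‑miss probability, and raising \mu shifts orders toward the vendor with the lower purchase price. Per‑vendor capacity is not captured by this per‑order rule and would need the assignment layer mentioned in the last bullet.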
Quick Answer: This question evaluates a Data Scientist candidate's competency in decision modeling and cost‑sensitive machine learning: causal inference for selection bias, propensity estimation, constrained optimization for capacity and SLA trade‑offs, and offline and online evaluation.
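To illustrate the propensity‑weighting competency named above, here is a minimal Python sketch of stabilized, clipped inverse propensity weights and a self‑normalized estimate of a candidate policy's mean cost from logged routing data. It assumes the propensities (the probability that the historical process sent each order to vendor A) have already been estimated by some model that is not shown. The function names, the 0.05 to 0.95 clipping range, and the toy numbers are all hypothetical choices for illustration.

```python
import numpy as np


def stabilized_weights(sent_to_a, propensity_a, clip=(0.05, 0.95)):
    """Stabilized inverse propensity weights for a binary A-vs-B routing log.

    sent_to_a: 1 if the order was historically routed to A, else 0.
    propensity_a: estimated P(routed to A | features), from an assumed model.
    clip: bounds applied to the propensities to limit extreme weights.
    """
    a = np.asarray(sent_to_a, dtype=float)
    e = np.clip(np.asarray(propensity_a, dtype=float), clip[0], clip[1])
    p_a = a.mean()  # marginal share of orders sent to A (the stabilizer)
    return np.where(a == 1, p_a / e, (1.0 - p_a) / (1.0 - e))


def snips_policy_cost(cost, sent_to_a, propensity_a, policy_picks_a,
                      clip=(0.05, 0.95)):
    """Self-normalized IPW estimate of mean cost per order under a new policy.

    Only logged orders whose historical vendor matches the new policy's
    choice contribute; each is weighted by 1 / P(logged vendor | features).
    """
    cost = np.asarray(cost, dtype=float)
    a = np.asarray(sent_to_a, dtype=int)
    pick = np.asarray(policy_picks_a, dtype=int)
    e = np.clip(np.asarray(propensity_a, dtype=float), clip[0], clip[1])
    logged_prob = np.where(a == 1, e, 1.0 - e)
    w = (a == pick).astype(float) / logged_prob
    if w.sum() == 0:
        raise ValueError("policy never agrees with the logged routing")
    return float((w * cost).sum() / w.sum())


# Toy usage with made-up numbers.
cost = [10.0, 20.0, 30.0, 40.0]
sent_to_a = [1, 1, 0, 0]
propensity_a = [0.5, 0.5, 0.5, 0.5]
policy_picks_a = [1, 0, 0, 1]
estimate = snips_policy_cost(cost, sent_to_a, propensity_a, policy_picks_a)
```

Clipping and self‑normalization both trade a small amount of bias for lower variance, which tends to matter when one vendor was rarely chosen for some kinds of orders. Checking the distribution of the resulting weights, for example their maximum and effective sample size, is a common diagnostic before trusting the estimate.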