Describe a specific project where you faced a strong disagreement with a tech lead on model selection under a tight deadline. How did you surface the conflict, structure the decision (trade‑offs, risks, data), and align priorities across stakeholders? Explain what you did independently (experiments, analyses) to validate your position. Suppose an external client was harsh, pushing to ship immediately and dismissing your concerns—how did you set boundaries, communicate status/risks, and keep trust? If the client escalates while your manager is unavailable, what concrete steps would you take in the next 24 hours, and how would you document decisions to prevent re‑litigation?
Quick Answer: This question evaluates conflict resolution, stakeholder management, technical judgment in model selection, risk assessment, escalation, and documentation skills for a Data Scientist role, testing Behavioral & Leadership competencies in a cross-functional, client-facing context.
## Solution
Below is a structured, teaching-oriented example you can adapt. It shows how to turn a disagreement into a principled decision, protect delivery timelines, and keep stakeholders aligned.
---
## 1) Situation and Disagreement (STAR: Situation/Task)
- Situation: We were building a real-time ranking model for personalized notifications with a p95 latency SLO of <100 ms and strict memory constraints (edge deployment). A pilot with an external client was due in two weeks.
- Disagreement: The tech lead preferred a Transformer-based model (slightly higher offline accuracy). I advocated for gradient-boosted trees (e.g., XGBoost) due to latency, size, and faster iteration under the deadline.
- Assumptions: Daily traffic ~50M predictions; on-device/battery sensitivity; a regulatory requirement for explainable decisions that reviewers can audit.
## 2) Surfacing Conflict and Structuring the Decision (STAR: Action)
I made the conflict explicit and constructive:
- Scheduled a 30-minute "decision review" with the tech lead and PM; framed it as a choice under constraints, not a personal disagreement.
- Proposed a lightweight DACI/RAPID structure:
  - Driver: Me (prepares analysis, options)
  - Approver/Decider: Eng Manager + PM jointly
  - Contributors: Tech lead, privacy reviewer, SRE
  - Informed: Client POC
- Wrote a 2-page decision memo with:
  1) Problem and constraints
  2) Options with trade-offs and risks
  3) Success criteria and gating metrics
  4) Rollout plan and rollback criteria
Decision criteria (weighted; a scoring sketch follows the risk register below):
- Predictive quality (0.4)
- Latency and footprint (0.3)
- Robustness/operability (0.2)
- Delivery risk by deadline (0.1)
Options:
- A) XGBoost (trees): AUC 0.86, p95 latency ~45 ms, model size ~20 MB, mature tooling
- B) Transformer (small): AUC 0.88, p95 latency ~180 ms, model size ~250 MB, new serving path
Risk register examples:
- B may breach latency SLO; higher on-call risk at launch
- A slightly lower AUC, but stable; easier to explain and calibrate
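
To put the criteria and options on one scale, the memo included a simple decision matrix. A minimal sketch is below; the per-criterion 0-1 scores for each option are illustrative placeholders, not the figures from the actual review.

```python
# Minimal decision-matrix sketch: weighted sum of criterion scores per option.
# Weights match the criteria above; the 0-1 scores are illustrative placeholders.
WEIGHTS = {
    "predictive_quality": 0.4,
    "latency_footprint": 0.3,
    "robustness_operability": 0.2,
    "delivery_risk": 0.1,
}

OPTIONS = {
    "A_xgboost": {
        "predictive_quality": 0.80,
        "latency_footprint": 0.95,
        "robustness_operability": 0.90,
        "delivery_risk": 0.90,
    },
    "B_transformer": {
        "predictive_quality": 0.90,
        "latency_footprint": 0.40,
        "robustness_operability": 0.60,
        "delivery_risk": 0.50,
    },
}

def weighted_score(scores: dict) -> float:
    """Weighted sum of criterion scores (higher is better)."""
    return sum(WEIGHTS[criterion] * value for criterion, value in scores.items())

for name, scores in OPTIONS.items():
    print(f"{name}: {weighted_score(scores):.2f}")
```

Scoring both options against shared weights keeps the debate about criteria and weights, not personalities.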
## 3) Independent Validation (STAR: Action)
Time-boxed experiments (48 hours) to reduce uncertainty; a condensed sketch of the harness follows the list:
- Data hygiene: Temporal split to avoid leakage; stratified sampling by segment.
- Baselines: Logistic regression; then XGBoost; then a distilled Transformer prototype.
- Calibration and error analysis: Reliability curves, SHAP for feature effects, per-segment metrics.
- Latency and size tests: p95/p99 from a staging environment; memory and CPU profiles.
- Robustness checks: Drift sensitivity (train on T-2 weeks, test on last week), stress tests for missing features.
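
The sketch below assumes a pandas DataFrame with an `event_time` column, feature columns, and a binary `label`; the column names, hyperparameters, and single-node latency probe are illustrative, not the production setup.

```python
# Condensed validation-harness sketch: temporal split, XGBoost baseline,
# calibration check, and a crude latency probe. Column names, hyperparameters,
# and the latency probe are illustrative; staging load tests cover real QPS.
import time
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.calibration import calibration_curve
from sklearn.metrics import roc_auc_score

def run_validation(df: pd.DataFrame, cutoff: str = "2024-05-01") -> None:
    # Temporal split: train before the cutoff, evaluate after it, so future
    # information never leaks into training.
    train, test = df[df["event_time"] < cutoff], df[df["event_time"] >= cutoff]
    features = [c for c in df.columns if c not in ("event_time", "label")]

    model = xgb.XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
    model.fit(train[features], train["label"])
    probs = model.predict_proba(test[features])[:, 1]
    print("AUC:", roc_auc_score(test["label"], probs))

    # Reliability curve: do predicted probabilities match observed rates per bin?
    frac_pos, mean_pred = calibration_curve(test["label"], probs, n_bins=10)
    print("Calibration bins (observed vs predicted):", list(zip(frac_pos, mean_pred)))

    # Crude single-row latency probe; p95/p99 at expected QPS come from staging.
    row = test[features].iloc[[0]]
    timings = []
    for _ in range(1000):
        start = time.perf_counter()
        model.predict_proba(row)
        timings.append((time.perf_counter() - start) * 1000)
    print("p95 latency (ms, single node):", np.percentile(timings, 95))
```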
Quick numeric trade-off illustration:
- Suppose per-error costs: FP cost = $0.05 (an annoyed user), FN cost = $0.50 (a missed engagement).
- Daily predictions: 50M; positive prevalence 5%.
- A: Precision 0.40, Recall 0.55
- B: Precision 0.42, Recall 0.58
- Expected daily cost = FP_count * C_FP + FN_count * C_FN, with FP and FN counts derived from each option's precision/recall at the given prevalence and volume.
- B's recall gain (0.55 → 0.58) cuts FN cost, but if its ~135 ms higher p95 latency breaches the SLO and reduces send-through by ~2% under load, the net expected value can flip. I added a breach penalty based on historical drop-offs to capture this (see the sketch below).
This put both options on a comparable business-impact scale.
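
A minimal sketch of that expected-cost model, using the illustrative precision/recall, cost, and drop-off figures above; how the SLO-breach penalty is charged (each throttled send treated as a missed engagement) is an assumption, not the actual historical model.

```python
# Expected-cost comparison of options A and B under the illustrative figures above.
# Assumption: each send lost to an SLO breach costs as much as a missed engagement.
DAILY_PREDICTIONS = 50_000_000
PREVALENCE = 0.05                  # 5% of predictions are true positives
C_FP, C_FN = 0.05, 0.50            # $ per false positive / false negative
SLO_BREACH_DROPOFF = 0.02          # ~2% of sends lost if the p95 SLO is breached

def expected_daily_cost(precision: float, recall: float, breaches_slo: bool) -> float:
    positives = DAILY_PREDICTIONS * PREVALENCE
    tp = recall * positives
    fn = positives - tp
    fp = tp * (1 - precision) / precision      # from precision = tp / (tp + fp)
    cost = fp * C_FP + fn * C_FN
    if breaches_slo:
        sends = tp + fp                        # notifications actually sent
        cost += SLO_BREACH_DROPOFF * sends * C_FN
    return cost

cost_a = expected_daily_cost(precision=0.40, recall=0.55, breaches_slo=False)
cost_b = expected_daily_cost(precision=0.42, recall=0.58, breaches_slo=True)
print(f"A: ${cost_a:,.0f}/day  B: ${cost_b:,.0f}/day")
```

Under these assumed numbers, B's accuracy advantage largely evaporates once the breach penalty is charged; with a larger drop-off or a costlier breach it flips outright, which is why the SLO risk dominated the decision.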
Result: AUC gains from B translated to modest business lift but with high SLO risk. A met SLO with headroom and comparable expected value.
## 4) Aligning Stakeholders and Choosing a Path (STAR: Result)
- Presented a 1-pager summary with a heatmap of criteria scores, expected-value comparison, and a staged rollout plan.
- Proposed: Ship A now with a shadow test of B for one week; revisit with data. Approver agreed based on delivery risk and SLO.
- Outcome: Launched A with a canary rollout (1%→10%→50%→100%), feature flag, and rollback plan. Achieved +1.6 pp CTR lift vs baseline, p95 latency 52 ms, zero incidents. Shadow B informed a later iteration.
## 5) Handling a Harsh Client Pushing to Ship Immediately
Principles: be concise, quantify risk, give options, set boundaries.
- Boundary setting: “We will not exceed p95 100 ms at launch; shipping the Transformer today likely violates this by ~80 ms. That risks throttling and user complaints.”
- Offer options:
  1) Ship A today GA; B runs in shadow. Reassess in 7 days with real data.
  2) Limited pilot of B to 1% behind a kill switch, with explicit risk acceptance.
  3) Delay B by one week to complete perf hardening.
- Communicate status/risks:
  - Red/Yellow/Green risk dashboard; written daily updates with metrics and next steps.
  - Clear rollback: “If p95 >100 ms for 15 minutes or error rate >0.5%, we auto-disable.” (A minimal sketch of this rule follows this list.)
- Maintain trust: tie claims to data; show live dashboards; invite client to read-only monitoring; summarize agreements via email.
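
A minimal sketch of the auto-disable rule behind that rollback criterion; `MetricsWindow` is a hypothetical stand-in for whatever the real monitoring client exposes.

```python
# Sketch of the auto-disable rule from the agreed rollback criterion:
# p95 > 100 ms sustained for 15 minutes, or error rate > 0.5%, disables the model.
# MetricsWindow is a hypothetical stand-in for the real monitoring client.
from dataclasses import dataclass

P95_LATENCY_LIMIT_MS = 100.0
ERROR_RATE_LIMIT = 0.005              # 0.5%
BREACH_WINDOW_MINUTES = 15

@dataclass
class MetricsWindow:
    p95_latency_ms: float
    error_rate: float
    minutes_latency_in_breach: float  # consecutive minutes with p95 over the limit

def should_auto_disable(m: MetricsWindow) -> bool:
    """Return True when the agreed rollback criterion is met."""
    sustained_latency_breach = (
        m.p95_latency_ms > P95_LATENCY_LIMIT_MS
        and m.minutes_latency_in_breach >= BREACH_WINDOW_MINUTES
    )
    return sustained_latency_breach or m.error_rate > ERROR_RATE_LIMIT

# Example: 15 straight minutes above the latency SLO triggers rollback.
print(should_auto_disable(MetricsWindow(180.0, 0.001, 15)))
```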
## 6) If Client Escalates and Manager Is Unavailable: 24-Hour Action Plan
0–2 hours: Triage and authority
- Confirm temporary decision authority with Eng Manager proxy or PM (document in Slack/email).
- Re-state decision criteria and SLOs in writing.
2–6 hours: Data and experiments
- Run a targeted perf test of B at expected QPS; capture p95/p99, memory, error rates.
- Complete an A vs B expected-value model with latest costs (client-reviewed).
6–10 hours: Risk mitigation path
- Implement a 1% canary for B in staging or low-traffic slice; enable kill switch and alerts.
- Prepare a rollback runbook and on-call coverage.
10–14 hours: Stakeholder alignment
- 30-minute checkpoint with client: present side-by-side metrics, options with consequences, and a clear recommendation.
- Capture their choice and any risk acceptance needed.
14–24 hours: Execute and document
- Execute the agreed plan (e.g., ship A GA; B shadowed). Monitor and publish a brief report.
- Send a decision summary email: context, options, criteria, decision, owners, timelines, and explicit risk acceptance if applicable.
## 7) Documentation to Prevent Re‑litigation
Create a lightweight Architecture Decision Record (ADR):
- Context: goals, constraints, SLOs
- Options considered: A, B (+ variants)
- Criteria and weights; data used (links to notebooks/dashboards)
- Decision and rationale: why chosen, why not others
- Risk acceptance: who approved, under what conditions
- Rollout plan: gates, kill switch, rollback
- Follow-ups: what data will trigger re-evaluation
Operationalize it:
- Store ADR in the repo; link JIRA tickets and experiment artifacts.
- Meeting notes with attendee acknowledgments; e-sign or emoji-ack in Slack for traceability.
- Version decisions: ADR-012-v1 (pilot), ADR-012-v2 (GA) with timestamps.
## Common Pitfalls and Guardrails
- Pitfall: Debating accuracy without latency/operability—use end-to-end SLOs and business cost modeling.
- Pitfall: Analysis paralysis—time-box experiments; agree on a deadline.
- Guardrails: feature flags, canary releases, auto-rollback, p95/p99 alerts, shadow mode, calibration checks, fairness audits per segment.
## Takeaway
A principled process—clear criteria, time-boxed validation, staged rollout, and rigorous documentation—lets you resolve disagreements quickly, protect the launch, and maintain trust even under external pressure.