PracHub

Describe past impact and conflict handling

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in end-to-end AI function-calling implementation: technical decision-making, API and data design, tooling and observability, measurable impact, and leadership in conflict resolution. It is categorized under Behavioral & Leadership for a Machine Learning Engineer role.

  • medium
  • Uber
  • Behavioral & Leadership
  • Machine Learning Engineer

Describe past impact and conflict handling

Company: Uber

Role: Machine Learning Engineer

Category: Behavioral & Leadership

Difficulty: medium

Interview Round: Onsite

Walk through a past project where you implemented AI function calling end-to-end. Explain the problem context, your role, key technical decisions (APIs, data modeling, tooling), the main challenges you encountered, and the measurable impact. Then describe a time you faced conflicts or blockers while driving the project (e.g., cross-team priorities or design disagreements). How did you diagnose root causes, align stakeholders, make trade-offs, and move the work forward? What would you do differently in hindsight?

Quick Answer: Structure your response around one concrete function-calling project: the situation and goal, your role, architecture and key technical decisions, data and evaluation, the top challenges with fixes, and quantified impact. Then tell a conflict story covering root-cause diagnosis, stakeholder alignment, trade-offs, the outcome, and what you would do differently.

Solution

# How to structure your answer (use this flow)

- Situation and goal: Who is the user, what problem, and why function calling?
- Your role: Scope, team, what you owned vs. influenced.
- Architecture and key decisions: Model, function schemas, orchestration, APIs, safety.
- Data and logging: Schemas, events, eval datasets, feedback loops.
- Challenges → fixes: Top 2–3 issues and how you solved them.
- Impact: Quantified product and system metrics; how measured (A/B or before/after).
- Conflict story: Root cause, alignment, trade-offs, decision, outcome.
- Hindsight: 2–3 concrete improvements.

# Example answer (adapt details to your experience)

1) Situation and goal

- Problem: Support agents handled high-volume rider/driver inquiries by switching across multiple tools (trip lookup, policy docs, refunds, ticketing). Average handle time (AHT) was high and decisions were inconsistent.
- Goal: Build an LLM-based copilot that uses function calling to retrieve trip data, apply policies, simulate refunds, and draft actions. Requirements: low latency (<2.5s p95), high tool-call reliability (>98%), zero PII leakage, and a measurable AHT reduction.
- Why function calling: We needed reliable structured outputs and tool integration, not just free-form text. Function calling let the model choose tools and return JSON arguments under schema constraints.

2) My role and team

- Role: Tech lead for the ML workstream. Partnered with 2 backend engineers, 1 data scientist, 1 PM, 1 designer, and infosec.
- Ownership: End-to-end LLM orchestration and API design, function schemas, evaluation harness, guardrails, offline→online validation, launch criteria, and on-call for the first month post-launch.

3) Architecture and key technical decisions

- Model and provider: Started with a general-purpose LLM that supports function calling for tool selection and JSON-structured arguments. For classification/gating we used a smaller, faster model to reduce cost and latency.
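To make the "schema constraints" point concrete, here is a minimal sketch of what one tool definition might look like in an OpenAI-style function-calling format. The tool name `get_trip_details` comes from the example project above; the exact wire format varies by provider, so treat the field names as illustrative.

```python
# Hedged sketch of a single tool definition in an OpenAI-style
# function-calling format. The schema constrains the model's JSON
# arguments: required fields, a string type, and no extra properties.
GET_TRIP_DETAILS_TOOL = {
    "type": "function",
    "function": {
        "name": "get_trip_details",
        "description": "Fetch trip metadata by tokenized trip ID (no raw PII).",
        "parameters": {
            "type": "object",
            "properties": {
                "trip_id": {
                    "type": "string",
                    "description": "Tokenized trip identifier (UUID string).",
                },
            },
            "required": ["trip_id"],
            # Rejecting unknown keys is one defense against hallucinated fields.
            "additionalProperties": False,
        },
    },
}
```

Passing a tokenized `trip_id` rather than raw user data keeps PII out of the prompt, matching the privacy requirement above.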
- Orchestration pattern:
  - Step 1: Intent and tool gating using a small model (classify the request; choose allowed functions).
  - Step 2: LLM with function calling constrained to a whitelist of tools for the current intent.
  - Step 3: Execute tool(s) with timeouts and idempotency; return results to the LLM for synthesis into a proposed action plus rationale.
  - Step 4: Policy validator (deterministic rules) checks the proposed action; if out-of-policy, request a revision or fall back to manual handling.
- Function/API design:
  - Tools: get_trip_details, get_user_flags, get_policy_snippet, simulate_refund, create_ticket.
  - Each function had a strict JSON schema: required fields, enums, min/max, and formats (e.g., trip_id as a string UUID, refund_reason as an enum).
  - We passed only non-sensitive identifiers (tokenized IDs) and fetched PII on the server side only when absolutely needed.
- Data modeling and logging:
  - Log every function-call attempt: request_id, tool_name, arguments_valid (bool), round_trips, latency, success/failure reason, and token cost.
  - Conversation transcripts stored with PII redacted and structured annotations (intent, chosen tools, final action, agent override).
  - Golden dataset: 250 real, anonymized cases with ground-truth actions and policies for offline regressions.
- Tooling/stack:
  - Backend: FastAPI microservice for tools + orchestrator, Redis for caching, feature flags for gradual rollout.
  - Tracing: OpenTelemetry for request spans (model→tool→validator).
  - Evaluation: Custom eval harness that computes tool precision/recall, JSON conformance rate, policy adherence, and estimated AHT from timestamps.
  - CI/CD: Unit tests for schemas; contract tests for tool APIs; an offline eval gate must pass before deploy.

4) Key challenges and how we solved them

- Challenge A: JSON brittleness and hallucinated fields
  - Symptoms: 4–6% of calls had invalid arguments or extra fields; retries increased latency.
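The strict-schema idea above (required fields, enums, ranges, UUID formats, no extra fields) can be sketched as a small hand-rolled validator for `simulate_refund` arguments. A real system might use a library such as `jsonschema` instead; the enum values and the amount range here are assumptions for illustration.

```python
import uuid

# Illustrative enum and range; real policy values would differ.
REFUND_REASONS = {"late_arrival", "driver_cancel", "fare_dispute"}
REQUIRED_FIELDS = {"trip_id", "refund_reason", "amount_cents"}
MAX_AMOUNT_CENTS = 50_000

def validate_refund_args(args: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    # Required fields.
    for field in sorted(REQUIRED_FIELDS):
        if field not in args:
            errors.append(f"missing required field: {field}")
    # Reject hallucinated extra fields outright.
    extra = set(args) - REQUIRED_FIELDS
    if extra:
        errors.append(f"unexpected fields: {sorted(extra)}")
    # Format check: trip_id must parse as a UUID.
    if "trip_id" in args:
        try:
            uuid.UUID(str(args["trip_id"]))
        except ValueError:
            errors.append("trip_id is not a valid UUID")
    # Enum check.
    if "refund_reason" in args and args["refund_reason"] not in REFUND_REASONS:
        errors.append("refund_reason not in allowed enum")
    # Range check on the amount.
    amount = args.get("amount_cents")
    if amount is not None and (not isinstance(amount, int)
                               or not 0 < amount <= MAX_AMOUNT_CENTS):
        errors.append("amount_cents out of range")
    return errors
```

Logging the returned error list per call is one way to populate the `arguments_valid` field described in the logging design above.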
  - Fixes: Tightened schemas with enums/ranges, added a local JSON validator that auto-corrected trivial issues (e.g., type coercion), and adopted a two-turn pattern (first ask the model to plan tools, then call). Reduced the invalid-call rate to 0.7%.
- Challenge B: Latency spikes (p95 > 4s)
  - Diagnosis: Sequential retrieval and model calls; slow policy retrieval.
  - Fixes: Parallelized trip and policy fetches; cached policy snippets; moved intent classification to a smaller model; added circuit breakers and timeouts (800ms/tool). Achieved p95 2.2s, p99 3.1s.
- Challenge C: Policy adherence and safety
  - Risk: The LLM sometimes proposed goodwill refunds beyond thresholds.
  - Fixes: Externalized policy rules into a deterministic validator: the LLM proposes, rules enforce. Added counterfactual prompts to force justification with policy IDs. Policy violations dropped from 5.4% to 0.6%.
- Challenge D: Privacy and logging
  - Action: Redacted PII at the source, tokenized user IDs, separated secrets from prompts, and implemented prompt scanning to prevent PII echo. Security approved under our privacy model.

5) Measurable impact (A/B experiment, 4 weeks)

- AHT: −18% (from 6:10 to 5:03).
- First contact resolution: +9.2 percentage points.
- Escalations: −12%.
- System reliability: 98.7% successful tool-call rate; 99.3% JSON schema conformance.
- Cost and latency: p95 2.2s; blended cost −27% via small-model gating and caching.
- Example value calculation: 50k monthly cases × 1.1 minutes saved = 55k minutes saved per month ≈ 916 hours. At a $30/hour loaded cost, that is ≈ $27.5k/month.

6) Conflict/blocker story

- Situation: Two blockers surfaced near pilot launch. Security paused production, citing PII-leakage risk in logs, and the Support Tools team resisted adding LLM orchestration to their critical path due to reliability concerns.
- Root-cause diagnosis:
  - Reviewed logs: PII occasionally surfaced when agents pasted raw info, and prompts sometimes echoed verbose context.
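The latency fix for Challenge B (parallelizing the trip and policy fetches under per-tool timeouts) can be sketched with `asyncio`. The fetcher bodies here are stand-ins for real RPCs; the 800 ms budget comes from the example above.

```python
import asyncio

TOOL_TIMEOUT_S = 0.8  # per-tool timeout from the example (800 ms)

async def get_trip_details(trip_id: str) -> dict:
    # Placeholder for a real RPC to the trip service.
    await asyncio.sleep(0.01)
    return {"trip_id": trip_id, "status": "completed"}

async def get_policy_snippet(topic: str) -> dict:
    # Placeholder for a real (cacheable) policy lookup.
    await asyncio.sleep(0.01)
    return {"topic": topic, "policy_id": "P-12"}

async def fetch_context(trip_id: str, topic: str) -> dict:
    # Run both fetches concurrently, each under its own timeout,
    # instead of sequentially; a TimeoutError here would trigger
    # the circuit-breaker / fallback path described above.
    trip, policy = await asyncio.gather(
        asyncio.wait_for(get_trip_details(trip_id), TOOL_TIMEOUT_S),
        asyncio.wait_for(get_policy_snippet(topic), TOOL_TIMEOUT_S),
    )
    return {"trip": trip, "policy": policy}

result = asyncio.run(fetch_context("abc-123", "refunds"))
```

With two independent calls, the combined latency approaches the slower call rather than the sum of both, which is what brought p95 down in the example.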
  - For reliability: Our design lacked clear SLOs and fallbacks for tool timeouts.
- Alignment tactics:
  - Wrote an RFC covering the threat model, data flows, redaction strategy, and SLOs (99.5% tool availability, p95 < 2.5s); held a joint review with the Security and Support Tools leads.
  - Proposed a phased rollout: an internal-only pilot, then a limited agent cohort, with a kill switch and an on-call rotation.
- Trade-offs and decisions:
  - Narrowed scope: read-only tools in phase 1, no auto-refunds without validator approval. Moved risky features to phase 2.
  - Committed to observability (dashboards for PII incidents, latency, and the tool error budget) and added hard fallbacks (graceful degradation to manual templates if tools fail).
- Outcome:
  - Security approved under the new logging/redaction scheme; Support Tools integrated behind a feature flag with shared on-call. We launched the pilot on time and expanded after meeting SLOs for 2 consecutive weeks.

7) What I would do differently

- Engage Security and platform teams during discovery, not implementation; bake the threat model and SLOs into the initial design doc.
- Build the eval harness and golden dataset first; enforce a quality gate before any UI integration.
- Start with a narrower tool set and a single composite function schema to reduce surface area; expand only with clear telemetry on failure modes.
- Introduce a deterministic planner earlier (rules or a small model) to reduce dependence on a single large model for tool selection.

# Practical guardrails and metrics you can mention

- JSON schemas: Use strict types, enums, and ranges; validate and auto-correct safe issues; reject otherwise.
- Access control: Allowlist tools per intent; never pass raw PII into prompts.
- Latency control: Parallelize I/O; use circuit breakers; cache static knowledge; set per-tool timeouts; provide fallbacks.
- Evaluation: Tool-call precision/recall; JSON conformance; policy adherence; p95/p99 latency; cost per task; human override rate; online vs. offline correlation.
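The tool-call precision/recall metric named above can be computed from a golden dataset like the 250-case set in the example. This is a minimal sketch; the records below are illustrative, with each record holding the set of tools the model called and the set it should have called.

```python
def tool_precision_recall(records):
    """Micro-averaged precision/recall over tool calls.

    Each record: {"predicted": set of tools called,
                  "expected":  set of ground-truth tools}.
    """
    tp = fp = fn = 0
    for r in records:
        tp += len(r["predicted"] & r["expected"])  # correct calls
        fp += len(r["predicted"] - r["expected"])  # spurious calls
        fn += len(r["expected"] - r["predicted"])  # missed calls
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Two illustrative golden-dataset records: the second call is spurious.
records = [
    {"predicted": {"get_trip_details"}, "expected": {"get_trip_details"}},
    {"predicted": {"simulate_refund", "get_trip_details"},
     "expected": {"get_trip_details"}},
]
p, r = tool_precision_recall(records)  # p = 2/3, r = 1.0
```

Running this over every offline regression, alongside JSON conformance and policy adherence, is what makes the "eval gate before deploy" in the CI/CD section enforceable.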
- Rollout: Feature flags; kill switch; SLOs and error budgets; A/B testing.

# Quick checklist for your delivery

- A 1–2 sentence problem statement and 1 sentence on why function calling.
- Your ownership and cross-functional partners.
- 3–5 specific technical decisions (model, schemas, orchestration, safety).
- 2–3 challenges with concrete fixes and before/after numbers.
- Impact with clear metrics and how it was measured.
- A conflict story with root cause, alignment, trade-offs, and outcome.
- Two actionable hindsight improvements.

Use the structure above and swap in your own domain, numbers, and tools to keep it authentic and concise.
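The "LLM proposes, rules enforce" guardrail from the solution can be sketched as a deterministic validator that checks a proposed action before anything executes. The threshold and field names here are hypothetical stand-ins for real policy configuration.

```python
MAX_AUTO_REFUND_CENTS = 2_000  # assumed goodwill threshold, illustrative

def validate_proposed_action(action: dict) -> tuple:
    """Return (approved, reason); rejected actions route to manual review."""
    if action.get("type") != "refund":
        return True, "non-refund actions pass through"
    # The counterfactual-prompt fix above forces the model to cite a policy.
    if not action.get("policy_id"):
        return False, "refund must cite a policy ID"
    # Deterministic threshold check: the model cannot talk its way past this.
    if action.get("amount_cents", 0) > MAX_AUTO_REFUND_CENTS:
        return False, "amount exceeds auto-refund threshold; route to manual"
    return True, "within policy"

ok, reason = validate_proposed_action(
    {"type": "refund", "amount_cents": 5_000, "policy_id": "P-12"}
)  # ok == False: over the assumed threshold
```

Keeping this logic outside the model is what turned the 5.4% policy-violation rate in the example into an enforcement problem rather than a prompting problem.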

Related Interview Questions

  • Describe a Trade-off Design Change - Uber
  • Describe ownership and failure - Uber (medium)
  • Answer Common Behavioral Questions - Uber (medium)
  • How do you manage performance and disagreements? - Uber (medium)
  • Describe an ML system you built - Uber (medium)
Uber — Machine Learning Engineer, Onsite, Behavioral & Leadership (Sep 6, 2025)

Behavioral and Leadership: AI Function Calling End-to-End + Conflict Resolution

Context

You are interviewing for a Machine Learning Engineer role. The interviewer asks you to demonstrate end-to-end ownership of an AI function-calling project and to show how you lead through ambiguity and conflict.

Prompt

  1. Walk through a past project where you implemented AI function calling end-to-end. Cover:
  • Problem context and why function calling was the right approach
  • Your role and ownership boundaries
  • Key technical decisions (LLM/provider, API design, orchestration, data modeling, schemas, logging)
  • Tooling/stack (frameworks, infra, evals, monitoring, CI/CD)
  • Main challenges and how you addressed them (e.g., JSON brittleness, latency, hallucinations, privacy)
  • Measurable impact with concrete metrics
  2. Describe a time you faced conflicts or blockers while driving this project (e.g., cross-team priorities, design disagreements):
  • Root-cause diagnosis
  • How you aligned stakeholders and made trade-offs
  • Actions you took to move forward and the outcome
  • What you would do differently in hindsight

Aim for a crisp, structured narrative with specific metrics and decisions (8–10 minutes).

