Describe one high-stakes project where priorities changed mid-stream and you had to influence without authority. In your answer: 1) Give specific context, stakeholders, and constraints; 2) Share the toughest constructive feedback you delivered (upward or lateral), the exact phrasing you used, and how you maintained the relationship; 3) Explain how you built trust quickly with a new team (concrete actions in the first two weeks); 4) Walk through a conflict you resolved, including the trade-offs you accepted; 5) Quantify the impact (business and engineering/science) and one metric that got worse and why; 6) In hindsight, what would you change to achieve a stronger outcome?
Quick Answer: This question evaluates leadership under ambiguity, influence without formal authority, stakeholder management, delivering tough feedback, conflict resolution, rapid trust-building, and the ability to quantify and reflect on project outcomes for a Data Scientist within the Behavioral & Leadership domain.
Solution
# Example: Structured Answer (with teaching notes)
## 1) Context, stakeholders, and constraints
- Situation: I was the data scientist supporting Notifications Relevance across three surfaces (social interactions, group posts, and event reminders) for a large consumer app. The OKR was to reduce notification fatigue (mute/unsubscribe/complaint rates) by 10% while holding sessions started from notifications within −0.5%.
- Mid-stream change: Two weeks in, a privacy/audit finding required stricter consent gating on social-context features within three weeks. Leadership shifted priorities to: 1) ship compliant suppression immediately, and 2) avoid a hit to session starts. Our original plan (a heavier ML re-ranker) was at risk due to feature gaps and limited inference capacity.
- Stakeholders: PM (Notifications), Eng Lead (server), ML Lead, Integrity/Privacy counsel, Infra capacity owner, and two partner PMs for Groups and Events. I had no direct authority over any of them.
- Constraints: Quarter-end experiment freeze in week 7; limited online inference capacity; feature logging gaps; high reputational risk if compliance slipped; decision guardrails: session starts ≥ −0.5%, complaint rate ↓, opt-out rate ↓.
Teaching note: Make the stakes, timeline, and guardrails explicit so trade-offs later are credible.
## 2) Toughest constructive feedback (exact phrasing) and relationship maintenance
- Situation: The PM proposed shipping a blended variant measured by open rate, with a two-week A/B that, given the new consent gating, would be underpowered for our guardrail metric (session starts). I needed to push back upward.
- Exact phrasing I used in a 1:1 and then repeated in the group doc:
"I’m concerned we’re optimizing to the wrong metric and risking a false positive. With the new consent constraints, open rate becomes a weaker proxy for session starts. Our power calc shows only ~45% power to detect a −0.3% change in sessions over two weeks. I can’t support shipping based on this design. I recommend we switch the north star to net session starts with opt-outs as a guardrail, extend the runtime or expand markets to reach ≥80% power, and run a smaller parallel canary for compliance."
- How I maintained the relationship:
- Brought data: a one-pager with the power analysis, sensitivity analysis, and a pre-registered decision rule (a sketch of the power check appears after the teaching note below).
- Offered a plan, not just critique: presented two scope options with timelines and risks.
- Gave credit and showed flexibility: kept the PM’s qualitative insights in the feature selection and asked her to present the revised plan, with me as the technical backstop.
Teaching note: Quote concise, respectful language, then show how you paired critique with solutions and attribution.
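A minimal sketch of the kind of power check behind that pushback, assuming a two-proportion z-test on session-start rate via statsmodels; the baseline rate and per-arm sample size are illustrative placeholders, not the project's actual figures.

```python
# Rough power check for detecting a small relative change in a proportion metric.
# baseline_rate and nobs1 are assumed placeholders, not project numbers.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.20                          # assumed baseline session-start rate
treated_rate = baseline_rate * (1 - 0.003)    # a -0.3% relative change
effect_size = proportion_effectsize(baseline_rate, treated_rate)

analysis = NormalIndPower()

# Power of the proposed two-week design at an assumed per-arm sample size
power = analysis.power(effect_size=effect_size, nobs1=1_500_000,
                       alpha=0.05, ratio=1.0, alternative="two-sided")

# Per-arm sample size needed to reach the >= 80% power target
n_needed = analysis.solve_power(effect_size=effect_size, alpha=0.05,
                                power=0.80, ratio=1.0, alternative="two-sided")
print(f"power at current n: {power:.0%}; n per arm for 80% power: {n_needed:,.0f}")
```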
## 3) How I built trust quickly (first two weeks actions)
- Ran 15×30-minute 1:1s (PMs, Eng, Integrity, Infra) to map incentives, constraints, and decision-makers.
- Wrote and socialized a 2-page “Metrics & Guardrails” doc: definitions for session starts, opt-outs, complaint rate, delivery latency, and a pre-registered decision rule (sketched after the teaching note below).
- Shipped a quick win: fixed a logging inconsistency in notification open attribution that was inflating opens by ~0.6pp; published a shared dashboard with cohort breakdowns and alerts.
- Prototyped an offline-to-online backtest comparing simple rules vs. a lightweight re-ranker under consent gating; shared lift vs. latency trade-offs.
- Set weekly office hours and a decision log so disagreements were captured and resolved transparently.
Teaching note: First two-week actions should combine relationship-building, instrumentation hygiene, and a tangible analytical win.
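A minimal sketch of what a pre-registered decision rule like the one in the Metrics & Guardrails doc could look like; the dataclass, field names, and function are illustrative, with thresholds taken from the guardrails stated above.

```python
# Illustrative pre-registered ship/hold/rollback rule over the stated guardrails.
from dataclasses import dataclass

@dataclass
class ExperimentReadout:
    session_starts_delta_pct: float   # relative change vs. control, e.g. -0.2 for -0.2%
    opt_out_delta_pct: float          # relative change; negative is better
    complaint_delta_pct: float        # relative change; negative is better
    p95_latency_ms: float             # delivery latency in treatment

def ship_decision(r: ExperimentReadout) -> str:
    """Return 'ship', 'hold', or 'rollback' per the pre-registered rule."""
    if r.session_starts_delta_pct < -0.5:      # hard session-starts guardrail breached
        return "rollback"
    if r.p95_latency_ms >= 200:                # latency budget breached
        return "hold"
    if r.opt_out_delta_pct < 0 and r.complaint_delta_pct < 0:
        return "ship"                          # fatigue metrics improved
    return "hold"

print(ship_decision(ExperimentReadout(-0.2, -11.0, -18.0, 185.0)))  # -> "ship"
```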
## 4) Conflict resolved and accepted trade-offs
- Conflict: Infra wanted an immediate hard throttle (−20% sends) to guarantee compliance before the freeze. PM/ML preferred the new re-ranker, which required scarce online feature joins and would push p95 latency higher.
- My proposal (compromise):
1) Phase 1 (2 weeks): Rule-based suppression for sensitive categories to meet compliance and reduce fatigue; no online joins.
2) Phase 2 (canary → 50%): Lightweight re-ranker using only already-available features (no new joins), with an explicit p95 < 200 ms guardrail.
- Trade-offs accepted:
- We gave up an estimated ~1.2% additional lift vs. the heavier model to stay within latency and capacity budgets.
- We reduced scope to two surfaces initially, deferring Events to next quarter.
Teaching note: Name the sides, present the compromise, quantify what you gave up, and why it was rational under constraints.
## 5) Quantified impact (business and engineering/science), plus one metric that worsened and why
- Business outcomes (90-day post-rollout):
- Total notifications sent: −14% (target −10%).
- Opt-out rate: −11% relative.
- Complaint rate: −18% relative.
- Sessions started from notifications: −0.2% (within −0.5% guardrail).
- Reactivation (30-day dormant users): no significant change (−0.05%, p=0.42) after adding a small exception list.
- Engineering/science outcomes:
- Model AUC: +0.06 absolute vs. baseline rules (offline), translating to +0.9% sessions in treatment where applied.
- Experiment design: pre-registered analysis plan; CUPED applied for variance reduction (~12% on the sessions metric); a sketch of the adjustment appears after the teaching note below.
- Latency: p95 delivery latency increased by 45 ms (from 140 ms to 185 ms) due to online scoring; still below the 200 ms guardrail.
- Metric that got worse and why:
- p95 delivery latency worsened (+45 ms) because even the lightweight re-ranker introduced feature fetch and scoring time. We accepted this trade-off within a defined budget to meet compliance and fatigue goals.
Teaching note: Separate business from engineering/science impact. Pick a specific, defensible downside and explain causality and guardrails.
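A minimal sketch of the CUPED adjustment referenced above, assuming a per-user pre-experiment covariate such as sessions in the weeks before assignment; the column names and helper are illustrative.

```python
# CUPED: subtract theta * (X - mean(X)) from the metric, where X is a
# pre-experiment covariate correlated with the metric. Reduces variance
# without changing the expected treatment effect.
import numpy as np
import pandas as pd

def cuped_adjust(df: pd.DataFrame, metric: str, covariate: str) -> pd.Series:
    """Return the CUPED-adjusted metric: Y - theta * (X - mean(X))."""
    x = df[covariate].to_numpy(dtype=float)
    y = df[metric].to_numpy(dtype=float)
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return pd.Series(y - theta * (x - x.mean()), index=df.index, name=f"{metric}_cuped")

# Usage: add the adjusted column, then run the usual t-test / delta on it.
# df["sessions_cuped"] = cuped_adjust(df, "sessions", "pre_period_sessions")
```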
## 6) Hindsight: what I’d change to achieve a stronger outcome
- Invest earlier in capacity negotiation: escalate for temporary inference capacity two weeks sooner; that likely would have allowed the fuller-featured model and an extra ~1% sessions lift without breaching latency.
- Automate power checks: ship a template that computes the minimum detectable effect (MDE) and required duration at experiment creation, blocking underpowered designs (sketched after the teaching note below).
- Segment-specific guardrail: set a stricter reactivation guardrail ex ante (e.g., ≥ −0.2% for 30-day dormant users), which would have reduced the iteration cycles needed to tune exceptions.
- Instrumentation parity test: run a formal offline→online consistency suite before the canary to cut the debugging time we spent on feature drift.
Teaching note: Hindsight should be actionable (process, tooling, or early escalation), not just “do more.”
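A minimal sketch of the experiment-creation gate proposed above: estimate required runtime from the MDE and expected per-arm traffic, and refuse to register designs that cannot reach 80% power. The function names, baseline rate, and traffic figures are assumptions for illustration.

```python
# Gate at experiment registration: compute required runtime for the stated MDE
# and block designs that are too short. Values in the commented call are placeholders.
import math
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def required_days(baseline_rate: float, relative_mde: float,
                  daily_users_per_arm: float, power: float = 0.80,
                  alpha: float = 0.05) -> int:
    h = proportion_effectsize(baseline_rate, baseline_rate * (1 + relative_mde))
    n_per_arm = NormalIndPower().solve_power(effect_size=h, alpha=alpha,
                                             power=power, alternative="two-sided")
    return math.ceil(n_per_arm / daily_users_per_arm)

def register_experiment(planned_days: int, **design) -> None:
    needed = required_days(**design)
    if planned_days < needed:
        raise ValueError(f"Underpowered: need >= {needed} days, got {planned_days}")

# register_experiment(planned_days=14, baseline_rate=0.20, relative_mde=-0.003,
#                     daily_users_per_arm=250_000)
```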
---
How to adapt this pattern in your own answers:
- Use STAR but map explicitly to the six asks above.
- Quantify both the upside and the controlled downside with guardrails.
- Show influence tactics: crisp memos, pre-reads with options and trade-offs, decision logs, and attribution.
- Validate rigor: power analysis, pre-registration, variance reduction (e.g., CUPED), guardrail metrics, and latency/capacity budgets.