Present an end-to-end project and defend decisions
Company: Snowflake
Role: Data Scientist
Category: Behavioral & Leadership
Difficulty: hard
Interview Round: Onsite
In 10 minutes (max 5 slides), present an end-to-end project you led that shipped to users. Cover: problem context, stakeholder goals, data sources, modeling/analysis, key decisions, results, and trade-offs. Then answer:
1) Which single metric did you optimize, which guardrails did you set, and why? Describe a time when your chosen metric conflicted with another stakeholder’s metric and how you resolved it.
2) What went wrong? Give one concrete mistake (e.g., an incorrect metric trade-off or a flawed assumption) and what you changed afterward.
3) If leadership rejects the proposal due to concerns about metrics (e.g., retention up but revenue down), propose a revised plan for a follow-up experiment or rollout that addresses the concerns without resetting timelines.
4) How did you collaborate with DE/PM/design? Provide a specific example of negotiating scope or data model changes under time pressure.
Quick Answer: This question evaluates a data scientist's end-to-end project leadership, product and experiment design, metric selection, trade-off analysis, and cross-functional stakeholder negotiation.
Solution
# Example 5-Slide Talk Track: Adaptive Query Acceleration (AQA) for a B2B Data Platform
Context: Users reported slow analytics queries during peak hours. We built and shipped an “Adaptive Query Acceleration” feature that automatically right-sizes compute and applies low-risk optimizations for heavy queries to reduce tail latency.
## Slide 1 — Problem & Stakeholders
- Problem: P95 query latency spiked during peak hours, driving support tickets and churn risk for mid-market accounts.
- Why now: Seasonal traffic growth made SLO breaches more frequent; competitors marketed “instant analytics.”
- Stakeholders & goals:
- Users/CS: Faster queries, fewer timeouts, fewer tickets.
- Product/PM: Improve adoption/retention for analytics workloads.
- Finance/RevOps: Avoid material drop in consumption revenue.
- Infra/DE: Keep error rates stable; avoid capacity thrash.
- Success criteria (at launch):
- Primary: Reduce P95 latency by ≥15% without raising error rate.
- Guardrails: error-rate increase ≤ 0.05 pp; queue wait time no worse than baseline; credits per 1k queries must not fall more than 15% (to protect revenue).
## Slide 2 — Data Sources & Instrumentation
- Data sources:
- Query logs: query_id, start/end, bytes scanned, spills, retries, error code.
- Warehouse telemetry: size, concurrency, queue wait time, cache hit rate.
- Billing/usage: credits consumed per query and per account-day.
- Support tickets: topic, account, timestamp for incident correlation.
- Account metadata: segment, commitment tier, historical churn signals.
- Instrumentation added:
- Stable query-to-warehouse join keys; tagging optimization decisions (feature flags, action chosen, confidence).
- P50/P95 latency and queue time computed per account-day; pre/post baselines for CUPED variance reduction (adjustment sketched after this slide).
- Experiment design:
- Randomization unit: account×warehouse (to reduce interference).
- 50/50 split, 4-week run, holdouts for high-value accounts.
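
A minimal sketch of the CUPED adjustment referenced above, assuming a per-unit frame with a pre-period baseline column and the in-experiment metric; column names (`p95`, `pre_p95`, `treated`) are illustrative, not the production schema.

```python
import numpy as np
import pandas as pd

def cuped_adjust(df: pd.DataFrame, metric: str, covariate: str) -> pd.Series:
    """CUPED-adjusted metric: y_adj = y - theta * (x - mean(x)),
    with theta = cov(x, y) / var(x). Variance shrinks when the pre-period
    covariate is correlated with the in-experiment metric."""
    x = df[covariate].to_numpy(dtype=float)
    y = df[metric].to_numpy(dtype=float)
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return pd.Series(y - theta * (x - x.mean()), index=df.index, name=f"{metric}_cuped")

# Illustrative usage on a per account×warehouse frame (columns are assumptions):
# df["p95_cuped"] = cuped_adjust(df, metric="p95", covariate="pre_p95")
# effect = df.loc[df.treated == 1, "p95_cuped"].mean() - df.loc[df.treated == 0, "p95_cuped"].mean()
```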
## Slide 3 — Modeling & Policy
- Goal: Pick an action a ∈ {resize warehouse, adjust concurrency, enable safe rewrites} that minimizes tail latency without harming guardrails.
- Predictive modeling:
- Features: time-of-day, historical log-latency, concurrency, query complexity (bytes scanned, joins), and spill signals.
- Model: Gradient-boosted trees to predict log-latency; quantile loss for tail (p95). Separate model for credits/query.
- Decision policy (cost-aware optimization):
- Objective: minimize L_p95(a) subject to C(a) ≤ B, where B = baseline credits × (1 − ε).
- Implemented as: J(a) = L_p95(a) + λ·max(0, C(a) − B) with λ tuned via offline replay; hard reject if predicted error rate ↑ (action selection sketched after this slide).
- Exploration:
- Safe exploration with small perturbations (±1 step in resize) and a kill switch if guardrails are breached for an account-day.
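
A minimal sketch of the tail model and cost-aware action selection described above, using scikit-learn's `GradientBoostingRegressor` with quantile loss. Feature names, the action set, the action encoding, and the λ and ε defaults are assumptions for illustration, not the production system.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Assumed feature and action names for illustration only.
FEATURES = ["hour_of_day", "hist_log_latency", "concurrency", "bytes_scanned", "spill_count"]
ACTIONS = ["no_op", "resize_up_one_step", "raise_concurrency_limit", "enable_safe_rewrites"]

# Tail model: quantile loss at alpha=0.95 targets the 95th percentile of log-latency.
latency_p95_model = GradientBoostingRegressor(loss="quantile", alpha=0.95, n_estimators=300)
# Separate model for credits per query (a mean estimate is enough for the budget check).
credits_model = GradientBoostingRegressor(loss="squared_error", n_estimators=300)
# Both models are assumed to be fit offline on logged queries tagged with the action taken.

def choose_action(unit_features: pd.Series, baseline_credits: float,
                  lam: float = 5.0, eps: float = 0.15) -> str:
    """Pick the action minimizing J(a) = L_p95(a) + lam * max(0, C(a) - B),
    with B = baseline_credits * (1 - eps); lam is tuned via offline replay.
    The production policy also hard-rejects actions with a predicted error-rate
    increase, which is omitted here for brevity."""
    budget = baseline_credits * (1.0 - eps)
    best_action, best_score = "no_op", np.inf
    for action_idx, action in enumerate(ACTIONS):
        x = unit_features.copy()
        x["action"] = action_idx                      # action encoded as an ordinal feature here
        row = x[FEATURES + ["action"]].astype(float).to_frame().T
        pred_p95 = latency_p95_model.predict(row)[0]
        pred_credits = credits_model.predict(row)[0]
        score = pred_p95 + lam * max(0.0, pred_credits - budget)
        if score < best_score:
            best_action, best_score = action, score
    return best_action
```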
## Slide 4 — Key Decisions & Results
- Key product decisions:
- Rollout policy at account×warehouse to avoid noisy cross-traffic effects.
- Optimize for P95 (tail) rather than P50 to reflect user-perceived performance.
- “Auto-apply” only for safe actions; others shown as “recommendations” requiring user confirmation.
- Results (n≈600 account×warehouse units, ~15M queries over 4 weeks):
- P95 latency: 8.3s → 6.4s (−23%, 95% CI −26% to −20%; interval estimation sketched after this slide).
- Queue wait time: −15%.
- Error rate: 0.19% → 0.21% (+0.02 pp, not statistically significant).
- Credits per 1k queries: −11% (Finance concern: near-term revenue impact).
- Downstream business: 90-day retention +1.5 pp (early leading indicator), support tickets −18% for treated accounts.
- Trade-offs:
- Better UX and stability vs reduced compute consumption; risk of underprovisioning at peak mitigated by kill switch and hard guardrails.
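
A minimal sketch of one way such a confidence interval could be produced: a per-unit bootstrap over account×warehouse units (the randomization unit), so the interval respects the experiment's clustering. The frame layout and column names are assumptions.

```python
import numpy as np
import pandas as pd

def bootstrap_relative_change(df: pd.DataFrame, metric: str = "p95",
                              n_boot: int = 5000, seed: int = 0) -> tuple[float, float, float]:
    """Bootstrap the relative change in a per-unit metric between treatment and control.
    Resamples whole units rather than individual queries."""
    rng = np.random.default_rng(seed)
    treat = df.loc[df.treated == 1, metric].to_numpy(dtype=float)
    ctrl = df.loc[df.treated == 0, metric].to_numpy(dtype=float)
    point = treat.mean() / ctrl.mean() - 1.0
    draws = np.empty(n_boot)
    for i in range(n_boot):
        t = rng.choice(treat, size=treat.size, replace=True)
        c = rng.choice(ctrl, size=ctrl.size, replace=True)
        draws[i] = t.mean() / c.mean() - 1.0
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return point, lo, hi

# Usage (columns assumed): point, lo, hi = bootstrap_relative_change(units_df, metric="p95_cuped")
```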
## Slide 5 — Postmortem, Plan B, and Collaboration
- What went wrong (concrete mistake): We initially optimized P50 latency, which improved medians but worsened P95 for some spiky workloads. We switched the objective to P95, used quantile models, and added a tail penalty in J(a). We also adopted CUPED with pre-experiment baselines to stabilize estimates.
- Metric conflict & resolution: Product prioritized P95 latency; Finance flagged −11% credits/query in pilot. Resolution: segmented rollout to churn-risk and high-ticket accounts (net positive NDR), capped savings with a per-account “compute floor” (no more than 10% credits reduction/day), and introduced a paid “Performance” entitlement for broader rollout.
- If leadership rejects due to revenue concerns (retention up, revenue down):
- Revised follow-up experiment (no reset to timelines; configuration sketched below):
1) Keep code paths; flip config to a 3-cell test using existing flags:
- Control: no AQA.
- A: AQA unlimited (as built).
- B: AQA with a compute floor (max 5–10% credits reduction), targeted only at churn-risk accounts.
2) Add pricing/packaging variant for a subset of B using existing entitlements (no new UI): “Performance” toggle requires higher-commit tier.
3) Evaluation: primary = P95; guardrails = error rate, queue time; business = credits/account-day, NDR proxy (expansion signals). Stop-loss: if credits drop >0.5% overall, pause expansion.
- Rationale: Addresses revenue risk via floors/targeting while preserving user-value proof; leverages existing feature flags to avoid timeline slips.
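
A minimal sketch of how the 3-cell follow-up could be expressed as plain configuration over existing flags; flag names, cell shares, and thresholds are illustrative assumptions, not the real flag service.

```python
# Illustrative experiment config for the 3-cell follow-up; all names and values are assumptions.
AQA_FOLLOWUP_CONFIG = {
    "experiment": "aqa_followup_v2",
    "randomization_unit": "account_x_warehouse",
    "cells": {
        "control":     {"share": 0.34, "aqa_enabled": False},
        "A_unlimited": {"share": 0.33, "aqa_enabled": True, "credits_floor_pct": None},
        "B_floored":   {"share": 0.33, "aqa_enabled": True, "credits_floor_pct": 0.10,
                        "eligibility": "churn_risk_segment"},
    },
    # Pricing/packaging variant applied to a subset of cell B via existing entitlements.
    "entitlement_gate": {"cell": "B_floored", "subset_share": 0.5, "required_tier": "performance"},
    # Pre-committed evaluation, guardrails, and stop-loss (evaluated daily, overall population).
    "primary_metric": "p95_latency",
    "guardrails": {"error_rate_pp_increase_max": 0.05, "queue_wait_no_worse": True},
    "stop_loss": {"metric": "credits_per_account_day", "max_overall_drop_pct": 0.005,
                  "action": "pause_expansion"},
}
```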
- Collaboration under time pressure:
- DE: We needed queue-wait-time by warehouse. A new pipeline would slip timelines, so we negotiated a minimal schema change (add warehouse_id and queue_wait_ms to the existing query log) and computed aggregates downstream. We also aligned on late-arriving data handling to avoid biased daily P95 (recompute approach sketched after this list).
- PM: To hit the quarter, we scoped “auto-apply” to only warehouse resize; query rewrites shipped as recommendations. Clear success gates to re-enable auto for rewrites later.
- Design: Reduced the UI from a multi-chart dashboard to a simple “Before/After P95 and credits” card with one-line explainability (“We resized during peaks; predicted tail reduction 22%”).
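
To make the late-arriving-data point concrete, a minimal sketch that recomputes a trailing window of per account-day P95 aggregates on every run, so rows ingested after their event day are folded in rather than silently dropped; column names and the window length are assumptions.

```python
import pandas as pd

def daily_p95_with_backfill(query_log: pd.DataFrame, run_date: pd.Timestamp,
                            backfill_days: int = 3) -> pd.DataFrame:
    """Recompute per account×warehouse daily P95 latency for the last `backfill_days` days
    on every run. Late-arriving rows land in the recomputed window instead of being missed,
    which avoids a biased daily P95."""
    window_start = run_date.normalize() - pd.Timedelta(days=backfill_days)
    recent = query_log[query_log["event_ts"] >= window_start]
    return (
        recent.assign(event_date=recent["event_ts"].dt.date)
              .groupby(["account_id", "warehouse_id", "event_date"])["latency_ms"]
              .quantile(0.95)
              .rename("p95_latency_ms")
              .reset_index()
    )
```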
---
How to adapt this pattern to your own project
- Pick a crisp, shipped feature. Make the north-star metric unambiguous and user-centered; enumerate 2–3 guardrails and thresholds.
- Show the experiment unit and why (interference, spillovers). Use a variance-reduction technique (e.g., CUPED) and a tail-focused metric if UX is spiky.
- Quantify at least one trade-off with numbers. Pre-commit a stop-loss.
- Prepare a Plan B that toggles via flags (segmentation, caps/floors, or packaging) so you can address leadership concerns without slipping timelines.
- Have one concrete mistake and the exact process fix you implemented.