Walk through a DS project end-to-end
Company: OneMain Financial
Role: Data Scientist
Category: Behavioral & Leadership
Difficulty: easy
Interview Round: Technical Screen
## Prompt
Describe one data science / analytics project you worked on, end-to-end.
## What to cover
Include concise but concrete details on:
- **Problem & goal:** What business/user problem were you solving? Who were the stakeholders?
- **Success metrics:** What was the primary metric? What diagnostic and guardrail metrics did you track?
- **Data:** What data sources did you use? Key tables/events, main features, and major data quality issues.
- **Method:** What analysis/modeling approach did you choose and why (alternatives considered)?
- **Evaluation:** How did you validate results (offline metrics, backtests, A/B test, quasi-experiment)?
- **Risks & assumptions:** Confounders, leakage risk, selection bias, missing data, seasonality.
- **Outcome & impact:** What changed as a result? Quantify impact if possible.
- **Iteration:** What would you improve with more time?
## Follow-ups (interviewer may ask)
- What was the hardest tradeoff you made?
- What did you do when results contradicted expectations?
- How did you communicate uncertainty and limitations?
Quick Answer: This question evaluates end-to-end data science competencies including problem framing, metric design, data sourcing and quality assessment, modeling and evaluation choices, risk identification, impact quantification, and stakeholder communication within the Behavioral & Leadership category for a Data Scientist position.
Solution
## What a strong answer looks like (structured playbook)
Use a tight STAR-like structure, but tailored to DS: **Context → Objective → Approach → Validation → Impact → Learnings**.
### 1) Context and objective (30–60s)
- State the product/business context and the decision to be made.
- Define the **unit** (user/session/order) and the **time window**.
- Clarify constraints (latency, interpretability, data availability, launch date).
**Example framing:**
> “We wanted to reduce churn for new users within 7 days of signup. The decision was which onboarding changes to ship and whether to target interventions to at-risk users.”
### 2) Metrics: primary, diagnostics, guardrails
- **Primary metric:** directly tied to the goal (e.g., D7 retention, conversion rate, revenue per user).
- **Diagnostic metrics:** explain *why* primary moved (activation rate, time-to-first-success, funnel step drop-offs).
- **Guardrails:** prevent harm (latency, unsubscribe rate, complaints, false positive rate for interventions, fairness segments).
Call out tradeoffs explicitly (e.g., increasing notifications may lift retention but hurt unsubscribes).
### 3) Data and quality checks
Explain data sources and what you validated.
- Coverage: “Do all platforms log this event?”
- Duplicates/outliers: bot traffic, retries, late-arriving events.
- Label definition: “What exactly counts as churn?”
- Join keys/time alignment: user_id consistency; timezone; lookback windows.
### 4) Approach and why it was chosen
Pick one main track and defend it:
- **Experimentation track:** A/B test design, randomization unit, exposure definition, power/MDE.
- **Causal inference track:** diff-in-diff, matching, IV, regression discontinuity (when RCT not possible).
- **Modeling track:** baseline → feature set → model choice → calibration/thresholding.
Mention alternatives you considered and why rejected (e.g., “couldn’t A/B due to low traffic; used diff-in-diff with parallel trends checks”).
### 5) Validation and robustness
Show you understand failure modes.
- For experiments: SRM checks, novelty effects, multiple testing, CUPED, heterogeneous effects.
- For models: leakage checks (feature time), temporal split, calibration, segment performance.
- For analyses: sensitivity analyses, placebo tests, outlier robustness.
### 6) Results, impact, and decision
Quantify impact and uncertainty.
- Provide point estimate + CI when relevant.
- Tie back to decision: “We shipped X / didn’t ship Y” and why.
**Example:**
> “Treatment improved D7 retention by +1.2pp (95% CI: +0.3 to +2.1pp) with no increase in unsubscribes; we rolled out to 100% and built a monitoring dashboard.”
### 7) Learnings and iteration
End with what you’d do next:
- Better instrumentation
- Segment-specific strategy
- Online evaluation
- Model refresh / monitoring / drift
## Common pitfalls to avoid
- Vague claims (“improved a lot”) with no metric definition.
- Describing only modeling without business decision and stakeholder outcome.
- Ignoring confounding/leakage/selection bias.
- No mention of guardrails or unintended consequences.