##### Question
You are interviewing for a **Data Scientist Intern** role at TransUnion. The technical screen is an **asynchronous (one-way) video interview**: you are shown several prompts and must record a **~2-minute response** for each, often with limited retries. There is no live interviewer, so structure and clarity carry the answer.
Prepare strong, structured answers for the following five prompts:
1. **Python/R problem solving:** Describe a problem you solved using **Python or R**. What was the problem, why did you choose the tool, what analysis or modeling did you do, and what was the outcome?
2. **SQL usage:** Describe a situation where you used **SQL**. What data were you working with, what queries or techniques did you use, what data-quality checks did you run, and what business question did the work answer?
3. **Ambiguous / low-context project:** Tell me about a project where you started with **very little context or unclear requirements**. How did you define the problem, gather missing information, choose success metrics, and make progress despite uncertainty?
4. **Explaining to a non-technical audience:** How would you explain one of your technical projects to a stakeholder with **no technical background**? Focus on the problem, the recommendation, and the impact rather than jargon.
5. **Keeping a team aligned:** In a team project, how do you **keep everyone aligned** (goal-setting, scope, ownership, communication cadence, decision-making) and how do you handle misalignment or changing priorities? Give a concrete example.
**Constraints / format**
- ~2 minutes per prompt (roughly 250-320 words spoken; aim for 110-130 seconds).
- Clear structure, concrete personal actions, and measurable impact.
- Reviewers evaluate analytical thinking, communication (including to non-technical audiences), comfort with ambiguity, and collaboration.
Quick Answer: TransUnion's Data Scientist Intern technical screen is an asynchronous video interview with five ~2-minute prompts covering Python/R problem solving, SQL usage, working under ambiguity, explaining technical work to non-technical stakeholders, and keeping a team aligned. This guide gives a per-prompt structure, the evaluator rubric, strong sample outlines, and the common mistakes to avoid.
Solution
## What the screen is really testing
Across all five prompts, evaluators score a one-way video on a small, consistent rubric. Hit every item and you stand out:
- **Clear problem framing** and a stated success metric.
- **Technical credibility** — sound choices, with awareness of alternatives and tradeoffs (not a list of libraries).
- **Ownership** — what *you* personally did, not just what "we" did.
- **Communication** — you can tailor detail to technical vs non-technical audiences.
- **Collaboration** — coordination, documentation, and handling disagreement.
- **Impact** — even if small: time saved, accuracy improved, a better decision enabled.
A TransUnion-specific note: TransUnion is a consumer credit-reporting and risk-information company, so a **"credit score"-style analogy** for any risk/scoring model fits the company's domain and should be immediately familiar to whoever reviews the recording.
### A structure that fits 2 minutes
Use **STAR** (Situation -> Task -> Action -> Result) or, for technical prompts, **Problem -> Approach -> Tradeoffs -> Result -> Reflection**, which leaves explicit room for tradeoffs. Pacing:
- **15-20s** — situation + the specific goal/metric
- **60-75s** — what you personally did and *why* you chose it
- **20-30s** — result, quantified if at all possible
- **10-15s** — one lesson learned or what you'd do next
Open with a one-sentence headline ("I built X to solve Y, which produced Z"), then fill in the detail. Prepare **2-3 reusable projects**; one strong project can often cover Python/R, SQL, ambiguity, and communication if you retell it from different angles.
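As a rough sanity check on the pacing above, the sketch below adds up the segment ranges and estimates how long a drafted script takes to say. The 150 words-per-minute speaking rate and the function names are assumptions for illustration only; timing yourself aloud gives the real number.

```python
# Segment budget from the pacing list, as (low, high) seconds.
SEGMENTS = {
    "situation_and_goal": (15, 20),
    "actions_and_why": (60, 75),
    "result": (20, 30),
    "lesson": (10, 15),
}
TARGET_SECONDS = (110, 130)  # recording window suggested in this guide


def budget_range(segments):
    """Return the (shortest, longest) total length implied by the segments."""
    low = sum(lo for lo, _ in segments.values())
    high = sum(hi for _, hi in segments.values())
    return low, high


def estimated_seconds(script, words_per_minute=150):
    """Estimate speaking time. 150 wpm is an assumed rate, not a measured one."""
    word_count = len(script.split())
    return word_count * 60 / words_per_minute


low, high = budget_range(SEGMENTS)
print(f"Segment budget: {low}-{high}s, target: {TARGET_SECONDS[0]}-{TARGET_SECONDS[1]}s")
```

The ranges sum to 105-140 seconds, so using the top of every range overshoots the 110-130 second target. If a draft runs long, trim the segment that carries the least new information.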
---
## 1) A problem you solved with Python or R
**What they're listening for:** can you frame a real problem, use DS tooling for analysis/modeling, and connect the work to impact?
**Cover:**
- The business or research problem and why it mattered.
- The data — source, scale, messiness (missing values, duplicates, class imbalance).
- Why Python or R was the right tool.
- Your pipeline: ingestion -> cleaning -> feature engineering -> analysis/model -> validation.
- How you evaluated success: train/test split or cross-validation, a baseline comparison, and a metric appropriate to the problem (AUC, precision/recall, RMSE, forecast error, or a business KPI).
- The result and the decision it enabled.
**Strong sample outline:**
"For a customer-churn problem, I used Python because the workflow needed cleaning, feature engineering, and modeling in one pipeline. I merged usage, billing, and support-ticket data with pandas, engineered recency/frequency/engagement features, and compared a logistic-regression baseline against gradient boosting. I focused on **recall at a fixed precision** because missing a likely churner cost more than over-contacting a few customers, and I evaluated both models with cross-validation. The gradient-boosting model lifted recall ~18% over the baseline and was used to prioritize retention outreach. Next I'd add calibration checks and monitor for drift."
**Pitfalls:** listing libraries with no decisions; no baseline or evaluation metric; not stating how the output would actually be used.
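If you cite "recall at a fixed precision", it helps to be able to say exactly what that means. The sketch below is one minimal, dependency-free way to compute it: sweep score thresholds from high to low and keep the highest recall whose precision still meets the floor. The function name and the toy labels and scores are hypothetical, made up for illustration; in a real project a library routine such as scikit-learn's `precision_recall_curve` would normally do the sweep.

```python
def recall_at_precision(labels, scores, min_precision):
    """Return (recall, threshold) for the best threshold meeting min_precision.

    labels: 1 for churned, 0 for retained.
    scores: model scores, higher means more likely to churn.
    Returns (0.0, None) when no threshold reaches the precision floor.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")

    # Rank customers from highest to lowest score, then flag the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_recall, best_threshold = 0.0, None
    true_positives = 0
    for i, (score, label) in enumerate(ranked):
        true_positives += label
        # Tied scores must be flagged together, so evaluate only at the
        # last member of each tied group.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        flagged = i + 1
        precision = true_positives / flagged
        recall = true_positives / total_positives
        if precision >= min_precision and recall > best_recall:
            best_recall, best_threshold = recall, score
    return best_recall, best_threshold


# Toy data, invented for this example.
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.10]
print(recall_at_precision(toy_labels, toy_scores, min_precision=0.75))
```

In a spoken answer, the plain-language version is: "we fixed how often a flag had to be right, then measured how many true churners we caught at that level."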
---
## 2) A situation where you used SQL
**What they're listening for:** can you retrieve and shape relational data independently, ensure correctness, and tie it to a business question?
**Cover:**
- The business question and the data sources, including the **grain** (e.g. user-day vs transaction-level — naming the grain is a strong signal).
- SQL techniques you actually used: joins across fact/dimension tables, aggregations and distinct counts, **window functions** (ROW_NUMBER for dedup, LAG for retention, running totals), CTEs for staged/readable logic, CASE WHEN, date filtering and cohorting.
- **Data-quality checks:** row counts, uniqueness/dedup, NULL handling, reconciliation against the source, consistent metric definitions.
- Performance and correctness awareness: filter early, join on indexed keys, and avoid fan-out joins (one-to-many or many-to-many matches that multiply rows and inflate counts).
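The data-quality checks listed above can be rehearsed on a toy dataset. The sketch below runs them with Python's built-in `sqlite3` module; the `users` and `events` tables, their columns, and the rows are all hypothetical and exist only to make each check return something non-trivial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, plan TEXT);
CREATE TABLE events (event_id INTEGER, user_id INTEGER, event_type TEXT, event_ts TEXT);
-- user 2 appears twice, which breaks the one-row-per-user grain
INSERT INTO users VALUES (1, 'free'), (2, 'pro'), (2, 'pro');
-- event 12 has no user_id
INSERT INTO events VALUES
  (10, 1, 'visit', '2024-01-01'),
  (11, 2, 'visit', '2024-01-02'),
  (12, NULL, 'visit', '2024-01-03');
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Row count at the expected grain (one row per event).
event_rows = scalar("SELECT COUNT(*) FROM events")

# NULL handling: events that cannot be tied to a user.
null_user_ids = scalar("SELECT COUNT(*) FROM events WHERE user_id IS NULL")

# Uniqueness: join keys that appear more than once in the dimension table.
duplicate_user_keys = scalar("""
    SELECT COUNT(*) FROM (
        SELECT user_id FROM users GROUP BY user_id HAVING COUNT(*) > 1
    )
""")

# Fan-out: a LEFT JOIN should never return more rows than the left table.
joined_rows = scalar("""
    SELECT COUNT(*) FROM events e LEFT JOIN users u ON u.user_id = e.user_id
""")

print(event_rows, null_user_ids, duplicate_user_keys, joined_rows)
```

Here the join returns more rows than `events` holds because user 2 is duplicated in `users`. Comparing row counts before and after a join is a quick way to catch that kind of double-counting.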
**Strong sample outline:**
"I analyzed conversion through a signup funnel. I joined event logs to the user table on user_id, built CTEs for each funnel step (visit -> signup -> activation), and used a window function to keep each user's first qualifying event so repeat activity wasn't double-counted. I checked for duplicate events and inconsistent timestamps before computing rates. The analysis surfaced a large drop at the verification step, which directed the team to a form redesign. It taught me that clean metric definitions matter as much as the query."
**Pitfalls:** "I used SQL to pull data" with no complexity; not stating the grain; getting LEFT JOIN semantics wrong or double-counting.
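The funnel in the sample outline can be sketched as a CTE plus a `ROW_NUMBER` window function, again run through Python's `sqlite3` (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The `events` table, step names, and rows are hypothetical. As a simplification, the query does not enforce that steps happen in order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, step TEXT, event_ts TEXT);
INSERT INTO events VALUES
  (1, 'visit',      '2024-01-01 09:00'),
  (1, 'visit',      '2024-01-01 09:05'),
  (1, 'signup',     '2024-01-01 09:10'),
  (1, 'activation', '2024-01-02 10:00'),
  (2, 'visit',      '2024-01-01 11:00'),
  (2, 'signup',     '2024-01-01 11:30'),
  (2, 'signup',     '2024-01-01 11:31'),
  (3, 'visit',      '2024-01-03 08:00');
""")

FUNNEL_SQL = """
WITH ranked AS (
    -- Number each user's events within a step, earliest first.
    SELECT user_id, step, event_ts,
           ROW_NUMBER() OVER (PARTITION BY user_id, step ORDER BY event_ts) AS rn
    FROM events
),
first_events AS (
    -- Grain after this step: one row per user per funnel step.
    SELECT user_id, step, event_ts FROM ranked WHERE rn = 1
)
SELECT step, COUNT(*) AS users
FROM first_events
GROUP BY step
ORDER BY CASE step WHEN 'visit' THEN 1 WHEN 'signup' THEN 2 ELSE 3 END
"""

funnel = conn.execute(FUNNEL_SQL).fetchall()
counts = dict(funnel)
signup_rate = counts["signup"] / counts["visit"]
activation_rate = counts["activation"] / counts["signup"]
print(funnel, round(signup_rate, 2), round(activation_rate, 2))
```

`COUNT(DISTINCT user_id)` per step would give the same counts. Keeping the first event's row is mainly useful when later steps need its timestamp, for example to measure time between signup and activation.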
---
## 3) A project with little context / ambiguous requirements
**What they're listening for:** can you create structure where none exists and define success *before* building? This is one of the most important prompts for a DS role.
**A high-quality approach:**
1. **Clarify the decision** the analysis will inform ("what will change because of this?").
2. **Identify stakeholders and constraints** — timeline, data availability, privacy.
3. **Define success metrics** — a primary metric, diagnostics, and **guardrail metrics** to avoid local optimization; use a **proxy/leading indicator** if the true metric is delayed (e.g. activation rate as a proxy for retention).
4. **Make assumptions explicit** and validate them with a quick stakeholder check-in.
5. **Ship a thin slice / MVP** — fast EDA plus a simple baseline — rather than waiting for perfect information.
6. **Iterate** on findings and feedback; narrow scope as you learn.
**Call out the analytical risks** you watched for: selection bias (only power users in the data), confounding (seasonality, concurrent campaigns), and data quality (missing IDs, duplicated events).
**Strong sample outline:**
"The ask was just 'improve engagement,' with no agreed metric. I met stakeholders to learn whether they cared about short-term clicks or long-term retention. Since retention is slow to observe, I proposed **activation rate** as a leading indicator and documented the tradeoff. I built a first dashboard and EDA to find where drop-off happened, then narrowed scope to first-week users, which cut noise and made the work actionable. The lesson: ambiguity usually comes from missing *decision context*, not missing data."
**Pitfalls:** acting before stakeholder alignment; overbuilding a complex model before confirming the metric and data.
---
## 4) Explaining a technical project to a non-technical audience
**What they're listening for:** can you translate technical work into business value and influence decisions, not just run analysis?
**A reliable framework:**
1. **Why** — the business problem, in plain language.
2. **What** — the solution conceptually, one sentence, no jargon ("we built a score that estimates..."). A TransUnion-friendly analogy: "Like a credit score, but for how likely a customer is to cancel."
3. **So what** — the recommendation, the impact, and how it changes a decision.
4. **Caveat** — limitations and what you do if the model is wrong (monitoring/fallback).
Replace jargon: say "past-behavior signals" instead of "features," "checked how accurate it was" instead of "AUC." Pick **one simple metric**. Tailor to the audience — executives care about impact and risk; PMs about actions and tradeoffs; engineers about reliability.
**Bad:** "I trained a gradient-boosted ensemble with engineered features and tuned hyperparameters."
**Better:** "We built a system that flags which customers are most likely to leave, so the team can step in earlier and spend retention budget where it matters most. The biggest warning sign was low early engagement."
**Pitfalls:** over-indexing on algorithms; never explaining how the output is *used*.
---
## 5) Keeping a team aligned
**What they're listening for:** can you coordinate work, prevent surprises, and resolve disagreement — and stay on track when priorities change?
**Concrete practices to mention:**
- **Shared doc** of goals, non-goals, definitions, and assumptions.
- **Clear ownership** — RACI-style (who's Responsible/Accountable/Consulted/Informed) or simply one owner per deliverable.
- **Milestones + a communication cadence** — short weekly check-ins or standups.
- **A decision log** capturing why you chose X over Y, plus written metric definitions.
- **Surface risks and blockers early**; escalate rather than letting misalignment grow.
**Conflict / changing priorities (strong signal):** "When we disagreed on the metric definition, I ran a 30-minute working session, presented two options with pros/cons, and we documented the choice for consistency. When priorities shifted, I restated the tradeoff plainly — fast prototype vs more rigorous final — so the team chose deliberately instead of drifting."
**Strong sample outline:**
"I align the team on three things first: the goal and success metric, each person's area of ownership, and the timeline. I keep a shared tracker with owners and deadlines, run short weekly check-ins to catch blockers, and summarize decisions in writing so everyone sees the same plan. That cut duplicate work and kept us on schedule."
---
## What separates great answers from average ones
Great answers use a **specific example** (not generic claims), **quantify impact**, make your **personal contribution** explicit, show **judgment under uncertainty**, acknowledge a **tradeoff or limitation**, and adapt to the **audience**. Average answers stay vague, list tools without impact, describe "what the team did" without your role, or skip the result.
## Quick checklist before you record
- 2-3 reusable projects, each with: one-sentence problem, data sources, method + why, a quantified result, and one lesson learned.
- One non-technical explanation rehearsed.
- Practice timed to 110-130 seconds per prompt.
- For each take, self-check: did I state the objective in the first 15 seconds, quantify impact, name a tradeoff/limitation, and show collaboration?
- No internship experience? Use coursework, research, hackathons, or student-org projects — they count.
Explanation
This is a behavioral/communication screen (one-way video), so the 'solution' is a preparation playbook plus a rubric rather than a coded answer. The guide gives a per-prompt structure, what evaluators score, strong sample outlines, and common pitfalls, with TransUnion's credit-risk context woven in where the analogy fits.