Stakeholder/BI case: You join the Chicago Fraud team as Decision Scientist; the HM emphasized BI tools and high-impact analysis. In your first 90 days, you must influence a policy change that reduces ATO while minimizing user friction. Tasks: A) Lay out a 30/60/90-day plan with concrete outputs (e.g., Week 2: ship a Looker dashboard with daily fraud loss by device/IP novelty; Week 6: present a CBA for a new ATO rule). B) Define 5 core dashboard tiles with precise metric formulas and denominators (e.g., ATO Loss per 1K tx, Legit Block Rate per 1K legit tx, Step-up Success Rate). C) Draft a 150-word executive update to the Venmo PM and Risk Ops lead explaining a proposed rule change and its expected net impact, including assumptions and guardrails. D) Given pushback that SQL-only is sufficient, articulate when you’d use SQL vs. Python for reproducible analyses and what governance you’d enforce (review, versioning, data contracts) to ensure trust in BI.
Quick Answer: This question evaluates competency in BI-driven fraud risk management, stakeholder influence, dashboard and metric design, executive communication, and data governance within a data science role.
Solution
A) 30/60/90-day plan and concrete outputs
Assumptions: ATO labels have 14-day median lag; feature flags exist for rules/step-ups; BI = Looker; modeling stack = SQL + Python; device/IP novelty = first-seen within 30 days.
Days 1–30 (Discover, instrument, baseline)
- Week 1: Align on goals and definitions
- Output: 1-pager with canonical metric definitions (ATO, legit, blocked, step-up, novelty), time-of-event conventions (attempt vs authorization), lookback windows, label-lag handling.
- Stakeholders: Risk Ops, PM, CS/Disputes.
- Week 2: Foundational dashboard MVP
- Output: Looker dashboard v1 with daily ATO loss, ATO rate, legit block rate; sliced by device/IP novelty and geography. Includes data freshness and label-lag banners.
- Week 3: Data QA and lineage
- Output: dbt or SQL tests (freshness, volume, nulls, referential integrity), lineage doc, and data contract draft for key tables/fields (schemas, SLAs, PII handling).
- Week 4: Hypothesis and rule ideation
- Output: Shortlist of rule candidates (e.g., risk-based step-up on novel device+velocity+amount), with backtestable SQL specs and risk score thresholds.
Days 31–60 (Backtest, CBA, experiment design)
- Week 5: Offline backtests
- Output: Rule backtest report with TPR/FPR by segment, expected coverage, and collision with existing controls.
- Week 6: Cost–benefit analysis (CBA) and design review
- Output: CBA deck and memo; propose experiment design (A/B or sequential test), sample size, guardrails, and rollback conditions. Present to PM + Risk Ops.
- Week 7: Experiment readiness
- Output: Feature flag config, monitoring tiles (near-real-time), on-call/rollback runbook.
- Week 8: Launch limited ramp
- Output: 5–10% traffic ramp; daily experiment digest to stakeholders.
Days 61–90 (Scale, codify, handoff)
- Week 9: Analyze interim results
- Output: Mid-experiment readout with power check and preliminary net impact; adjust thresholds if needed.
- Week 10: Ramp to 50% if guardrails pass
- Output: Updated CBA; PRD addendum for policy change; finalize operational playbooks (appeals, overrides).
- Week 11: Decision and rollout
- Output: Go/No-Go doc; change log; staged rollout plan (by region/user risk tiers).
- Week 12: Codify and handoff
- Output: Final dashboard v2, metric contracts in BI semantic layer, runbook, and postmortem/retrospective.
B) Five core dashboard tiles with precise formulas
Notation: T = all transaction attempts in window; L = legit transactions (not later confirmed ATO/fraud within 30 days); A = confirmed ATO transactions; $loss_i = reimbursed USD loss for ATO txn i. Novel_N = transactions where device_id_first_seen_days ≤ N OR ip_first_seen_days ≤ N (use N=30 unless stated). Use cohorting to handle label lag (report ATO on a 30-day completed cohort) and a provisional view for near-term monitoring.
1) ATO Loss per 1K Transactions
- Formula: 1000 × (Σ_{i∈A} $loss_i) / |T|
- Denominator: all transaction attempts in window.
- Example: 1,000,000 tx; 500 ATO; avg loss $200 → Loss/1K = 1000 × (500×200)/1,000,000 = 100.
2) Confirmed ATO Rate per 1K Transactions
- Formula: 1000 × |A| / |T|
- Denominator: all transaction attempts.
- Example: 500/1,000,000 × 1000 = 0.5 per 1K.
3) Legitimate Block Rate per 1K Legit Transactions
- "Blocked" = hard decline or abandoned due to added friction; exclude system/timeouts not tied to risk.
- Formula: 1000 × |Blocked ∩ L| / |L|
- Denominator: legit transactions (post-30d label). Provide a provisional view using L_provisional (no known risk flags within 7d) with a banner.
- Example: 2,000 blocked legit out of 995,000 legit → 1000 × 2000/995000 ≈ 2.01 per 1K.
4) Step-up Success Rate
- "Challenged" = users who received MFA/step-up; "Completed" = passed step-up and continued.
- Formula (overall): |Completed| / |Challenged|
- Breakouts recommended: by device novelty, OS, and legit-only completion rate = |Completed ∩ L| / |Challenged ∩ L|.
- Example: 10,000 challenged, 8,500 completed → 85%.
5) Novelty ATO Rate per 1K Novel Transactions
- Novelty definition (N=30 days): device_first_seen_days ≤ 30 OR ip_first_seen_days ≤ 30.
- Formula: 1000 × |A ∩ Novel_30| / |T ∩ Novel_30|
- Example: 100,000 novel tx; 300 ATO → 1000 × 300/100000 = 3 per 1K.
Implementation notes
- Time alignment: For tiles 1–3 and 5, prefer transaction-date cohorts with a 30-day labeling window; show both cohort-complete and near-real-time provisional panels.
- Currency: Convert losses to a single currency daily (FX at txn date).
- Dedup: Count at attempt level; exclude retries after hard declines to avoid double counting.
C) ~150-word executive update (proposed rule change)
Proposal: Introduce risk-based step-up for P2P sends when a novel device/IP (first seen ≤30 days) is combined with high velocity (≥3 sends in 1 hour) or amount ≥$200. Offline backtests show 42% coverage of historical ATOs with a 6.5% false-positive rate on legit traffic in this segment. Expected impact: reduce ATO loss by ~25% (≈$30k/week from a $120k/week baseline) while challenging ~1.2% of legit sends; with 85% step-up completion, projected incremental drop-off is ~0.18%, costing ≈$8k/week in foregone throughput; added manual review cost <$2k/week. Net benefit ≈+$20k/week.
Guardrails: holdout 10%; ramp 10%→50%; auto-rollback if (a) legit block rate +0.5 per 1K, (b) step-up completion <80%, or (c) CS ATO complaints +20% w/w. Assumptions: 14-day label lag; loss recovery rate unchanged; novelty defined as first-seen ≤30 days. We will monitor by geography/device and adjust thresholds to minimize friction on established users.
D) SQL vs. Python and governance for trusted BI
When to use SQL
- Canonical metrics and dashboards: aggregations, filters, window functions, joins on modeled tables.
- Reproducible transformations: implemented in dbt/ETL with tests and the BI semantic layer.
- Backtests expressible as set logic (rule coverage, TPR/FPR) and daily monitoring queries.
When to use Python
- Statistical inference and experimentation: power/sequential tests, CUPED, bootstrap CIs, uplift modeling.
- Modeling/simulation: risk score calibration, threshold optimization, Monte Carlo for loss distributions.
- Feature engineering beyond SQL ergonomics (NLP on device strings, graph features), API integrations, and notebook-driven EDA that graduates to packaged scripts.
Governance to ensure trust
- Version control: all SQL/Python in Git; PR reviews with code owners; linting; CI runs query tests.
- Semantic layer: single source of truth for metric definitions (Looker Explores/dbt metrics). No ad-hoc redefinitions.
- Data contracts: schemas, SLAs, PII policies, and deprecation rules for core tables/fields; breaking changes require approvals.
- Testing/observability: dbt tests (unique/not null/referential), freshness/volume monitors, anomaly alerts on key metrics.
- Reproducibility: analysis templates capture dataset snapshot, commit hash, parameters, and seeds; notebooks are parameterized and promoted to scripts/jobs.
- Experiment governance: pre-registered analysis plans, guardrail metrics, holdouts, and change logs; rollback runbooks and audit trails for rule changes.
CBA blueprint (use in Week 6 and decisioning)
- Net impact per 1K tx = Avoided ATO loss − Friction cost (legit drop-off × margin) − Ops cost (manual reviews) − Vendor/auth cost.
- Validate with sensitivity bands (±10–20%) on prevalence, completion, and loss severity; decide using worst-case acceptable net.