How do I approach Statistics & Math interview questions?

Statistics & Math questions require a solid grasp of core concepts plus practice applying them. PracHub provides solutions with explanations to help you master statistics & math interviews.

What difficulty level is this interview question?

This is a medium-difficulty Statistics & Math question, commonly asked during onsite rounds at Gemini.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Gemini during technical interviews.

Compute power and cost-aware thresholds | Gemini Interview Question

Quick Overview

This question evaluates a data scientist's competency in statistical estimation, experimental design, cost-sensitive loss modeling, and sequential testing as applied to fraud-detection rules: prevalence estimation, expected-value cost calculations, sample-size and power derivation, confidence-interval construction, and alpha-controlled monitoring. It is common in Statistics & Math interviews because it tests the practical application of probabilistic modeling and A/B testing under operational constraints rather than purely conceptual theory.

You are evaluating a new ACH velocity + shared-device block rule.

Assumptions

Volume: 1,000,000 ACH credits/month.
Baseline fraud prevalence: 0.15% of credits return as fraud within 5 business days.
Mean loss per fraudulent credit: $900; SD ≈ $600.
Proposed rule: Recall 70% of fraudulent credits; FPR 0.05% on legitimate credits.
Cost per false positive (ops + churn): $25.
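Task A reduces to a few lines of arithmetic. The sketch below is one way to lay it out, assuming (not stated in the question) that a blocked fraudulent credit causes zero loss and that the 0.05% FPR applies only to legitimate credits. The constant and key names are illustrative. The $600 SD does not enter the expected value; it only matters for the sample-size calculation in Task B.

```python
# Task A sketch: expected monthly fraud loss with and without the block rule.
# Modelling assumptions (not stated in the question): a blocked fraudulent
# credit causes zero loss, and the FPR applies only to legitimate credits.

VOLUME = 1_000_000   # ACH credits per month
PREVALENCE = 0.0015  # fraud share of credits (0.15%)
MEAN_LOSS = 900.0    # $ per fraudulent credit
RECALL = 0.70        # share of fraudulent credits the rule blocks
FPR = 0.0005         # share of legitimate credits wrongly blocked (0.05%)
FP_COST = 25.0       # $ per false positive (ops + churn)


def expected_value():
    n_fraud = VOLUME * PREVALENCE                       # fraudulent credits/month
    n_legit = VOLUME - n_fraud                          # legitimate credits/month
    loss_without = n_fraud * MEAN_LOSS                  # $/month
    loss_with = n_fraud * (1 - RECALL) * MEAN_LOSS      # $/month, missed fraud only
    n_fp = n_legit * FPR                                # false positives/month
    fp_cost = n_fp * FP_COST                            # $/month
    net_savings = (loss_without - loss_with) - fp_cost  # $/month
    return {
        "loss_without_rule": loss_without,
        "loss_with_rule": loss_with,
        "false_positives": n_fp,
        "fp_cost": fp_cost,
        "net_savings": net_savings,
    }


if __name__ == "__main__":
    for name, value in expected_value().items():
        print(f"{name:>18}: {value:,.2f}")
```

Under these assumptions: 1,500 fraudulent credits × $900 = $1,350,000/month without the rule; 450 missed × $900 = $405,000/month with it; 998,500 × 0.05% ≈ 499.25 false positives × $25 = $12,481.25/month; net savings ≈ $932,518.75/month.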

Tasks

A) Expected value: Compute monthly gross fraud loss without the rule and with the rule, then net savings after false-positive costs. Show all formulas and units.
B) Sample size: Using “loss per 1,000 credits” as the primary metric, design a two-arm online A/B test to detect a 15% relative reduction at α = 0.05 (two-sided) and 80% power. State distributional assumptions (e.g., approximate as a Poisson rate or two-part model), derive the per-arm sample size, and justify your choice.
C) Interval estimation: The control arm observes 240,000 credits with 0.12% fraud prevalence. Compute a 95% Wilson interval for the prevalence and interpret it for decision-making.
D) Sequential monitoring: If results are reviewed daily for 14 days, propose a valid sequential testing plan (e.g., alpha-spending or group-sequential boundaries) and stopping rules that control Type I error. Explain how early stopping interacts with the chosen metric.
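Tasks B and C can be sketched with the standard library alone. For B, this sketch assumes a two-part model for loss per credit, X = B·L, with B ~ Bernoulli(p) and severity L having mean μ = $900 and SD σ = $600, and further assumes the rule lowers prevalence by 15% without changing severity. Then Var(X) = p(σ² + μ²) − (pμ)²; a compound-Poisson approximation, Var ≈ p(σ² + μ²), gives nearly the same value because p is small. Rescaling to loss per 1,000 credits multiplies both the effect Δ and the SD by 1,000, so the per-arm n = (z_{0.975} + z_{0.80})² · (Var_C + Var_T) / Δ² is unchanged. Function names and defaults are illustrative, not a prescribed solution.

```python
from math import ceil, sqrt
from statistics import NormalDist


def per_arm_sample_size(p=0.0015, mu=900.0, sigma=600.0,
                        rel_reduction=0.15, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided z-test on mean loss per credit.

    Two-part model (assumption): loss per credit X = B * L with
    B ~ Bernoulli(p) and severity L having mean mu and SD sigma.
    The rule is assumed to cut prevalence by rel_reduction while
    leaving the severity distribution unchanged.
    """
    second_moment = sigma ** 2 + mu ** 2  # E[L^2], in $^2

    def var(prev):
        # Var(X) = E[X^2] - E[X]^2 = prev * E[L^2] - (prev * mu)^2
        return prev * second_moment - (prev * mu) ** 2

    p_treat = p * (1 - rel_reduction)
    delta = (p - p_treat) * mu  # detectable difference, $ per credit
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    return ceil(z_sum ** 2 * (var(p) + var(p_treat)) / delta ** 2)


def wilson_interval(successes, n, conf=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half


if __name__ == "__main__":
    print(f"per-arm sample size: {per_arm_sample_size():,} credits")
    lo, hi = wilson_interval(288, 240_000)  # 0.12% of 240,000 = 288 frauds
    print(f"95% Wilson CI for prevalence: [{lo:.4%}, {hi:.4%}]")
```

Under these assumptions the sketch gives roughly 621,000 credits per arm, which is more than a month of traffic per arm if 1,000,000 credits/month are split 50/50. For C, the Wilson interval comes out near [0.107%, 0.135%]. Its upper bound sits below the 0.15% baseline used in Task A, so the control period points to lower prevalence than assumed, and gross savings (which scale with prevalence) would be correspondingly smaller than the Task A estimate.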

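For Task D, one standard option is a Lan-DeMets alpha-spending function, which fixes how much of the overall two-sided α = 0.05 may be spent by information fraction t. The sketch below tabulates the O'Brien-Fleming-type spending, 2(1 − Φ(z_{0.975}/√t)), and the Pocock-type spending, α·ln(1 + (e − 1)t), for 14 daily looks, assuming equal information per day (t_k = k/14). That assumption is optimistic here: fraud returns take up to 5 business days, so t is better measured in matured (fully labeled) credits. The sketch only computes the alpha schedule; converting the increments into per-look z boundaries requires recursive numerical integration, which group-sequential software (for example the R packages gsDesign or rpact) handles. A stopping rule would then be: stop at look k if |z_k| crosses that look's boundary, otherwise continue to day 14.

```python
import math
from statistics import NormalDist


def obf_spending(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending: cumulative two-sided alpha at t."""
    if t <= 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))


def pocock_spending(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    return alpha * math.log(1 + (math.e - 1) * t)


def spending_schedule(looks=14, alpha=0.05, spend=obf_spending):
    """Cumulative and incremental alpha at equally spaced looks t_k = k / looks.

    Equal spacing assumes the same amount of matured information each day.
    """
    cumulative = [spend(k / looks, alpha) for k in range(1, looks + 1)]
    previous = [0.0] + cumulative[:-1]
    incremental = [c - p for c, p in zip(cumulative, previous)]
    return cumulative, incremental


if __name__ == "__main__":
    cum, inc = spending_schedule()
    for day, (c, i) in enumerate(zip(cum, inc), start=1):
        print(f"day {day:2d}: cumulative alpha {c:.6f}, spent this look {i:.6f}")
```

The O'Brien-Fleming-type schedule spends almost nothing in the first days, which suits a noisy, heavy-tailed dollar metric: an early efficacy stop happens only for a very large observed effect, and effect estimates from an early stop tend to overstate the true reduction. Pocock-type spending stops earlier more readily but leaves less alpha for the final look.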