PracHub

Compute power and cost-aware thresholds

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in statistical estimation, experimental design, cost-sensitive loss modeling, and sequential testing as applied to fraud-detection rules. It covers prevalence estimation, expected-value cost calculations, sample-size and power derivation, confidence-interval construction, and alpha-controlled monitoring. It is commonly asked in Statistics & Math interviews because it probes applied probabilistic modeling and A/B testing under operational constraints rather than purely conceptual theory.

Company: Gemini

Role: Data Scientist

Category: Statistics & Math

Difficulty: Medium

Interview Round: Onsite

Oct 13, 2025, 9:49 PM

You are evaluating a new ACH velocity + shared-device block rule.

Assumptions

  • Volume: 1,000,000 ACH credits/month.
  • Baseline fraud prevalence: 0.15% of credits return as fraud within 5 business days.
  • Mean loss per fraudulent credit: $900; SD ≈ $600.
  • Proposed rule: Recall 70% of fraudulent credits; FPR 0.05% on legitimate credits.
  • Cost per false positive (ops + churn): $25.

Tasks

  • A) Expected value: Compute monthly gross fraud loss without the rule; with the rule; then net savings after false-positive costs. Show all formulas and units.
  • B) Sample size: Using “loss per 1,000 credits” as the primary metric, design a two-arm online A/B test to detect a 15% relative reduction at α = 0.05 (two-sided) and 80% power. State distributional assumptions (e.g., approximate as a Poisson rate or two-part model), derive the per-arm sample size, and justify your choice.
  • C) Interval estimation: Control arm observes 240,000 credits with 0.12% fraud prevalence. Compute a 95% Wilson interval for the prevalence and interpret it for decision-making.
  • D) Sequential monitoring: If results are reviewed daily for 14 days, propose a valid sequential testing plan (e.g., alpha-spending or group-sequential boundaries) and stopping rules that control Type I error. Explain how early stopping interacts with the chosen metric.
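For Task A, the arithmetic follows directly from the stated assumptions. The sketch below is one way to lay it out, not an official solution. The variable names are illustrative, and it assumes (beyond what the prompt states) that a blocked fraudulent credit incurs zero loss and that the $25 false-positive cost applies once per blocked legitimate credit.

```python
# Task A sketch: monthly expected value under the stated assumptions.
# Assumption (not in the prompt): a blocked fraudulent credit incurs zero loss.
volume = 1_000_000      # ACH credits per month
prevalence = 0.0015     # 0.15% of credits return as fraud
mean_loss = 900.0       # dollars per fraudulent credit
recall = 0.70           # share of fraudulent credits the rule blocks
fpr = 0.0005            # 0.05% of legitimate credits blocked
fp_cost = 25.0          # dollars per false positive (ops + churn)

fraud_credits = volume * prevalence            # credits/month
legit_credits = volume - fraud_credits         # credits/month

gross_loss_no_rule = fraud_credits * mean_loss                    # $/month
gross_loss_with_rule = fraud_credits * (1 - recall) * mean_loss   # $/month
false_positive_cost = legit_credits * fpr * fp_cost               # $/month
net_savings = gross_loss_no_rule - gross_loss_with_rule - false_positive_cost

print(f"Gross loss, no rule:   ${gross_loss_no_rule:,.2f}")
print(f"Gross loss, with rule: ${gross_loss_with_rule:,.2f}")
print(f"False-positive cost:   ${false_positive_cost:,.2f}")
print(f"Net savings:           ${net_savings:,.2f}")
```

With these inputs the gross loss is $1,350,000 without the rule and $405,000 with it, the false-positive cost is $12,481.25, and net savings come to $932,518.75 per month.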
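For Task B, one possible approach is a two-part model: each credit is fraudulent with probability p, and a fraudulent credit carries a loss with mean $900 and SD $600, so the per-credit loss has mean p·μ and variance p·(σ² + μ²) − (p·μ)². The sketch below plugs these moments into the standard two-sample normal-approximation formula. It assumes credits are independent, that the 15% reduction acts on fraud frequency only (severity unchanged), and that arm means are close enough to normal for the approximation to hold; all names are illustrative. Scaling to "per 1,000 credits" multiplies the mean and the effect by the same constant, so the required sample size is unchanged.

```python
from math import ceil
from statistics import NormalDist

# Task B sketch: per-arm sample size under a two-part (frequency x severity) model.
p0 = 0.0015                 # control fraud prevalence
mu, sigma = 900.0, 600.0    # loss per fraudulent credit: mean and SD in dollars
rel_reduction = 0.15        # minimum detectable relative reduction in mean loss
alpha, power = 0.05, 0.80   # two-sided alpha, target power

second_moment = sigma**2 + mu**2   # E[L^2] for a fraudulent credit

def per_credit_moments(p):
    """Mean and variance of loss per credit when fraud occurs with probability p."""
    mean = p * mu
    var = p * second_moment - mean**2
    return mean, var

# Assumption: the treatment lowers fraud frequency and leaves severity unchanged.
p1 = p0 * (1 - rel_reduction)
m0, v0 = per_credit_moments(p0)
m1, v1 = per_credit_moments(p1)

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
z_beta = NormalDist().inv_cdf(power)

n_per_arm = ceil((z_alpha + z_beta) ** 2 * (v0 + v1) / (m0 - m1) ** 2)
print(f"Credits per arm: {n_per_arm:,}")
```

Under these assumptions the result is on the order of 620,000 credits per arm, which exceeds a 50/50 split of one month's stated volume, so a fixed-horizon test of this effect size would likely need more than a month of traffic.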
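For Task C, the Wilson score interval can be computed directly from its closed form. The sketch below uses only the Python standard library; the helper name `wilson_interval` is illustrative, and the fraud count of 288 is inferred from 0.12% of 240,000 credits.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

n = 240_000
frauds = round(n * 0.0012)   # 0.12% of 240,000 credits
lo, hi = wilson_interval(frauds, n)
print(f"95% Wilson interval: {lo:.4%} to {hi:.4%}")
```

The interval comes out at roughly 0.107% to 0.135%. Because it excludes the 0.15% planning baseline, one reasonable reading is that the control arm is generating fewer fraud events than the design assumed, so a sample size derived from 0.15% may be underpowered and the expected-value figures may overstate savings.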
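For Task D, one candidate plan is a Lan-DeMets alpha-spending function of the O'Brien-Fleming type, α(t) = 2·(1 − Φ(z_{α/2}/√t)), evaluated at 14 daily looks. The sketch below assumes equal information per day (roughly constant daily volume), which is an assumption rather than something the prompt states. It also converts the alpha spent at each look into a per-look critical value with a simple union bound. That conversion is conservative and is not the exact group-sequential boundary, which requires numerical integration over the joint distribution of the interim statistics.

```python
from math import erfc, sqrt
from statistics import NormalDist

alpha = 0.05
looks = 14   # one review per day; assumes equal information per look
z_final = NormalDist().inv_cdf(1 - alpha / 2)

def obf_spent(t):
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1)."""
    # Equals 2 * (1 - Phi(z_final / sqrt(t))); erfc keeps far-tail precision.
    return erfc(z_final / sqrt(t) / sqrt(2))

cumulative = [obf_spent(k / looks) for k in range(1, looks + 1)]
increments = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# Conservative per-look thresholds: testing look k at two-sided level
# increments[k] keeps the overall Type I error at or below alpha (union bound).
critical_z = [-NormalDist().inv_cdf(d / 2) for d in increments]

for day, (spent, z) in enumerate(zip(cumulative, critical_z), start=1):
    print(f"day {day:2d}: cumulative alpha {spent:.6f}, stop if |z| > {z:.3f}")
```

A stopping rule built on this would reject at the first look where |z| exceeds that day's threshold and otherwise continue to day 14. Since fraud returns can arrive up to 5 business days after a credit, the loss metric at each look is only complete for credits old enough to have matured, so early looks carry little usable information, which fits a spending function that spends almost no alpha early.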
