PracHub

Evaluate a model and choose metrics

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in cost-sensitive model evaluation, handling extreme class imbalance, calibration and threshold derivation, experiment design, and post-launch monitoring and fairness within the Analytics & Experimentation domain.

Company: Apple

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: hard

Interview Round: Onsite

You own a fraud‑screening model for e‑commerce orders. The base rate is 0.7% fraud. Costs: every flagged order goes to manual review at $3; flagging a legitimate order incurs an additional $1 of friction; missing a fraud costs $120; correctly passing a legitimate order costs $0.

On a 100,000‑order validation set (700 positives), two candidate models at threshold 0.5 yield: Model A: TP=490, FP=4,900, FN=210, TN=94,400. Model B: TP=560, FP=8,400, FN=140, TN=90,900.

Tasks: (a) Compute precision, recall, F1, a ROC‑AUC proxy from the TPR/FPR points, and expected cost per order for A and B at 0.5. Which model is better under the stated costs? (b) Derive the cost‑optimal threshold in general, in terms of the calibrated P(y=1|x) and the costs; apply it here assuming perfect calibration and the stated base rate. (c) Discuss PR‑AUC vs ROC‑AUC under extreme imbalance, calibration checks (Brier, ECE), and decision curve analysis/net benefit. (d) Propose an offline evaluation plan robust to prevalence shift and a safe online A/B test with guardrails (manual review SLAs, false accusation rate, holdout for drift), and explain how you would monitor post‑launch for concept drift and fairness across user segments.

Oct 13, 2025, 9:49 PM

Fraud-screening model evaluation under class imbalance and asymmetric costs

Context

You operate a binary classifier that flags e‑commerce orders for manual review. The base fraud rate is 0.7% (700 frauds out of 100,000 orders). Actions and outcome costs:

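  • If flagged: manual review cost = $3 for any flagged order, fraudulent or legitimate (so a TP costs $3).
  • Additional friction cost for mistakenly flagging a legitimate order (FP) = $1, on top of the review cost (so an FP costs $3 + $1 = $4).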
  • Missing a fraud (FN) costs $120.
  • Correctly passing a legitimate order (TN) costs $0.

Two candidate models at threshold 0.5 produce the following on a 100,000‑order validation set (700 positives):

  • Model A: TP=490, FP=4,900, FN=210, TN=94,400.
  • Model B: TP=560, FP=8,400, FN=140, TN=90,900.

Tasks

(a) For each model at threshold 0.5, compute:

  • Precision, Recall, F1
  • TPR and FPR (and a single‑point ROC‑AUC proxy)
  • Expected cost per order under the stated costs

Decide which model is better under these costs.
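The arithmetic for task (a) can be checked with a short script. This is a minimal sketch, not a reference solution: it assumes a true positive costs the $3 review, a false positive costs $3 + $1 = $4, and it takes the single-point ROC-AUC proxy to be the area under the piecewise-linear curve through (0,0), (FPR,TPR), (1,1), which equals (TPR + 1 − FPR) / 2. The names `Confusion` and `summarize` are illustrative.

```python
from dataclasses import dataclass

# Outcome costs in dollars (assumed reading of the problem statement):
# every flagged order pays the $3 review; a flagged legitimate order adds $1.
COST_TP = 3.0
COST_FP = 3.0 + 1.0
COST_FN = 120.0
COST_TN = 0.0


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    fn: int
    tn: int


def summarize(c: Confusion) -> dict:
    """Compute threshold metrics and expected cost per order."""
    n = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)  # recall is the same quantity as TPR
    f1 = 2 * precision * recall / (precision + recall)
    fpr = c.fp / (c.fp + c.tn)
    # Area under the ROC polyline (0,0) -> (FPR,TPR) -> (1,1).
    auc_proxy = (recall + 1.0 - fpr) / 2.0
    total_cost = (c.tp * COST_TP + c.fp * COST_FP
                  + c.fn * COST_FN + c.tn * COST_TN)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fpr,
        "auc_proxy": auc_proxy,
        "cost_per_order": total_cost / n,
    }


MODEL_A = Confusion(tp=490, fp=4_900, fn=210, tn=94_400)
MODEL_B = Confusion(tp=560, fp=8_400, fn=140, tn=90_900)

if __name__ == "__main__":
    for name, c in (("A", MODEL_A), ("B", MODEL_B)):
        stats = summarize(c)
        print(name, {k: round(v, 4) for k, v in stats.items()})
```

Under these assumptions Model A comes out at about $0.4627 per order and Model B at about $0.5208, so A is cheaper at threshold 0.5 even though B has higher recall and a higher single-point AUC proxy.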

(b) Derive the general cost‑optimal classification threshold in terms of calibrated P(y=1|x) and the four outcome costs. Then apply it to this problem (assume perfect calibration) and report the numeric threshold.
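One way to set up the derivation for task (b) is to compare the expected cost of flagging an order with the expected cost of passing it. The sketch below uses p for the calibrated P(y=1|x) and C_TP, C_FP, C_FN, C_TN for the four outcome costs; these symbols are notation introduced here, and the values C_TP = 3 and C_FP = 4 assume the review cost applies to every flagged order. It also assumes C_FN > C_TP and C_FP > C_TN, so the inequality keeps its direction.

```latex
\begin{aligned}
\mathbb{E}[\text{cost} \mid \text{flag}] &= p\,C_{TP} + (1-p)\,C_{FP} \\
\mathbb{E}[\text{cost} \mid \text{pass}] &= p\,C_{FN} + (1-p)\,C_{TN} \\
\text{flag} \iff\; p\,(C_{FN}-C_{TP}) &> (1-p)\,(C_{FP}-C_{TN}) \\
\iff\; p &> t^{*} = \frac{C_{FP}-C_{TN}}{(C_{FP}-C_{TN}) + (C_{FN}-C_{TP})} \\
t^{*} &= \frac{4-0}{(4-0)+(120-3)} = \frac{4}{121} \approx 0.033
\end{aligned}
```

With perfectly calibrated scores the base rate does not appear in the threshold itself, because it is already reflected in p. It matters when scores were calibrated at a different prevalence and need a prior correction first.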

(c) Discuss:

  • PR‑AUC vs ROC‑AUC under extreme class imbalance
  • Calibration checks (e.g., Brier score, Expected Calibration Error)
  • Decision curve analysis / net benefit and how it aligns with the cost structure
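The calibration checks and net benefit named in task (c) can be sketched in a few lines of plain Python. This is an illustrative sketch, not a library implementation: the function names and the toy labels and scores are made up, the ECE uses equal-width bins (one common convention among several), and net benefit follows the usual decision-curve form TP/N − (FP/N) · t/(1 − t).

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def expected_calibration_error(y_true, p_pred, n_bins=10):
    """Equal-width-bin ECE: weighted mean |observed rate - mean score|."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_score = sum(p for _, p in members) / len(members)
        observed_rate = sum(y for y, _ in members) / len(members)
        ece += (len(members) / n) * abs(observed_rate - mean_score)
    return ece


def net_benefit(y_true, p_pred, threshold):
    """Net benefit of flagging every order with score >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the problem.
    labels = [0, 0, 0, 0, 1, 0, 1, 1]
    scores = [0.01, 0.02, 0.02, 0.05, 0.10, 0.30, 0.60, 0.90]
    print("Brier:", brier_score(labels, scores))
    print("ECE:", expected_calibration_error(labels, scores, n_bins=5))
    print("Net benefit at 4/121:", net_benefit(labels, scores, 4 / 121))
```

The link to the cost structure: at the cost-derived threshold t = 4/121 (assuming a TP costs $3 and an FP costs $4), the odds t/(1 − t) equal 4/117, which is (C_FP − C_TN)/(C_FN − C_TP). Net benefit at that threshold, multiplied by C_FN − C_TP = 117, is the per-order saving relative to passing every order.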

(d) Propose:

  • An offline evaluation plan robust to prevalence shifts
  • A safe online A/B plan with guardrails (manual review SLAs, false accusation rate, holdout for drift)
  • A post‑launch monitoring plan for concept drift and fairness across user segments
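For the offline plan in task (d), one simple robustness check is to hold each model's TPR and FPR fixed and recompute precision and expected cost over a range of fraud prevalences. The sketch below assumes pure label shift (the score distribution within each class is unchanged, so TPR and FPR at threshold 0.5 stay fixed) and the same $3 / $4 / $120 / $0 cost reading as the problem statement; the function names and the prevalence grid are illustrative.

```python
# Assumed outcome costs in dollars: TP = review only, FP = review + friction.
COST_TP, COST_FP, COST_FN, COST_TN = 3.0, 4.0, 120.0, 0.0

# (TPR, FPR) at threshold 0.5, taken from the validation confusion matrices.
RATES = {
    "A": (490 / 700, 4_900 / 99_300),
    "B": (560 / 700, 8_400 / 99_300),
}


def cost_per_order(tpr: float, fpr: float, prevalence: float) -> float:
    """Expected cost per order if TPR/FPR are stable and prevalence moves."""
    cost_given_fraud = tpr * COST_TP + (1.0 - tpr) * COST_FN
    cost_given_legit = fpr * COST_FP + (1.0 - fpr) * COST_TN
    return prevalence * cost_given_fraud + (1.0 - prevalence) * cost_given_legit


def precision_at(tpr: float, fpr: float, prevalence: float) -> float:
    """Precision implied by fixed TPR/FPR at a given prevalence."""
    flagged_fraud = prevalence * tpr
    flagged_legit = (1.0 - prevalence) * fpr
    return flagged_fraud / (flagged_fraud + flagged_legit)


if __name__ == "__main__":
    for prevalence in (0.003, 0.007, 0.012, 0.020):
        row = []
        for name, (tpr, fpr) in RATES.items():
            cost = cost_per_order(tpr, fpr, prevalence)
            prec = precision_at(tpr, fpr, prevalence)
            row.append(f"{name}: cost={cost:.4f} precision={prec:.4f}")
        print(f"prevalence={prevalence:.3f}  " + "  ".join(row))
```

Under these assumptions the ranking is prevalence-dependent: Model A is cheaper at the 0.7% base rate, but Model B becomes cheaper once prevalence rises above roughly 1.2%. This is one reason to report costs across a prevalence range instead of at a single point.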
