Evaluate New Model's Performance Against Existing System

This question evaluates a data scientist's competency in machine-learning model evaluation and experimentation, covering metrics (precision, recall, F1/Fβ, ROC-AUC, PR-AUC, calibration), dataset and label quality, class imbalance handling, thresholding/triage policies, and safety and ethical guardrails.

Question

Scenario

You are evaluating a new machine-learning model that detects harmful content on a large consumer platform. Leadership needs evidence that the new model outperforms the existing model, and wants to understand the trade-offs between catching more harmful posts and avoiding over-removal of benign posts.

Task

Design a comprehensive evaluation plan to compare the new model against:

the existing (production) model, and
optionally, a minimal/no-model baseline (only if safeguards make it safe to run).

Address both:

Offline evaluation using labeled data and confusion-matrix-based metrics.
Online A/B testing and experiment design.

Make the precision–recall trade-offs explicit, and connect model metrics to business outcomes.
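As a minimal sketch of what making the trade-off explicit could look like offline, the Python below computes confusion-matrix counts at a chosen score threshold and derives precision, recall, and Fβ from them. The function names, the toy scores and labels, and the convention that a post is flagged when its score is at or above the threshold are all illustrative assumptions, not part of the question.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # harmful posts flagged
    fp: int  # benign posts flagged (over-removal)
    fn: int  # harmful posts missed
    tn: int  # benign posts left alone


def confusion_at_threshold(scores, labels, threshold):
    """Count outcomes when posts with score >= threshold are flagged (assumed convention)."""
    tp = fp = fn = tn = 0
    for score, is_harmful in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_harmful:
            tp += 1
        elif flagged:
            fp += 1
        elif is_harmful:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def precision(c):
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f_beta(c, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    p, r = precision(c), recall(c)
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data, purely hypothetical: model scores and ground-truth labels (1 = harmful).
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    labels = [1, 1, 0, 1, 0, 0]
    for t in (0.35, 0.5, 0.75):
        c = confusion_at_threshold(scores, labels, t)
        print(t, precision(c), recall(c), f_beta(c, beta=2.0))
```

Running the same sweep for the existing model and the new model on one labeled set would give a side-by-side view of how each threshold trades missed harmful posts against over-removal.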

Requirements

Define key metrics: precision, recall, F1/Fβ, ROC-AUC, PR-AUC, calibration (Brier score, reliability diagrams), and threshold-specific metrics.
Describe dataset design, label quality, and class imbalance handling.
Propose thresholding/triage policies (e.g., auto-remove vs. send-to-review).
Outline online experiment design: unit of randomization, triggers, guardrails, primary/secondary KPIs, safety and ethical constraints.
Include validation steps, power/duration considerations, and guardrails for false positives and user impact.
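For the power/duration requirement, one rough way to size the online test is the standard normal-approximation formula for comparing two proportions. The sketch below assumes the primary KPI is a rate (for example, the share of sampled posts confirmed harmful), a two-sided test, and equal-sized arms; the function name and the example rates are hypothetical, not values from the question.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate units needed per arm to detect p_control vs p_treatment.

    Uses the normal approximation for a two-sided two-proportion z-test
    with equal allocation. All inputs here are assumptions to be replaced
    with the platform's own baseline rate and minimum detectable effect.
    """
    if p_control == p_treatment:
        raise ValueError("The two rates must differ to define an effect size.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_treatment) / 2          # pooled rate under the null
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(
            p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
        )
    ) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)


if __name__ == "__main__":
    # Hypothetical rates: 10% baseline, 12% under the new model.
    n = sample_size_per_arm(0.10, 0.12)
    print("units per arm:", n)
```

Dividing the per-arm count by the expected daily number of eligible units gives a first estimate of duration; rare harmful-content events and clustered randomization units (users rather than posts) would typically push the real requirement higher.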
