PracHub

Evaluate New Model's Performance Against Existing System

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in machine-learning model evaluation and experimentation, covering metrics (precision, recall, F1/Fβ, ROC-AUC, PR-AUC, calibration), dataset and label quality, class imbalance handling, thresholding/triage policies, and safety and ethical guardrails.


Company: Meta

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

Scenario

A new machine-learning model flags harmful posts; leadership wants evidence it outperforms the old system.

Question

How would you evaluate the performance of the new harmful-content detection model versus the existing model or no model? Describe both offline evaluation (confusion matrix metrics) and online A/B testing approaches, addressing precision-recall trade-offs.

Hints

Mention metrics (precision, recall, F1, ROC), calibration, business KPIs, guardrails, and experiment design.


Aug 4, 2025, 10:55 AM

Scenario

You are evaluating a new machine-learning model that detects harmful content on a large consumer platform. Leadership needs evidence that the new model outperforms the existing model, and to understand trade-offs between catching more harmful posts and avoiding over-removal of benign posts.

Task

Design a comprehensive evaluation plan to compare the new model against:

  • the existing (production) model, and
  • optionally, a minimal/no-model baseline (only if safeguards make it safe to run).

Address both:

  1. Offline evaluation using labeled data and confusion-matrix-based metrics.
  2. Online A/B testing and experiment design.
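
For the offline half, the confusion-matrix metrics can be sketched as follows. This is a minimal illustration, assuming binary labels (1 = harmful) and hard predictions from both models on the same labeled set. The toy data and the names `Confusion`, `confusion`, `precision`, `recall`, and `f_beta` are hypothetical and not taken from any production system.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # harmful posts correctly flagged
    fp: int  # benign posts wrongly flagged
    fn: int  # harmful posts missed
    tn: int  # benign posts correctly left alone


def confusion(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (1 = harmful)."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def precision(c):
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c):
    harmful = c.tp + c.fn
    return c.tp / harmful if harmful else 0.0


def f_beta(c, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    p, r = precision(c), recall(c)
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


# Toy labeled set (illustrative only): 3 harmful posts out of 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
old_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
new_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

old_cm = confusion(y_true, old_pred)
new_cm = confusion(y_true, new_pred)

for name, cm in [("old", old_cm), ("new", new_cm)]:
    print(name, cm, round(precision(cm), 3), round(recall(cm), 3),
          round(f_beta(cm, beta=2.0), 3))
```

Scoring both models on the same examples keeps the comparison paired. Choosing beta above 1 is one way to encode that a missed harmful post costs more than a wrongly flagged benign one, though the right weighting is a policy decision.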

Make the precision–recall trade-offs explicit, and connect model metrics to business outcomes.
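
One way to make the precision-recall trade-off explicit is to sweep the decision threshold over the model's scores and report the operating point that maximizes recall subject to a precision floor. The sketch below assumes the model emits a score in [0, 1] per post. The scores, labels, the floor values, and the helper names `pr_at_threshold` and `pick_threshold` are all hypothetical.

```python
def pr_at_threshold(y_true, scores, threshold):
    """Precision and recall when every post with score >= threshold is flagged."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for t, s in pairs if t == 1 and s >= threshold)
    fp = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    fn = sum(1 for t, s in pairs if t == 1 and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(y_true, scores, min_precision):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision meets the floor, or None if none qualifies.
    Ties on recall are broken in favor of higher precision."""
    candidates = []
    for thr in sorted(set(scores)):
        p, r = pr_at_threshold(y_true, scores, thr)
        if p >= min_precision:
            candidates.append((r, p, thr))
    if not candidates:
        return None
    r, p, thr = max(candidates)
    return thr, p, r


# Toy scored set (illustrative only), 1 = harmful.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40, 0.30, 0.10]

lenient = pick_threshold(y_true, scores, min_precision=0.75)
strict = pick_threshold(y_true, scores, min_precision=0.90)
print("floor 0.75:", lenient)
print("floor 0.90:", strict)
```

Raising the precision floor in this toy data pushes the chosen threshold up and the recall down, which is the trade-off leadership needs to see. Running the same sweep for the old and new model at a shared floor gives a like-for-like recall comparison.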

Requirements

  • Define key metrics: precision, recall, F1/Fβ, ROC-AUC, PR-AUC, calibration (Brier score, reliability), threshold-specific metrics.
  • Describe dataset design, label quality, and class imbalance handling.
  • Propose thresholding/triage policies (e.g., auto-remove vs. send-to-review).
  • Outline online experiment design: unit of randomization, triggers, guardrails, primary/secondary KPIs, safety and ethical constraints.
  • Include validation steps, power/duration considerations, and guardrails for false positives and user impact.
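
For the power and duration requirement, a rough sizing sketch follows. It uses the standard normal-approximation formula for comparing two proportions and assumes the primary KPI is a per-unit rate (for example, the share of users exposed to harmful content). The baseline rate, target rate, and daily traffic figures are made-up placeholders, and `sample_size_per_arm` and `days_needed` are hypothetical helper names.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate units per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


def days_needed(n_per_arm, daily_eligible_units, arms=2):
    """Days to fill all arms if eligible traffic is split evenly across them."""
    return math.ceil(arms * n_per_arm / daily_eligible_units)


# Placeholder assumptions: 1.0% baseline exposure rate, hoping to reach 0.9%.
n = sample_size_per_arm(p_control=0.010, p_treatment=0.009)
print("units per arm:", n)
print("days at 50,000 eligible units/day:", days_needed(n, 50_000))
```

This ignores clustering, repeated exposure per user, and novelty effects, so it is best read as a lower bound on duration. In practice the run is often extended to cover at least one full weekly cycle.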
