PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches
|Home/Machine Learning/Meta

Detect leakage and evaluate a prediction model

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competence in production-ready supervised machine learning—detecting and repairing data leakage in logs-based features, designing time-aware cross-validation, defining KPI-driven thresholding and calibration, interpreting model coefficients, and establishing monitoring and retraining policies for a churn prediction model in the Machine Learning domain. It is commonly asked because interviewers need to assess both conceptual understanding of leakage, validation, and calibration principles and practical application skills for deploying and maintaining models with cost-sensitive objectives and monitoring in production, so the level of abstraction spans conceptual reasoning and hands-on practical application.

  • hard
  • Meta
  • Machine Learning
  • Data Scientist

Detect leakage and evaluate a prediction model

Company: Meta

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

You inherit a churn prediction model trained to predict whether a user will place an order in the next 28 days. The training data include features computed weekly, some of which leak post-label information (e.g., features that include activity within the prediction window). (a) Identify at least five concrete leakage sources in a logs-based feature set (examples: features derived from orders within the label window, coupon_applied in the next 28 days, post-treatment delivery ETA, label-influenced support contacts). For each, rewrite the feature so it is computable at prediction time with a strict event-time cutoff. (b) Propose a time-based cross-validation scheme (rolling origin) and define train/validation/test splits ensuring no future leakage. Specify how you would handle users entering/leaving the cohort and cold-start users. (c) Offline metrics show AUC=0.79, PR-AUC=0.23. Online, targeting the top decile increases conversion by 1.5% but raises cancellations by 0.3pp. Define business KPIs and a cost-sensitive objective to tune the decision threshold; include calibration (Platt/Isotonic) and how you would check calibration drift in production. (d) Interpretation: a standardized feature past_7d_orders has logistic regression coefficient 0.40 and baseline log-odds of churn −1.50. Compute the odds ratio for a +1 SD increase and the resulting churn probability change from the baseline. Explain limitations of such ceteris paribus interpretations in correlated settings. (e) Outline a monitoring plan (data quality, feature distributions, PSI, label delay, prediction drift) and a retraining policy tied to performance and covariate shift triggers.

Quick Answer: This question evaluates competence in production-ready supervised machine learning—detecting and repairing data leakage in logs-based features, designing time-aware cross-validation, defining KPI-driven thresholding and calibration, interpreting model coefficients, and establishing monitoring and retraining policies for a churn prediction model in the Machine Learning domain. It is commonly asked because interviewers need to assess both conceptual understanding of leakage, validation, and calibration principles and practical application skills for deploying and maintaining models with cost-sensitive objectives and monitoring in production, so the level of abstraction spans conceptual reasoning and hands-on practical application.

Related Interview Questions

  • Design and evaluate an ads ranking algorithm - Meta (easy)
  • How would you design a Shop Ads ranking algorithm? - Meta (easy)
  • Derive Linear Regression Solution - Meta (medium)
  • Explain key ML metrics and techniques - Meta (medium)
  • Design an ad recommendation ranking approach - Meta (easy)
Meta logo
Meta
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Machine Learning
3
0

Churn Prediction Model: Leakage, Validation, KPIs, Interpretation, Monitoring

Context: You inherit a weekly-scored model that predicts whether a user will place an order in the next 28 days. Some features were built from logs in ways that leak information from the post-prediction label window. Address the following tasks.

(a) Leakage identification and repair

  • Identify at least five concrete leakage sources that can occur in a logs-based feature set (e.g., features derived from orders within the label window, coupon_applied in the next 28 days, post-treatment delivery ETA, label-influenced support contacts).
  • For each, rewrite the feature so it is computable at prediction time with a strict event-time cutoff.

(b) Time-based cross-validation (rolling origin)

  • Propose a rolling origin cross-validation scheme and define train/validation/test splits that ensure no future leakage.
  • Specify how to handle users entering/leaving the cohort across time and how to handle cold-start users.

(c) KPIs, thresholding, and calibration

  • Offline metrics are AUC = 0.79 and PR-AUC = 0.23. Online, targeting the top decile increases conversion by 1.5% but raises cancellations by 0.3pp.
  • Define business KPIs and a cost-sensitive objective to tune the decision threshold.
  • Include probability calibration (Platt or Isotonic) and how you would check calibration drift in production.

(d) Interpretation

  • A standardized feature past_7d_orders has a logistic regression coefficient of 0.40 and a baseline log-odds of churn of −1.50.
  • Compute the odds ratio for a +1 SD increase and the resulting change in churn probability from the baseline.
  • Explain the limitations of such ceteris paribus interpretations in correlated feature settings.

(e) Monitoring and retraining

  • Outline a monitoring plan (data quality, feature distributions, PSI, label delay, prediction drift) and a retraining policy tied to performance and covariate shift triggers.

Solution

Show

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Machine Learning•More Meta•More Data Scientist•Meta Data Scientist•Meta Machine Learning•Data Scientist Machine Learning
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.