PracHub

Build a robust ML pipeline

Last updated: Jun 14, 2026

Quick Overview

This question evaluates a candidate's competency in designing robust end-to-end ML pipelines, covering temporal data slicing, leakage controls, time-series cross-validation, feature store consistency, offline and business metrics, drift detection and monitoring, online rollout strategies, retraining triggers, and fairness assessment.

Company: Flatiron Health

Role: Data Scientist

Category: Machine Learning

Difficulty: Medium

Interview Round: Technical Screen

You inherit an ML pipeline that predicts next-7-day churn for users, but data quality is inconsistent and feature drift is suspected.

A) Propose an end-to-end pipeline design covering: temporal data slicing (label window vs. feature window), leakage controls (e.g., using only information available up to prediction time), a cross-validation scheme appropriate for time series, and a feature store strategy that guarantees training/serving consistency.

B) Define offline metrics (e.g., AUC, PR-AUC, calibration error) and business metrics (e.g., uplift in retention from targeted interventions). Specify how you would threshold scores to optimize a cost-sensitive objective with asymmetric costs.

C) Describe concrete data quality and drift monitors: missingness rates, schema checks, training-serving skew, and feature drift using PSI or Jensen-Shannon (JS) divergence with alert thresholds (e.g., PSI > 0.25 is severe). Include how to separate drift in the covariates from drift in the target caused by product changes.

D) Detail an online rollout plan: canary scoring, shadow mode, real-time monitoring, rollback triggers, and retraining cadence. Define explicit retraining triggers (e.g., retrain weekly if PSI is moderate for two consecutive weeks or if a business KPI degrades by X%). Address fairness checks across at least two sensitive cohorts and explain how you would mitigate disparities.
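
Here is a minimal Python sketch of the temporal slicing and leakage control in part A. It assumes a hypothetical event log of `(user_id, event_ts, ingested_ts)` tuples and a 28-day feature window. It also assumes a churn label defined as no activity in the 7 days starting at the prediction cutoff. All names and window lengths are illustrative and do not come from the question. Features use only events that both occurred and were ingested before the cutoff, so late-arriving rows are screened out as well.

```python
from datetime import datetime, timedelta


def build_example(events, user_id, cutoff, feature_days=28, label_days=7):
    """Build one (features, label) row for `user_id` at prediction time `cutoff`.

    `events` holds (user_id, event_ts, ingested_ts) tuples (hypothetical schema).
    Feature window: [cutoff - feature_days, cutoff)
    Label window:   [cutoff, cutoff + label_days)
    """
    feat_start = cutoff - timedelta(days=feature_days)
    label_end = cutoff + timedelta(days=label_days)
    mine = [(ev, ing) for uid, ev, ing in events if uid == user_id]

    # Leakage control: features may only use events that happened AND were
    # ingested before the cutoff, so late-arriving rows are excluded too.
    past = [ev for ev, ing in mine if feat_start <= ev < cutoff and ing < cutoff]
    # Labels may look forward; they are used only for training and evaluation.
    future = [ev for ev, _ in mine if cutoff <= ev < label_end]

    features = {
        "n_events_window": len(past),
        # None when the user had no activity in the feature window.
        "days_since_last": (cutoff - max(past)).days if past else None,
    }
    # Assumed churn definition: zero activity during the label window.
    label = int(len(future) == 0)
    return features, label


events = [
    ("u1", datetime(2025, 1, 10), datetime(2025, 1, 10)),
    ("u1", datetime(2025, 1, 28), datetime(2025, 1, 28)),
    ("u1", datetime(2025, 1, 28, 12), datetime(2025, 1, 30)),  # arrived late: excluded
    ("u1", datetime(2025, 1, 30), datetime(2025, 1, 30)),      # inside the label window
    ("u2", datetime(2025, 1, 20), datetime(2025, 1, 20)),
]
print(build_example(events, "u1", datetime(2025, 1, 29)))
# ({'n_events_window': 2, 'days_since_last': 1}, 0)
```

Without the `ing < cutoff` filter, the late-arriving 12:00 event would shrink `days_since_last` to 0. That value would leak information that was not available at prediction time.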
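
For the time-series cross-validation scheme, one common choice is walk-forward (expanding-window) validation with an embargo. The helper below is hypothetical, not a library API. It splits a list of prediction cutoff dates and drops any training cutoff whose 7-day label window would still be open when the validation block starts.

```python
from datetime import date, timedelta


def walk_forward_splits(cutoffs, n_folds, label_days=7):
    """Expanding-window splits over prediction cutoff dates.

    Fold k validates on the k-th later block of cutoffs. Training keeps only
    cutoffs whose label window [c, c + label_days) has closed by the start of
    the validation block (an embargo equal to the label window).
    """
    cutoffs = sorted(set(cutoffs))
    block = len(cutoffs) // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough cutoffs for the requested number of folds")
    splits = []
    for k in range(1, n_folds + 1):
        val = cutoffs[k * block:(k + 1) * block]
        train = [c for c in cutoffs if c + timedelta(days=label_days) <= val[0]]
        splits.append((train, val))
    return splits


daily = [date(2025, 1, 1) + timedelta(days=i) for i in range(20)]
train, val = walk_forward_splits(daily, n_folds=1)[0]
print(train[-1], val[0])  # 2025-01-04 2025-01-11
```

In this example, the cutoffs from 2025-01-05 through 2025-01-10 are embargoed because their labels would overlap the validation period. For row-indexed data, scikit-learn's `TimeSeriesSplit` offers a comparable `gap` argument.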
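
Training/serving consistency largely depends on two habits: compute each feature once, and read it with a point-in-time (as-of) lookup in both paths. The class below is a hypothetical in-memory stand-in for one feature-store table, not any product's API. Training queries it with historical cutoffs and serving queries it with the current time. In both cases the lookup returns the latest value written at or before that time.

```python
import bisect
from datetime import datetime


class PointInTimeTable:
    """Hypothetical in-memory stand-in for one feature-store table."""

    def __init__(self):
        self._ts = {}    # entity_id -> sorted feature timestamps
        self._vals = {}  # entity_id -> values aligned with self._ts

    def write(self, entity_id, ts, value):
        ts_list = self._ts.setdefault(entity_id, [])
        vals = self._vals.setdefault(entity_id, [])
        i = bisect.bisect_right(ts_list, ts)  # keep timestamps sorted
        ts_list.insert(i, ts)
        vals.insert(i, value)

    def as_of(self, entity_id, t, default=None):
        """Latest value with timestamp <= t; `default` if none existed yet."""
        ts_list = self._ts.get(entity_id, [])
        i = bisect.bisect_right(ts_list, t)
        return self._vals[entity_id][i - 1] if i else default


table = PointInTimeTable()
table.write("u1", datetime(2025, 1, 1), 3)
table.write("u1", datetime(2025, 1, 10), 7)
print(table.as_of("u1", datetime(2025, 1, 5)))   # 3  (training row at a past cutoff)
print(table.as_of("u1", datetime(2025, 1, 12)))  # 7  (serving "now")
```

When building large training sets in batch, `pandas.merge_asof` performs the same backward-looking join on sorted timestamps.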
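
The offline metrics in part B can be written without extra dependencies, which keeps their definitions explicit. This sketch assumes binary labels (1 = churned) and scores in [0, 1]:

- AUC is the probability that a random churner outranks a random non-churner.
- Average precision summarizes the PR curve (PR-AUC).
- Expected calibration error (ECE) uses 10 equal-width bins, a common but arbitrary choice.

```python
def roc_auc(y_true, scores):
    """P(random churner scores above random non-churner); ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """PR-AUC as average precision: mean precision at the rank of each churner."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def expected_calibration_error(y_true, probs, n_bins=10):
    """Bin-size-weighted |mean predicted probability - observed churn rate|."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    ece = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for _, p in b) / len(b)
            rate = sum(y for y, _ in b) / len(b)
            ece += len(b) / len(probs) * abs(mean_p - rate)
    return ece


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s), round(average_precision(y, s), 3))  # 0.75 0.833
```

This `average_precision` breaks score ties by input order. Production code would more typically call `sklearn.metrics.roc_auc_score` and `sklearn.metrics.average_precision_score`.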
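
For the cost-sensitive threshold, the sketch below uses two placeholder costs:

- `cost_fp`: the cost of a false positive, i.e. an intervention sent to a user who would have stayed.
- `cost_fn`: the cost of a false negative, i.e. a missed churner.

With calibrated probabilities, intervening pays off when p·cost_fn > (1 - p)·cost_fp, that is, when p > cost_fp / (cost_fp + cost_fn). The sketch shows this closed form alongside an empirical sweep over validation scores. The sweep still works when calibration is imperfect.

```python
def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost of flagging every user whose score is >= threshold."""
    cost = 0.0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if flagged and y == 0:
            cost += cost_fp  # intervention spent on a user who would stay
        elif not flagged and y == 1:
            cost += cost_fn  # churner we did not reach
    return cost


def best_threshold(y_true, scores, cost_fp, cost_fn):
    """Empirical sweep: try each observed score (plus 'flag nobody') as the cutoff."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda t: expected_cost(y_true, scores, t, cost_fp, cost_fn))


def bayes_threshold(cost_fp, cost_fn):
    """Closed form for calibrated probabilities: flag when p > cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
t = best_threshold(y, s, cost_fp=1.0, cost_fn=5.0)
print(t, expected_cost(y, s, t, 1.0, 5.0))  # 0.35 1.0
print(round(bayes_threshold(1.0, 5.0), 3))  # 0.167
```

This cost matrix assumes an intervention always retains a true churner. If an uplift model estimates that only a fraction of treated churners are saved, scale `cost_fn` by that fraction.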
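
The data quality monitors in part C (schema checks, missingness rates, training-serving skew) can start as simple batch checks. The sketch below rests on three assumptions:

- Scoring rows arrive as dicts.
- The schema is a column-to-type mapping.
- Serving-time feature values are logged under `(entity_id, feature)` keys.

The 5-point missingness threshold and the skew tolerance are hypothetical defaults.

```python
def schema_violations(rows, schema):
    """List (row_index, column, problem) for missing, mistyped, or unexpected columns.

    `schema` maps column name -> expected Python type. None values pass here;
    they are counted as missingness instead.
    """
    problems = []
    for i, row in enumerate(rows):
        for col, typ in schema.items():
            if col not in row:
                problems.append((i, col, "missing column"))
            elif row[col] is not None and not isinstance(row[col], typ):
                problems.append((i, col, f"expected {typ.__name__}"))
        for col in sorted(row.keys() - schema.keys()):
            problems.append((i, col, "unexpected column"))
    return problems


def missing_rates(rows, columns):
    """Share of rows where each column is absent or None."""
    return {c: sum(r.get(c) is None for r in rows) / len(rows) for c in columns}


def missingness_alerts(baseline, current, max_increase=0.05):
    """Columns whose null rate rose by more than `max_increase` (absolute) over baseline."""
    return sorted(c for c, rate in current.items()
                  if rate - baseline.get(c, 0.0) > max_increase)


def skew_rate(offline, online, tol=1e-6):
    """Share of shared (entity_id, feature) keys where the offline-recomputed value
    disagrees with the value logged at serving time."""
    keys = offline.keys() & online.keys()
    if not keys:
        return 0.0

    def differ(a, b):
        if a is None or b is None:
            return a is not b  # exactly one side missing counts as skew
        return abs(a - b) > tol

    return sum(differ(offline[k], online[k]) for k in keys) / len(keys)


schema = {"n_events": int, "plan": str}
rows = [
    {"n_events": 3, "plan": "pro"},
    {"n_events": "3", "plan": None},  # wrong type, null plan
    {"plan": "free", "extra": 1},     # missing and unexpected columns
]
print(schema_violations(rows, schema))
print(missingness_alerts({"plan": 0.0, "n_events": 0.3}, missing_rates(rows, schema)))  # ['plan']
print(skew_rate({("u1", "n_events"): 3.0, ("u2", "n_events"): 2.0},
                {("u1", "n_events"): 3.0, ("u2", "n_events"): 2.5}))  # 0.5
```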
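
A sketch of the feature drift metrics follows, assuming numeric features binned on training-set quantiles. With e and a as the baseline and live bin shares:

- PSI sums (a - e)·ln(a/e) over the bins.
- Jensen-Shannon divergence is computed here in bits, so it lies in [0, 1].

The severe cutoff follows the question's PSI > 0.25. The moderate cutoff of 0.1 is a widely used rule of thumb. Treat both as starting points to tune per feature.

```python
import bisect
import math


def quantile_edges(baseline, n_bins=10):
    """Interior bin edges at baseline (training) quantiles; duplicate edges merge."""
    s = sorted(baseline)
    return sorted({s[len(s) * k // n_bins] for k in range(1, n_bins)})


def bin_proportions(values, edges):
    """Share of values per bin; len(edges) + 1 bins, open-ended at both ends."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        total += (a - e) * math.log(a / e)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def psi_level(value, moderate=0.1, severe=0.25):
    if value > severe:
        return "severe"
    return "moderate" if value >= moderate else "stable"


train = list(range(100))
live = [x + 30 for x in range(100)]  # the same feature, shifted upward
edges = quantile_edges(train, n_bins=4)
e, a = bin_proportions(train, edges), bin_proportions(live, edges)
print(psi_level(psi(e, a)))  # severe
print(js_divergence(e, a))
```

Empty live bins make PSI sensitive to the `eps` floor. In this example, most of the PSI comes from the first bin, which the shift empties. That sensitivity is one reason to report JS divergence alongside PSI.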
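
One heuristic for separating covariate drift from target drift compares three quantities once the lagged labels for a window arrive:

- Feature PSI shows whether P(x) moved.
- The observed churn rate against the training base rate shows whether P(y) moved.
- The observed churn rate against the model's mean predicted probability shows whether P(y | x) moved.

If the model's scores match P(y | x) over the shifted inputs, covariate shift alone moves its mean prediction along with the observed rate. A gap between the two therefore points to a changed relationship, such as a product change altering who churns. The function and its tolerances are hypothetical.

```python
def triage_drift(feature_psi, train_churn_rate, observed_churn_rate,
                 mean_predicted, psi_alert=0.1, rate_tol=0.02):
    """Rough split of covariate vs. target drift for one monitoring window.

    Assumes the model was calibrated at training time and that this window's
    labels are now available (they lag by the 7-day label window).
    """
    drifted = sorted(f for f, v in feature_psi.items() if v >= psi_alert)
    return {
        # P(x) moved: some input distribution changed.
        "covariate_drift": bool(drifted),
        "drifted_features": drifted,
        # P(y) moved: the churn base rate differs from training.
        "target_drift": abs(observed_churn_rate - train_churn_rate) > rate_tol,
        # P(y | x) moved: reality no longer matches the model's own expectation.
        "relationship_changed": abs(observed_churn_rate - mean_predicted) > rate_tol,
    }


after_launch = triage_drift({"sessions_7d": 0.03, "tenure_days": 0.05},
                            train_churn_rate=0.10, observed_churn_rate=0.16,
                            mean_predicted=0.10)
print(after_launch["covariate_drift"], after_launch["relationship_changed"])  # False True
```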
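
For the rollout in part D, the rollback triggers and the shadow-mode comparison can be written down as explicit checks. The sketch assumes each scoring path reports an error rate, a p99 latency, and the share of users it flags for intervention. All thresholds are hypothetical placeholders to agree on before launch. `top_k_overlap` compares which users the champion and a shadow challenger would target; the challenger's scores never affect any user.

```python
from dataclasses import dataclass


@dataclass
class ScoringStats:
    """One scoring path over a monitoring window (hypothetical fields)."""
    error_rate: float      # share of scoring requests that failed
    p99_latency_ms: float
    flag_rate: float       # share of users flagged for intervention


def rollback_reasons(canary, control, max_error_rate=0.01,
                     max_latency_ratio=1.5, max_flag_shift=0.05):
    """Triggers that fired for the canary; an empty list means it may keep ramping."""
    reasons = []
    if canary.error_rate > max_error_rate:
        reasons.append("error rate above limit")
    if canary.p99_latency_ms > max_latency_ratio * control.p99_latency_ms:
        reasons.append("p99 latency regression")
    if abs(canary.flag_rate - control.flag_rate) > max_flag_shift:
        reasons.append("intervention volume shifted")
    return reasons


def top_k_overlap(champion, challenger, k):
    """Shadow mode: share of the champion's top-k users also in the challenger's top-k."""
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(champion) & top(challenger)) / k


control = ScoringStats(error_rate=0.001, p99_latency_ms=80.0, flag_rate=0.10)
canary = ScoringStats(error_rate=0.03, p99_latency_ms=90.0, flag_rate=0.11)
print(rollback_reasons(canary, control))  # ['error rate above limit']
```

A sharp change in `flag_rate` matters even when the model is accurate, because it changes how many interventions the retention team must fund.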
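
The example retraining triggers from the question translate directly into a weekly check. The sketch makes four assumptions:

- `weekly_max_psi` holds the largest feature PSI for each past week.
- 0.1 is the lower bound for moderate drift.
- The KPI is higher-is-better (e.g., a retention rate).
- `kpi_drop_pct` is the question's X and is left to the caller.

```python
def retrain_reasons(weekly_max_psi, kpi_baseline, kpi_current, kpi_drop_pct,
                    moderate_psi=0.1):
    """Return the triggers that fired this week; an empty list means no retrain."""
    reasons = []
    last_two = weekly_max_psi[-2:]
    if len(last_two) == 2 and all(p >= moderate_psi for p in last_two):
        reasons.append("PSI moderate or worse for two consecutive weeks")
    # Relative KPI degradation in percent versus the baseline period.
    drop_pct = (kpi_baseline - kpi_current) / kpi_baseline * 100
    if drop_pct >= kpi_drop_pct:
        reasons.append(f"business KPI down {drop_pct:.1f}%")
    return reasons


print(retrain_reasons([0.04, 0.12, 0.18], kpi_baseline=0.80,
                      kpi_current=0.79, kpi_drop_pct=5.0))
# ['PSI moderate or worse for two consecutive weeks']
```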
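
For the fairness checks, one starting point is to compare error rates of the thresholded model across the cohorts of each sensitive attribute separately. Age band and region are examples here; both are placeholders. The sketch reports three rates per cohort, plus the largest gap between cohorts:

- true positive rate (the share of churners reached)
- false positive rate
- selection rate

Which gap to constrain, and how large a gap to tolerate, are policy decisions the code does not settle.

```python
import math


def cohort_rates(y_true, y_pred, cohorts):
    """Per-cohort TPR, FPR and selection rate for binary labels and decisions."""
    counts = {}
    for y, p, g in zip(y_true, y_pred, cohorts):
        c = counts.setdefault(g, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if y:
            c["tp" if p else "fn"] += 1
        else:
            c["fp" if p else "tn"] += 1
    nan = float("nan")
    rates = {}
    for g, c in counts.items():
        pos, neg = c["tp"] + c["fn"], c["fp"] + c["tn"]
        rates[g] = {
            "tpr": c["tp"] / pos if pos else nan,  # undefined without churners
            "fpr": c["fp"] / neg if neg else nan,
            "selection_rate": (c["tp"] + c["fp"]) / (pos + neg),
        }
    return rates


def max_gap(rates, metric):
    """Largest between-cohort difference in `metric`, ignoring undefined (NaN) values."""
    vals = [r[metric] for r in rates.values() if not math.isnan(r[metric])]
    return max(vals) - min(vals) if vals else 0.0


y = [1, 1, 0, 0, 1, 1, 0, 0]
pred = [1, 1, 0, 0, 1, 0, 1, 0]
age_band = ["<30"] * 4 + ["30+"] * 4  # hypothetical sensitive attribute
rates = cohort_rates(y, pred, age_band)
print(max_gap(rates, "tpr"), max_gap(rates, "selection_rate"))  # 0.5 0.0
```

If a gap exceeds the agreed tolerance, start by checking label and feature quality per cohort. Other mitigation options include reweighting or collecting more data for under-served cohorts. Where policy allows, per-cohort thresholds can be chosen to equalize true positive rates. Rerun the check after any change.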

Oct 13, 2025, 9:49 PM