Design leakage-free predictive maintenance pipeline
Company: Roblox
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Take-home Project
Using the machine-hour panel from the previous question, design an end-to-end model to predict, at each hour t, whether a machine will experience a 'fault' within the next 24 hours.
Requirements:
(1) Prevent leakage: features may use only data available at time t; account for late-arriving events and describe a feature-store strategy (e.g., backfills and point-in-time joins).
(2) Time-based CV: specify at least three expanding-window splits with explicit cutoffs (e.g., train ≤2025-06-30, validate 2025-07, test 2025-08).
(3) Class imbalance (~1% positives): choose metrics (e.g., AUCPR), compare class_weight vs focal loss, and select a decision threshold that minimizes expected cost given FN=$10,000 and FP=$500.
(4) Calibrate probabilities (Platt or isotonic) and compute permutation importance; discuss SHAP caveats under multicollinearity and time leakage.
(5) Robustness: handle missing sensors, outliers, and drift; specify drift monitors (PSI/KS), backtesting, and a retraining cadence.
Provide high-level pseudocode (data split, training, calibration, thresholding, evaluation) and justify key design choices.
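As an illustration of requirements (2) and (3), the following is a minimal sketch, not a reference solution. The helper names (`expanding_splits`, `break_even_threshold`, `total_cost`, `best_threshold`) are hypothetical, and the 24-hour purge before each split boundary is an added assumption: because each label looks 24 hours ahead, rows just before a cutoff would otherwise carry labels determined by data from the next split.

```python
from datetime import datetime, timedelta

HORIZON = timedelta(hours=24)   # label look-ahead, from the prompt
FN_COST, FP_COST = 10_000, 500  # misclassification costs, from the prompt


def month_start(year, month):
    """First instant of a month; months past 12 roll into the next year."""
    return datetime(year + (month - 1) // 12, (month - 1) % 12 + 1, 1)


def expanding_splits(year, first_val_month, n_folds=3):
    """Return half-open [start, end) windows for each fold.

    Train always starts at the beginning of the data (start=None) and
    grows by one month per fold.
    """
    folds = []
    for k in range(n_folds):
        val_start = month_start(year, first_val_month + k)
        test_start = month_start(year, first_val_month + k + 1)
        test_end = month_start(year, first_val_month + k + 2)
        folds.append({
            # Assumption: drop the last 24 h before each boundary so that
            # no label window reaches into the next split.
            "train": (None, val_start - HORIZON),
            "validate": (val_start, test_start - HORIZON),
            "test": (test_start, test_end),
        })
    return folds


def break_even_threshold(fn_cost=FN_COST, fp_cost=FP_COST):
    """Alert when p * fn_cost > (1 - p) * fp_cost; valid for calibrated p."""
    return fp_cost / (fp_cost + fn_cost)


def total_cost(y_true, p, threshold, fn_cost=FN_COST, fp_cost=FP_COST):
    """Total cost of alerting whenever the score is >= threshold."""
    fn = sum(1 for y, q in zip(y_true, p) if y == 1 and q < threshold)
    fp = sum(1 for y, q in zip(y_true, p) if y == 0 and q >= threshold)
    return fn * fn_cost + fp * fp_cost


def best_threshold(y_true, p):
    """Sweep observed scores on the validation split; 1.01 means never alert."""
    candidates = sorted(set(p)) + [1.01]
    return min(candidates, key=lambda t: total_cost(y_true, p, t))
```

With FN twenty times more expensive than FP, the break-even probability is 500 / 10,500, roughly 0.048. That shortcut holds only if the scores are calibrated, which is one reason calibration precedes thresholding; the empirical sweep is a cross-check on the validation split and should never be tuned on the test split.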
Quick Answer: This question evaluates a data scientist's ability to design a time-series predictive maintenance pipeline. It covers temporal feature engineering, leakage prevention with point-in-time joins, handling of late-arriving events, class imbalance and cost-sensitive thresholding, probability calibration, explainability, and operational drift monitoring. It is commonly asked in the Machine Learning domain to assess whether an applicant can produce an end-to-end, production-ready workflow. The task is primarily practical, with significant conceptual system-design reasoning.
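To make the point-in-time join concrete, here is a small sketch under stated assumptions. The `Reading` record and its `available_at` column (the time a row landed in the feature store) are hypothetical and not part of the original panel. The idea is that a feature computed at time t may use only rows that had arrived by t, regardless of when the sensor measured them. In pandas, `merge_asof` keyed on the arrival timestamp offers a comparable backward as-of join.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Reading(NamedTuple):
    machine_id: str
    event_time: datetime    # when the sensor measured the value
    available_at: datetime  # when the row became visible (hypothetical column)
    value: float


def as_of(readings: List[Reading], machine_id: str, t: datetime) -> Optional[Reading]:
    """Most recent measurement for a machine that had already arrived by t."""
    visible = [
        r for r in readings
        if r.machine_id == machine_id and r.available_at <= t
    ]
    # Among visible rows, take the latest measurement; None if nothing arrived.
    return max(visible, key=lambda r: r.event_time, default=None)


readings = [
    Reading("m1", datetime(2025, 7, 1, 10, 0), datetime(2025, 7, 1, 10, 5), 1.0),
    # Late-arriving row: measured at 11:00 but not ingested until 13:30.
    Reading("m1", datetime(2025, 7, 1, 11, 0), datetime(2025, 7, 1, 13, 30), 9.0),
]

at_noon = as_of(readings, "m1", datetime(2025, 7, 1, 12, 0))
at_two = as_of(readings, "m1", datetime(2025, 7, 1, 14, 0))
```

A join keyed on `event_time` alone would hand the 11:00 value to a prediction made at 12:00, even though that row did not exist in the store until 13:30. Backfills should therefore preserve the original arrival timestamps so that training features match what serving would have seen.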