Design leakage-free predictive maintenance pipeline

Q: Design leakage-free predictive maintenance pipeline

This question evaluates a data scientist's ability to design time-series predictive maintenance pipelines. It covers temporal feature engineering, leakage prevention and point-in-time joins, late-arriving labels, class imbalance and cost-sensitive thresholding, probability calibration, explainability, and operational drift monitoring. It is commonly asked in Machine Learning interviews to assess whether an applicant can produce an end-to-end, production-ready workflow. The task is primarily practical application, with conceptual system-design reasoning as an important secondary element.

Question

Predict 24-hour Machine Faults from an Hourly Panel (End-to-End Design)

Context

You are given a machine–hour panel: one row per machine per hour with sensor readings and events. At each hour t, the goal is to predict whether that machine will experience a fault within the next 24 hours.

Assume the panel has, at minimum:

- machine_id, ts_hour (UTC, truncated to hour)
- Sensor features (e.g., temp, vibration, current), counters, and binary event flags
- Fault events with two timestamps: event_time (when it happened) and arrival_time (when the event was written/available)

Define the label for each (machine_id, t): y_t = 1 if any fault occurs in (t, t + 24h], else 0.
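
The label definition above can be sketched in pandas as follows. This is a minimal illustration, not a reference implementation. It assumes a `panel` frame with `machine_id` and `ts_hour`, a `faults` frame with `machine_id` and `event_time`, timezone-naive UTC timestamps, and a unique `panel` index. The function name `make_labels` is hypothetical.

```python
import numpy as np
import pandas as pd

HORIZON = np.timedelta64(24, "h")  # prediction horizon from the problem statement


def make_labels(panel: pd.DataFrame, faults: pd.DataFrame) -> pd.Series:
    """y_t = 1 if the machine has a fault with event_time in (ts_hour, ts_hour + 24h]."""
    y = pd.Series(0, index=panel.index, dtype="int8", name="y")
    # Sorted fault times per machine, so each window check is a binary search.
    fault_times = {
        machine: np.sort(group["event_time"].to_numpy())
        for machine, group in faults.groupby("machine_id")
    }
    for machine, rows in panel.groupby("machine_id"):
        times = fault_times.get(machine)
        if times is None:
            continue  # no recorded faults for this machine: labels stay 0
        t = rows["ts_hour"].to_numpy()
        # side="right" counts faults <= bound, so hi - lo counts faults in (t, t + 24h].
        lo = np.searchsorted(times, t, side="right")
        hi = np.searchsorted(times, t + HORIZON, side="right")
        y.loc[rows.index] = (hi > lo).astype("int8")
    return y
```

Because faults can arrive late, a label for hour t is final only once t + 24h plus an allowance for arrival lag has passed. When building a training set as of some cutoff, one option is to filter `faults` to rows with `arrival_time` at or before that cutoff before calling the function.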

Requirements

Prevent leakage
- Features may use only data available at time t
- Account for late-arriving events (event_time vs arrival_time)
- Describe a feature-store strategy (backfills, point-in-time joins)
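
One way to illustrate a point-in-time join is `pandas.merge_asof` keyed on `arrival_time` rather than `event_time`, so a fault only influences features once it was actually available. The sketch below assumes `panel` and `faults` frames shaped as in the Context section, with timezone-naive timestamps and no missing keys. The function name and the derived column `hours_since_last_known_fault` are hypothetical.

```python
import pandas as pd


def add_hours_since_last_known_fault(panel: pd.DataFrame, faults: pd.DataFrame) -> pd.DataFrame:
    """Point-in-time feature: hours since the latest fault that had ARRIVED by ts_hour."""
    known = faults.sort_values("arrival_time").copy()
    # Records can arrive out of order, so track the latest event_time seen so far.
    known["latest_event_known"] = known.groupby("machine_id")["event_time"].cummax()
    joined = pd.merge_asof(
        panel.sort_values("ts_hour"),
        known[["machine_id", "arrival_time", "latest_event_known"]],
        left_on="ts_hour",
        right_on="arrival_time",  # availability time, not event_time
        by="machine_id",
        direction="backward",  # only records with arrival_time <= ts_hour
    )
    gap = joined["ts_hour"] - joined["latest_event_known"]
    joined["hours_since_last_known_fault"] = gap / pd.Timedelta(hours=1)
    return joined.sort_values(["machine_id", "ts_hour"]).reset_index(drop=True)
```

The same as-of logic would apply to backfills: if each stored feature value carries the timestamp at which it became available, a historical training row can be rebuilt exactly as it would have looked at time t.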
Time-based cross-validation
- Specify at least three expanding-window splits with explicit cutoffs (e.g., train ≤ 2025-06-30, validate 2025-07, test 2025-08)
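
A possible set of expanding-window folds is sketched below. The third fold follows the example cutoffs above; the two earlier folds are assumed for illustration. The sketch also purges rows whose 24-hour label window would cross into the next period, which is one common way to keep label information from leaking across a boundary. `FOLDS` and `split_masks` are hypothetical names.

```python
import pandas as pd

HORIZON = pd.Timedelta(hours=24)  # label window length

# (validation start, test start, test end); each training set is everything earlier.
FOLDS = [
    ("2025-05-01", "2025-06-01", "2025-07-01"),
    ("2025-06-01", "2025-07-01", "2025-08-01"),
    ("2025-07-01", "2025-08-01", "2025-09-01"),
]


def split_masks(ts: pd.Series, val_start: str, test_start: str, test_end: str):
    """Boolean masks for one fold, dropping rows whose label window crosses a boundary."""
    val_start, test_start, test_end = map(pd.Timestamp, (val_start, test_start, test_end))
    train = ts + HORIZON <= val_start  # label window closes before validation begins
    val = (ts >= val_start) & (ts + HORIZON <= test_start)
    test = (ts >= test_start) & (ts < test_end)
    return train, val, test
```

Applied to `panel["ts_hour"]`, each fold yields a training mask that grows over time, while validation and test stay strictly later than every training row.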
Class imbalance (~1% positives)
- Choose metrics (e.g., AUCPR)
- Compare class_weight vs focal loss
- Select a decision threshold that minimizes expected cost given FN = $10,000 and FP = $500
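
The cost-based threshold can be worked through in a few lines. Assuming calibrated probabilities and zero cost for correct decisions (both assumptions, not stated in the question), alerting is cheaper whenever p × FN > (1 − p) × FP, which gives p > FP / (FP + FN) = 500 / 10,500 ≈ 0.048. The sketch below computes that closed form and also sweeps thresholds empirically on held-out data, which may be preferable when calibration is imperfect. Function names are hypothetical.

```python
import numpy as np

C_FN, C_FP = 10_000.0, 500.0  # costs from the problem statement


def bayes_threshold(c_fn: float = C_FN, c_fp: float = C_FP) -> float:
    """Alert when p * c_fn > (1 - p) * c_fp, i.e. p > c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)


def empirical_threshold(y_true, p, c_fn: float = C_FN, c_fp: float = C_FP):
    """Sweep candidate thresholds on validation data; return (threshold, total cost)."""
    y = np.asarray(y_true).astype(bool)
    p = np.asarray(p, dtype=float)
    candidates = np.append(np.unique(p), np.inf)  # inf means "never alert"
    costs = [
        c_fp * np.sum((p >= t) & ~y) + c_fn * np.sum((p < t) & y)
        for t in candidates
    ]
    best = int(np.argmin(costs))
    return float(candidates[best]), float(costs[best])
```

The threshold should be chosen on the validation split only and then reported, unchanged, on the test split.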
Calibration and explainability
- Calibrate probabilities (Platt or isotonic)
- Compute permutation importance
- Discuss SHAP caveats under multicollinearity and time leakage
Robustness
- Handle missing sensors, outliers, and drift
- Specify drift monitors (PSI/KS), backtesting, and a retraining cadence
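
As an illustration of a PSI drift monitor, the sketch below bins a reference sample (for example, a training-period feature) by its own quantiles and compares the bin proportions of a current sample against it. The bin count, the `eps` clipping value, and the function name are assumptions, and inputs are assumed to have missing values removed beforehand.

```python
import numpy as np


def psi(reference, current, n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` against `reference` quantile bins."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges from reference quantiles; the two outer bins are open-ended.
    edges = np.unique(np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1]))
    n_cells = edges.size + 1
    expected = np.bincount(np.searchsorted(edges, reference, side="right"), minlength=n_cells)
    actual = np.bincount(np.searchsorted(edges, current, side="right"), minlength=n_cells)
    # Clip proportions to avoid log(0) when a bin is empty in either sample.
    e = np.clip(expected / reference.size, eps, None)
    a = np.clip(actual / current.size, eps, None)
    return float(np.sum((a - e) * np.log(a / e)))
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though these cutoffs are conventions rather than statistical guarantees. For a KS-based monitor, `scipy.stats.ks_2samp` on the same two samples is one option.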
Provide high-level pseudocode covering data split, training, calibration, thresholding, and evaluation, and justify key design choices.
