This question evaluates a data scientist's competency in designing time-series predictive maintenance pipelines, focusing on temporal feature engineering, leakage prevention and point-in-time joins, handling late-arriving labels, class imbalance and cost-sensitive thresholding, probabilistic calibration, explainability, and operational drift monitoring. It is commonly asked in the Machine Learning domain to assess an applicant's ability to produce an end-to-end, production-ready workflow. The task is primarily practical application, but it also requires conceptual system-design reasoning.
You are given a machine–hour panel: one row per machine per hour with sensor readings and events. At each hour t, the goal is to predict whether that machine will experience a fault within the next 24 hours.
Assume the panel has, at minimum:
Define the label for each (machine_id, t): y_t = 1 if any fault occurs in (t, t + 24h], else 0.
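The label definition above can be sketched in pandas as follows. This is a minimal illustration, not a reference solution: the column names `machine_id`, `timestamp`, and `fault` (1 if a fault occurred in that hour) and the helper `add_fault_label` are assumptions, since the panel schema is not given here, and timestamps are assumed to be timezone-naive. Rows whose 24-hour window is not yet fully observed and contains no fault so far are left as NaN rather than labeled 0, which is one common way to avoid training on labels that may still arrive late.

```python
import numpy as np
import pandas as pd


def add_fault_label(panel: pd.DataFrame, horizon: str = "24h") -> pd.DataFrame:
    """Add y = 1 if any fault occurs in (t, t + horizon], else 0.

    Hypothetical columns: machine_id, timestamp (tz-naive), fault (0/1).
    Rows whose window extends past the machine's last observed hour and
    contains no fault yet get NaN (label not known at that point).
    """
    out = panel.sort_values(["machine_id", "timestamp"]).reset_index(drop=True)
    h = pd.Timedelta(horizon).to_timedelta64()
    all_ts = out["timestamp"].to_numpy()
    all_fault = out["fault"].to_numpy().astype(bool)
    labels = np.full(len(out), np.nan)

    # .indices maps each machine to the row positions of its (sorted) rows.
    for idx in out.groupby("machine_id").indices.values():
        ts = all_ts[idx]
        fault_ts = ts[all_fault[idx]]

        # Position of the first fault strictly after t (side="right"
        # excludes a fault at t itself, matching the open left bound).
        pos = np.searchsorted(fault_ts, ts, side="right")
        has_next = pos < len(fault_ts)

        positive = np.zeros(len(ts), dtype=bool)
        positive[has_next] = fault_ts[pos[has_next]] <= ts[has_next] + h

        # A negative label is only trustworthy once t + horizon has passed.
        observed = ts + h <= ts[-1]
        labels[idx] = np.where(positive, 1.0, np.where(observed, 0.0, np.nan))

    out["y"] = labels
    return out
```

Because the window is computed from timestamps rather than row offsets, the sketch tolerates missing hours in the panel. Features for row t must still be built only from data available at or before t; the label column itself looks forward and should never feed back into features.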