PracHub

Design leakage-free predictive maintenance pipeline

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's ability to design a time-series predictive maintenance pipeline. It covers temporal feature engineering, leakage prevention with point-in-time joins, late-arriving labels, class imbalance and cost-sensitive thresholding, probability calibration, explainability, and operational drift monitoring. It is commonly asked in the Machine Learning domain to assess whether an applicant can produce an end-to-end, production-ready workflow. The task is primarily practical application, with some conceptual system-design reasoning.



Company: Roblox

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Take-home Project

Using the machine-hour panel from the previous question, design an end-to-end model to predict whether a machine will experience a 'fault' within the next 24 hours at each hour t. Requirements: (1) Prevent leakage: features may use only data available at time t; account for late-arriving events and describe a feature-store strategy (e.g., backfills and point-in-time joins). (2) Time-based CV: specify at least three expanding-window splits with explicit cutoffs (e.g., train ≤2025-06-30, validate 2025-07, test 2025-08). (3) Class imbalance ~1% positives: choose metrics (e.g., AUCPR), compare class_weight vs focal loss, and select a decision threshold that minimizes expected cost given FN=$10,000 and FP=$500. (4) Calibrate probabilities (Platt or isotonic), compute permutation importance; discuss SHAP caveats under multicollinearity and time leakage. (5) Robustness: handle missing sensors, outliers, and drift; specify drift monitors (PSI/KS), backtesting, and a retraining cadence. Provide high-level pseudocode (data split, training, calibration, thresholding, evaluation) and justify key design choices.

Oct 13, 2025, 9:49 PM

Predict 24-hour Machine Faults from an Hourly Panel (End-to-End Design)

Context

You are given a machine–hour panel: one row per machine per hour with sensor readings and events. At each hour t, the goal is to predict whether that machine will experience a fault within the next 24 hours.

Assume the panel has, at minimum:

  • machine_id, ts_hour (UTC, truncated to hour)
  • Sensor features (e.g., temp, vibration, current), counters, and binary event flags
  • Fault events with two timestamps: event_time (when it happened) and arrival_time (when the event was written/available)

Define the label for each (machine_id, t): y_t = 1 if any fault occurs in (t, t + 24h], else 0.
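
As an illustration of how the two timestamps play different roles, here is a minimal sketch in plain Python. It assumes faults are available as (event_time, arrival_time) pairs per machine; the function names and the example feature "hours since last known fault" are hypothetical and not part of the question. The label looks forward using event_time, while a feature computed at hour t may only use faults that had both occurred and arrived by t.

```python
from datetime import datetime, timedelta

HORIZON = timedelta(hours=24)  # prediction horizon from the problem statement


def make_label(t, fault_event_times, horizon=HORIZON):
    """y_t = 1 if any fault has event_time in (t, t + horizon], else 0."""
    return int(any(t < e <= t + horizon for e in fault_event_times))


def hours_since_last_known_fault(t, faults):
    """Point-in-time feature: use only faults visible at hour t.

    `faults` is an iterable of (event_time, arrival_time) pairs.
    A fault counts only if it happened AND was recorded by t.
    Returns None when no fault is visible yet.
    """
    known = [e for e, a in faults if e <= t and a <= t]
    if not known:
        return None
    return (t - max(known)).total_seconds() / 3600.0


# One fault at 12:00 that was written to the log three hours late.
faults = [(datetime(2025, 7, 1, 12), datetime(2025, 7, 1, 15))]
event_times = [e for e, _ in faults]

# Label side: the fault falls inside (t, t + 24h] for t = 2025-06-30 12:00.
y = make_label(datetime(2025, 6, 30, 12), event_times)

# Feature side: at 13:00 the fault has happened but is not yet visible.
f_early = hours_since_last_known_fault(datetime(2025, 7, 1, 13), faults)
f_late = hours_since_last_known_fault(datetime(2025, 7, 1, 16), faults)
```

Filtering on event_time alone would let the 13:00 row see a fault that the production system could not have known about until 15:00, which is the kind of leakage that a point-in-time join is meant to rule out.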

Requirements

  1. Prevent leakage
    • Features may use only data available at time t
    • Account for late-arriving events (event_time vs arrival_time)
    • Describe a feature-store strategy (backfills, point-in-time joins)
  2. Time-based cross-validation
    • Specify at least three expanding-window splits with explicit cutoffs (e.g., train ≤2025-06-30, validate 2025-07, test 2025-08)
  3. Class imbalance (~1% positives)
    • Choose metrics (e.g., AUCPR)
    • Compare class_weight vs focal loss
    • Select a decision threshold that minimizes expected cost given FN = $10,000 and FP = $500
  4. Calibration and explainability
    • Calibrate probabilities (Platt or isotonic)
    • Compute permutation importance
    • Discuss SHAP caveats under multicollinearity and time leakage
  5. Robustness
    • Handle missing sensors, outliers, and drift
    • Specify drift monitors (PSI/KS), backtesting, and a retraining cadence
  6. Provide high-level pseudocode covering data split, training, calibration, thresholding, and evaluation, and justify key design choices.
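
For requirement 2, one possible set of expanding-window splits can be sketched as follows. Only the last fold mirrors the example cutoffs in the question; the two earlier folds, the fold layout, and the function name are assumptions made for illustration. Because the label for hour t looks 24 hours ahead, the sketch also drops ("purges") rows whose label window would cross into the next segment, which is one common way to keep label information from straddling a boundary.

```python
from datetime import datetime, timedelta

HORIZON = timedelta(hours=24)  # label look-ahead from the problem statement

# Each fold: (train_end, val_end, test_end), all exclusive upper bounds.
# Training always starts at the beginning of the data (expanding window).
FOLDS = [
    (datetime(2025, 5, 1), datetime(2025, 6, 1), datetime(2025, 7, 1)),
    (datetime(2025, 6, 1), datetime(2025, 7, 1), datetime(2025, 8, 1)),
    (datetime(2025, 7, 1), datetime(2025, 8, 1), datetime(2025, 9, 1)),
]


def assign_split(ts_hour, fold, horizon=HORIZON):
    """Return 'train', 'val', 'test', 'purged', or 'unused' for one row."""
    train_end, val_end, test_end = fold
    if ts_hour + horizon <= train_end:
        return "train"
    if ts_hour < train_end:
        return "purged"  # label window reaches into the validation period
    if ts_hour + horizon <= val_end:
        return "val"
    if ts_hour < val_end:
        return "purged"  # label window reaches into the test period
    if ts_hour < test_end:
        return "test"
    return "unused"


last_fold = FOLDS[2]  # train <= 2025-06-30, validate 2025-07, test 2025-08
examples = {
    ts: assign_split(ts, last_fold)
    for ts in (
        datetime(2025, 6, 29, 0),
        datetime(2025, 6, 30, 12),
        datetime(2025, 7, 10, 0),
        datetime(2025, 8, 15, 0),
    )
}
```

Purging the validation tail as well as the training tail matters here because calibration and threshold selection are fit on validation data, so that segment should not encode test-period outcomes either.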
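
For the threshold in requirement 3, the cost figures imply a closed-form cutoff when the probabilities are well calibrated: alerting is cheaper than staying silent when p × 10,000 > (1 − p) × 500, that is, when p > 500 / 10,500 ≈ 0.048. The sketch below computes that cutoff and also sweeps thresholds empirically on held-out predictions, which is a useful cross-check when calibration is imperfect. The function names and the tiny example arrays are made up for illustration.

```python
C_FN = 10_000.0  # cost of a missed fault (from the problem statement)
C_FP = 500.0     # cost of a false alarm (from the problem statement)


def bayes_threshold(c_fn=C_FN, c_fp=C_FP):
    """Cost-optimal cutoff for calibrated probabilities: c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)


def total_cost(y_true, p_hat, threshold, c_fn=C_FN, c_fp=C_FP):
    """Total cost of alerting whenever p_hat >= threshold."""
    cost = 0.0
    for y, p in zip(y_true, p_hat):
        alert = p >= threshold
        if y == 1 and not alert:
            cost += c_fn
        elif y == 0 and alert:
            cost += c_fp
    return cost


def best_empirical_threshold(y_true, p_hat, c_fn=C_FN, c_fp=C_FP):
    """Sweep every observed score (plus 'never alert') and keep the cheapest."""
    candidates = sorted(set(p_hat)) + [float("inf")]
    return min(candidates, key=lambda th: total_cost(y_true, p_hat, th, c_fn, c_fp))


# Toy validation predictions (illustrative values only).
y_val = [0, 0, 0, 1]
p_val = [0.01, 0.03, 0.20, 0.60]

th_theory = bayes_threshold()                    # 1/21, about 0.0476
th_empirical = best_empirical_threshold(y_val, p_val)
cost_at_theory = total_cost(y_val, p_val, th_theory)
```

In practice the threshold would be chosen on the validation segment only and then frozen before scoring the test segment, so that the reported test cost is not optimistically biased.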
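
For the drift monitors in requirement 5, a Population Stability Index (PSI) check on a single feature might look like the sketch below. It assumes quantile bins taken from the reference (training) sample and a small floor on empty bins; the bin count, the floor value, and the function name are illustrative choices, and alert levels such as 0.1 or 0.25 are rules of thumb rather than fixed standards.

```python
import bisect
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` sample.

    Bin edges are quantiles of `expected`; empty bins are floored at `eps`
    so the logarithm stays finite.
    """
    ordered = sorted(expected)
    edges = [ordered[int(len(ordered) * k / n_bins)] for k in range(1, n_bins)]

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    e_frac = bin_fractions(expected)
    a_frac = bin_fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_frac, a_frac))


# Synthetic sensor values: a reference month and a month shifted upward.
reference = list(range(1000))
shifted = [v + 500 for v in reference]

psi_same = psi(reference, reference)   # identical samples give 0.0
psi_shift = psi(reference, shifted)    # large value, indicating drift
```

A monitor of this kind would typically run per feature and on the model score itself, alongside a two-sample KS test, with sustained breaches feeding the retraining decision.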

