Model flight delays with EDA and explanation
Company: Capital One
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
You are building a model to predict whether a domestic flight will arrive 15+ minutes late (wheels-down), using only information available at or before scheduled departure. You receive a 50M-row table with columns: flight_date (YYYY-MM-DD), carrier, dep_airport, arr_airport, sched_dep_time (HH:MM local), dep_delay_min, arr_delay_min, distance_miles, weather_dep_* (temp, precip, vis), weather_arr_* (temp, precip, vis), holiday_flag, aircraft_tail, route_id. Label: late15 = 1 if arr_delay_min >= 15 else 0.
Tasks:
- EDA: list the exact checks/plots you would run to detect leakage, target drift, and rare-category issues; name at least three concrete leakage risks in these columns and how to mitigate each (e.g., removing or lagging features, using only pre-departure weather, excluding realized delays such as dep_delay_min and arr_delay_min). See Sketch 1 after this list.
- Validation: design a time-based cross-validation scheme that respects seasonality and avoids look-ahead. Specify precise train/validation/test date windows and justify the choice. See Sketch 2 after this list.
- Modeling: propose two candidates (one linear, one tree-based); describe feature engineering (cyclical encodings for time of day, airport- and carrier-level rolling aggregates, weather joins), handling of class imbalance, the primary metric(s), and how you would choose and calibrate a decision threshold for operational use. See Sketch 3 after this list.
- Explainability & robustness: describe how you would use SHAP/partial dependence safely with time-ordered data, and how you would test stability across airports and carriers (include at least two specific stress tests, such as out-of-sample storms or new routes). See Sketch 4 after this list.
- Deployment: define an inference contract (latency/SLAs, feature freshness, failure modes) and outline one A/B test to verify operational value (e.g., proactive rebooking or gate assignment), including success metrics and guardrails. See Sketch 5 after this list.
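Sketch 1 (EDA). A minimal illustration of the leakage/drift/rare-category checks a strong answer might run, assuming a pandas DataFrame with the columns above. The synthetic demo data and the <50-row rarity threshold are placeholders, not part of the original prompt.

```python
import numpy as np
import pandas as pd

def leakage_and_drift_checks(df: pd.DataFrame) -> None:
    """EDA probes for leakage, target drift, and rare categories."""
    # Leakage probe: a feature that near-perfectly predicts the label is
    # suspect. dep_delay_min is realized after scheduled departure and
    # arr_delay_min *defines* the label, so both must be excluded.
    for col in ("dep_delay_min", "arr_delay_min"):
        if col in df.columns:
            print(f"corr({col}, late15) = {df[col].corr(df['late15']):.3f}")

    # Target drift: positive rate by month. Large seasonal swings mean
    # random K-fold CV would mislead; validation must be time-based.
    month = pd.to_datetime(df["flight_date"]).dt.to_period("M")
    print(df.groupby(month)["late15"].mean().round(3))

    # Rare categories: long-tail levels (tail numbers, small airports)
    # risk appearing only in validation; count thin levels.
    for col in ("carrier", "dep_airport", "aircraft_tail"):
        vc = df[col].value_counts()
        print(f"{col}: {(vc < 50).sum()} of {len(vc)} levels have <50 rows")

# Tiny synthetic demo so the sketch runs stand-alone.
rng = np.random.default_rng(0)
n = 5_000
dep_delay = rng.exponential(10, n) - 5
demo = pd.DataFrame({
    "flight_date": rng.choice(pd.date_range("2023-01-01", "2023-12-31").astype(str), n),
    "carrier": rng.choice(list("ABCDE"), n),
    "dep_airport": rng.choice([f"AP{i}" for i in range(80)], n),
    "aircraft_tail": rng.choice([f"N{i:04d}" for i in range(1500)], n),
    "dep_delay_min": dep_delay,
    "arr_delay_min": dep_delay + rng.normal(0, 5, n),
})
demo["late15"] = (demo["arr_delay_min"] >= 15).astype(int)
leakage_and_drift_checks(demo)
```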
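Sketch 2 (validation). One way to express expanding-window folds whose validation quarters jointly span a full seasonal cycle. The 2021-2023 dates are assumed for illustration and should be shifted to the table's actual span.

```python
import pandas as pd

# Each fold trains on everything up to a cutoff and validates on the
# following quarter, so no fold looks ahead; the final quarter is a
# held-out test window touched exactly once. In production you would
# also leave an embargo gap at each boundary so rolling features built
# near the cutoff cannot leak validation-period information.
FOLDS = [
    ("2021-01-01", "2022-09-30", "2022-10-01", "2022-12-31"),
    ("2021-01-01", "2022-12-31", "2023-01-01", "2023-03-31"),
    ("2021-01-01", "2023-03-31", "2023-04-01", "2023-06-30"),
    ("2021-01-01", "2023-06-30", "2023-07-01", "2023-09-30"),
]
TEST = ("2023-10-01", "2023-12-31")

def expanding_time_folds(df: pd.DataFrame):
    """Yield (train_idx, val_idx) pairs that respect time order."""
    dates = pd.to_datetime(df["flight_date"])
    for tr_lo, tr_hi, va_lo, va_hi in FOLDS:
        train = df.index[(dates >= tr_lo) & (dates <= tr_hi)]
        val = df.index[(dates >= va_lo) & (dates <= va_hi)]
        yield train, val
```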
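Sketch 3 (modeling). Cyclical time-of-day encoding, a leakage-safe rolling aggregate, and a calibrated cost-based threshold pick. The 30-flight window and the 1:5 FP:FN cost ratio are illustrative assumptions standing in for real operational costs.

```python
import numpy as np
import pandas as pd
from sklearn.isotonic import IsotonicRegression

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.sort_values("flight_date").copy()

    # Cyclical encoding: 23:50 and 00:10 end up close in feature space,
    # which a raw hour integer would not capture.
    t = pd.to_datetime(out["sched_dep_time"], format="%H:%M")
    minute = t.dt.hour * 60 + t.dt.minute
    out["dep_sin"] = np.sin(2 * np.pi * minute / 1440)
    out["dep_cos"] = np.cos(2 * np.pi * minute / 1440)

    # Carrier-level rolling late rate over the prior 30 flights.
    # shift(1) ensures a row never sees its own or any future label.
    out["carrier_late_rate_30"] = out.groupby("carrier")["late15"].transform(
        lambda s: s.shift(1).rolling(30, min_periods=5).mean()
    )
    return out

def calibrated_threshold(p_val, y_val, cost_fp=1.0, cost_fn=5.0):
    """Isotonic calibration on the most recent validation window (NumPy
    arrays expected), then a cost-minimizing threshold sweep."""
    iso = IsotonicRegression(out_of_bounds="clip")
    p_cal = iso.fit_transform(p_val, y_val)
    grid = np.linspace(0.01, 0.99, 99)
    cost = [
        cost_fp * np.sum((p_cal >= t) & (y_val == 0))
        + cost_fn * np.sum((p_cal < t) & (y_val == 1))
        for t in grid
    ]
    return iso, float(grid[int(np.argmin(cost))])
```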
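Sketch 4 (robustness). Slice-level stability checks on a chronologically held-out window. SHAP values themselves would come from, e.g., shap.TreeExplainer run on the same held-out slice with only pre-departure features; the sketch below covers the stress tests. The 200-row minimum, top-decile precipitation cutoff, and `train_routes` argument are hypothetical knobs.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def safe_auc(y, p):
    # AUC is undefined when a slice contains only one class.
    return roc_auc_score(y, p) if pd.Series(y).nunique() == 2 else float("nan")

def stability_report(test_df: pd.DataFrame, p_hat, train_routes: set) -> None:
    df = test_df.assign(p_hat=p_hat)

    # Per-airport AUC spread: a model that only works at large hubs
    # shows a wide spread here.
    big = df.groupby("dep_airport").filter(lambda g: len(g) >= 200)
    per_airport = big.groupby("dep_airport").apply(
        lambda g: safe_auc(g["late15"], g["p_hat"])
    )
    print(per_airport.describe())

    # Stress test 1: out-of-sample storms - top-decile departure precip.
    storm = df[df["weather_dep_precip"] >= df["weather_dep_precip"].quantile(0.9)]
    print("storm-day AUC:", safe_auc(storm["late15"], storm["p_hat"]))

    # Stress test 2: routes absent from training (cold-start behaviour).
    new = df[~df["route_id"].isin(train_routes)]
    print(f"{len(new)} new-route rows, AUC:", safe_auc(new["late15"], new["p_hat"]))
```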
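Sketch 5 (deployment/A/B). A standard two-proportion power calculation for sizing the rebooking experiment; the 20% baseline missed-connection rate and one-point effect are assumptions, not figures from the prompt.

```python
import math
from statistics import NormalDist

def per_arm_sample_size(p0: float, mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per arm for a two-proportion test (normal approximation).
    p0: control rate of the success metric; mde: minimum absolute
    reduction worth detecting."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p1 = p0 - mde
    p_bar = (p0 + p1) / 2
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / mde ** 2
    return math.ceil(n)

# Illustrative: 20% baseline missed-connection rate, hoping proactive
# rebooking driven by the model cuts it by one absolute point.
print(per_arm_sample_size(0.20, 0.01))  # ~24.6k flights per arm
```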
Quick Answer: This question tests end-to-end, time-aware predictive modeling: EDA that surfaces leakage, target drift, and rare categories; temporal cross-validation that avoids look-ahead; feature engineering and class-imbalance handling; linear vs. tree-based model selection with calibrated, cost-aware thresholds; explainability and robustness testing; and a concrete deployment and experimentation plan. It is commonly asked because it probes both conceptual understanding (leakage, drift, validation design) and the practical skills needed to ship a production model: metric selection, thresholding, inference contracts, and operational reliability.