Model flight delays with EDA and explanation
Company: Capital One
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
You are building a model to predict whether a domestic flight will arrive 15+ minutes late (wheels-down), using only information available at or before scheduled departure. You receive a 50M-row table with columns: flight_date (YYYY-MM-DD), carrier, dep_airport, arr_airport, sched_dep_time (HH:MM local), dep_delay_min, arr_delay_min, distance_miles, weather_dep_* (temp, precip, vis), weather_arr_* (temp, precip, vis), holiday_flag, aircraft_tail, route_id. Label: late15 = 1 if arr_delay_min >= 15 else 0.
Tasks:
- EDA: list the exact checks/plots you would run to detect leakage, target drift, and rare-category issues; name at least three concrete leakage risks in these columns and how to mitigate each (e.g., removing or lagging features, using only pre-departure weather, excluding realized delays such as dep_delay_min and arr_delay_min). See Sketch 1 after this list.
- Validation: design a time-based cross-validation scheme that respects seasonality and avoids look-ahead. Specify precise train/validation/test date windows and justify the choice. See Sketch 2 after this list.
- Modeling: propose two candidates (one linear, one tree-based); describe feature engineering (cyclical encodings for time of day, airport- and carrier-level rolling aggregates, weather joins), handling of class imbalance, the primary metric(s), and how you would choose and calibrate a decision threshold for operational use. See Sketch 3 after this list.
- Explainability & robustness: describe how you would use SHAP/partial dependence safely with time-ordered data, and how you would test stability across airports and carriers (include at least two specific stress tests, such as out-of-sample storms or new routes). See Sketch 4 after this list.
- Deployment: define an inference contract (latency/SLAs, feature freshness, failure modes) and outline one A/B test to verify operational value (e.g., proactive rebooking or gate assignment), including success metrics and guardrails. See Sketch 5 after this list.
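Sketch 1 (EDA). A minimal illustration of the leakage/drift/rare-category checks a strong answer might run, assuming a pandas DataFrame with the columns above. The synthetic demo data and the <50-row rarity threshold are placeholders, not part of the original prompt.

```python
import numpy as np
import pandas as pd

def leakage_and_drift_checks(df: pd.DataFrame) -> None:
    """EDA probes for leakage, target drift, and rare categories."""
    # Leakage probe: a feature that near-perfectly predicts the label is
    # suspect. dep_delay_min is realized after scheduled departure and
    # arr_delay_min *defines* the label, so both must be excluded.
    for col in ("dep_delay_min", "arr_delay_min"):
        if col in df.columns:
            print(f"corr({col}, late15) = {df[col].corr(df['late15']):.3f}")

    # Target drift: positive rate by month. Large seasonal swings mean
    # random K-fold CV would mislead; validation must be time-based.
    month = pd.to_datetime(df["flight_date"]).dt.to_period("M")
    print(df.groupby(month)["late15"].mean().round(3))

    # Rare categories: long-tail levels (tail numbers, small airports)
    # risk appearing only in validation; count thin levels.
    for col in ("carrier", "dep_airport", "aircraft_tail"):
        vc = df[col].value_counts()
        print(f"{col}: {(vc < 50).sum()} of {len(vc)} levels have <50 rows")

# Tiny synthetic demo so the sketch runs stand-alone.
rng = np.random.default_rng(0)
n = 5_000
dep_delay = rng.exponential(10, n) - 5
demo = pd.DataFrame({
    "flight_date": rng.choice(pd.date_range("2023-01-01", "2023-12-31").astype(str), n),
    "carrier": rng.choice(list("ABCDE"), n),
    "dep_airport": rng.choice([f"AP{i}" for i in range(80)], n),
    "aircraft_tail": rng.choice([f"N{i:04d}" for i in range(1500)], n),
    "dep_delay_min": dep_delay,
    "arr_delay_min": dep_delay + rng.normal(0, 5, n),
})
demo["late15"] = (demo["arr_delay_min"] >= 15).astype(int)
leakage_and_drift_checks(demo)
```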
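Sketch 2 (validation). One way to express expanding-window folds whose validation quarters jointly span a full seasonal cycle. The 2021-2023 dates are assumed for illustration and should be shifted to the table's actual span.

```python
import pandas as pd

# Each fold trains on everything up to a cutoff and validates on the
# following quarter, so no fold looks ahead; the final quarter is a
# held-out test window touched exactly once. In production you would
# also leave an embargo gap at each boundary so rolling features built
# near the cutoff cannot leak validation-period information.
FOLDS = [
    ("2021-01-01", "2022-09-30", "2022-10-01", "2022-12-31"),
    ("2021-01-01", "2022-12-31", "2023-01-01", "2023-03-31"),
    ("2021-01-01", "2023-03-31", "2023-04-01", "2023-06-30"),
    ("2021-01-01", "2023-06-30", "2023-07-01", "2023-09-30"),
]
TEST = ("2023-10-01", "2023-12-31")

def expanding_time_folds(df: pd.DataFrame):
    """Yield (train_idx, val_idx) pairs that respect time order."""
    dates = pd.to_datetime(df["flight_date"])
    for tr_lo, tr_hi, va_lo, va_hi in FOLDS:
        train = df.index[(dates >= tr_lo) & (dates <= tr_hi)]
        val = df.index[(dates >= va_lo) & (dates <= va_hi)]
        yield train, val
```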
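Sketch 3 (modeling). Cyclical time-of-day encoding, a leakage-safe rolling aggregate, and a calibrated cost-based threshold pick. The 30-flight window and the 1:5 FP:FN cost ratio are illustrative assumptions standing in for real operational costs.

```python
import numpy as np
import pandas as pd
from sklearn.isotonic import IsotonicRegression

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.sort_values("flight_date").copy()

    # Cyclical encoding: 23:50 and 00:10 end up close in feature space,
    # which a raw hour integer would not capture.
    t = pd.to_datetime(out["sched_dep_time"], format="%H:%M")
    minute = t.dt.hour * 60 + t.dt.minute
    out["dep_sin"] = np.sin(2 * np.pi * minute / 1440)
    out["dep_cos"] = np.cos(2 * np.pi * minute / 1440)

    # Carrier-level rolling late rate over the prior 30 flights.
    # shift(1) ensures a row never sees its own or any future label.
    out["carrier_late_rate_30"] = out.groupby("carrier")["late15"].transform(
        lambda s: s.shift(1).rolling(30, min_periods=5).mean()
    )
    return out

def calibrated_threshold(p_val, y_val, cost_fp=1.0, cost_fn=5.0):
    """Isotonic calibration on the most recent validation window (NumPy
    arrays expected), then a cost-minimizing threshold sweep."""
    iso = IsotonicRegression(out_of_bounds="clip")
    p_cal = iso.fit_transform(p_val, y_val)
    grid = np.linspace(0.01, 0.99, 99)
    cost = [
        cost_fp * np.sum((p_cal >= t) & (y_val == 0))
        + cost_fn * np.sum((p_cal < t) & (y_val == 1))
        for t in grid
    ]
    return iso, float(grid[int(np.argmin(cost))])
```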
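Sketch 4 (robustness). Slice-level stability checks on a chronologically held-out window. SHAP values themselves would come from, e.g., shap.TreeExplainer run on the same held-out slice with only pre-departure features; the sketch below covers the stress tests. The 200-row minimum, top-decile precipitation cutoff, and `train_routes` argument are hypothetical knobs.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def safe_auc(y, p):
    # AUC is undefined when a slice contains only one class.
    return roc_auc_score(y, p) if pd.Series(y).nunique() == 2 else float("nan")

def stability_report(test_df: pd.DataFrame, p_hat, train_routes: set) -> None:
    df = test_df.assign(p_hat=p_hat)

    # Per-airport AUC spread: a model that only works at large hubs
    # shows a wide spread here.
    big = df.groupby("dep_airport").filter(lambda g: len(g) >= 200)
    per_airport = big.groupby("dep_airport").apply(
        lambda g: safe_auc(g["late15"], g["p_hat"])
    )
    print(per_airport.describe())

    # Stress test 1: out-of-sample storms - top-decile departure precip.
    storm = df[df["weather_dep_precip"] >= df["weather_dep_precip"].quantile(0.9)]
    print("storm-day AUC:", safe_auc(storm["late15"], storm["p_hat"]))

    # Stress test 2: routes absent from training (cold-start behaviour).
    new = df[~df["route_id"].isin(train_routes)]
    print(f"{len(new)} new-route rows, AUC:", safe_auc(new["late15"], new["p_hat"]))
```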
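Sketch 5 (deployment/A/B). A standard two-proportion power calculation for sizing the rebooking experiment; the 20% baseline missed-connection rate and one-point effect are assumptions, not figures from the prompt.

```python
import math
from statistics import NormalDist

def per_arm_sample_size(p0: float, mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per arm for a two-proportion test (normal approximation).
    p0: control rate of the success metric; mde: minimum absolute
    reduction worth detecting."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p1 = p0 - mde
    p_bar = (p0 + p1) / 2
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / mde ** 2
    return math.ceil(n)

# Illustrative: 20% baseline missed-connection rate, hoping proactive
# rebooking driven by the model cuts it by one absolute point.
print(per_arm_sample_size(0.20, 0.01))  # ~24.6k flights per arm
```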
Quick Answer: This question tests end-to-end, time-aware predictive modeling: EDA that surfaces leakage, target drift, and rare categories; temporal cross-validation that avoids look-ahead; feature engineering and class-imbalance handling; linear vs. tree-based model selection with calibrated, cost-aware thresholds; explainability and robustness testing; and a concrete deployment and experimentation plan. It is commonly asked because it probes both conceptual understanding (leakage, drift, validation design) and the practical skills needed to ship a production model: metric selection, thresholding, inference contracts, and operational reliability.