Build and evaluate airline delay prediction model

This is a Machine Learning interview question from Capital One for Data Scientist roles. View the full question and solution on PracHub.

Question

You are given several CSVs for the classic airline delay challenge with columns like flight_date, carrier, flight_num, origin, dest, sched_dep, sched_arr, dep_delay_min, arr_delay_min, distance, aircraft_type, weather_features_*, and holiday_flag.

a) Define a binary target and justify it: e.g., late_arrival = arr_delay_min > 15.
b) Detail a leakage-aware feature set: include weather forecasts at origin/dest, route history aggregates up to t−7 days, time-of-day, day-of-week, month, distance, carrier- and airport-level rolling stats; exclude or properly lag any features that encode future information (e.g., actual arrival times).
c) Specify a time-based split (e.g., train up to 2024-06, validate 2024-07–2024-09, test 2024-10–2025-03), class imbalance handling, and primary metrics (PR-AUC, calibrated Brier).
d) Compare a strong baseline (regularized logistic regression with target encoding) versus gradient boosting (e.g., XGBoost/LightGBM): hyperparameters to search, early stopping, monotonic constraints if used.
e) Explain how you would do rolling-origin cross-validation and backtesting of threshold policies (e.g., proactive swaps or buffers) with cost-sensitive evaluation that prices false negatives at 5× false positives.
f) Productionization: 20 ms/flight latency budget, 50 MB model size, feature store vs on-the-fly aggregation, drift detection, and periodic retraining cadence.
g) Deliverables: reproducible notebook, clean data pipeline, model cards with fairness slices across carriers/airports, and an exec summary with recommended operational policy and estimated ROI.
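As a starting point for parts (a), (c), and (e), here is a minimal standard-library sketch, not a reference solution. It builds the binary target, assigns rows to the example date windows, and scores a decision threshold with false negatives priced at 5× false positives. The function names, the cost units, and the toy probabilities are illustrative assumptions. It also assumes arr_delay_min is never missing; cancelled or diverted flights would need separate handling.

```python
from datetime import date

# Cutoff taken from the example in part (a).
LATE_THRESHOLD_MIN = 15
# Assumption: arbitrary cost units; only the 5:1 ratio comes from part (e).
COST_FN, COST_FP = 5.0, 1.0


def late_arrival(arr_delay_min):
    """Binary target: 1 if the flight arrived more than 15 minutes late."""
    return int(arr_delay_min > LATE_THRESHOLD_MIN)


def time_split(flight_date):
    """Assign a row to a split using the example windows from part (c)."""
    if flight_date < date(2024, 7, 1):
        return "train"      # up to 2024-06
    if flight_date < date(2024, 10, 1):
        return "validate"   # 2024-07 to 2024-09
    if flight_date < date(2025, 4, 1):
        return "test"       # 2024-10 to 2025-03
    return "unused"         # later rows stay out of the evaluation


def expected_cost(y_true, y_prob, threshold):
    """Total cost when flights with probability >= threshold are flagged."""
    cost = 0.0
    for y, p in zip(y_true, y_prob):
        flagged = p >= threshold
        if y == 1 and not flagged:
            cost += COST_FN  # missed delay
        elif y == 0 and flagged:
            cost += COST_FP  # unnecessary intervention
    return cost


def best_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: expected_cost(y_true, y_prob, t))


# Toy validation data (hypothetical delays and model probabilities).
delays = [3, 42, -5, 16, 15, 120]
y_val = [late_arrival(d) for d in delays]
p_val = [0.10, 0.80, 0.30, 0.25, 0.40, 0.90]
grid = [i / 20 for i in range(1, 20)]  # thresholds 0.05 .. 0.95

chosen = best_threshold(y_val, p_val, grid)
print("chosen threshold:", chosen)
print("cost at 0.5:", expected_cost(y_val, p_val, 0.5))
print("cost at chosen:", expected_cost(y_val, p_val, chosen))
```

In a backtest, the threshold would be chosen on the validation window only and then held fixed while its cost is measured on the test window. With a 5:1 cost ratio the selected threshold tends to fall well below 0.5, since missing a delay costs more than flagging a flight that arrives on time.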
