You may be asked one or both of the following machine-learning case questions:
- Flight-delay prediction case
  An airline wants a model that predicts departure delay in minutes for each flight 2 hours before scheduled departure. You have historical flight operations data, airport congestion, aircraft and route information, weather forecasts, and crew or maintenance signals. Propose a regression-based approach and explain:
  - how you define the target and avoid label leakage;
  - which features you would engineer;
  - how you would split training and validation data over time;
  - which evaluation metrics you would use, such as MAE, RMSE, or quantile loss, and why;
  - how you would handle missing data, outliers, and highly correlated variables;
  - whether multicollinearity is harmful for prediction, interpretability, or both;
  - what threshold would make you call a correlation high, and why;
  - alternatives to dropping correlated features, such as regularization, feature clustering, PCA, or tree-based models;
  - if you remove a feature, how you would estimate that feature's business impact;
  - how you would turn model outputs into concrete operational recommendations for the airline.
Assume delays are right-skewed, severe delays are rare but costly, and airport-specific operational policies differ across hubs.
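
To make the time-based split and the metric choice in the flight-delay case concrete, here is a minimal sketch. It assumes synthetic lognormal delays, a hypothetical `day` index with 10 flights per day, and constant baseline predictions; none of these come from the case itself. It only illustrates how MAE, RMSE, and quantile (pinball) loss can be compared on a validation period that lies strictly after the training period.

```python
import numpy as np

def time_split(timestamps, cutoff):
    """Return boolean masks: train strictly before cutoff, validation at or after it."""
    timestamps = np.asarray(timestamps)
    train = timestamps < cutoff
    return train, ~train

def mae(y, pred):
    return float(np.mean(np.abs(y - pred)))

def rmse(y, pred):
    return float(np.sqrt(np.mean((y - pred) ** 2)))

def pinball_loss(y, pred, q):
    """Quantile loss: each unit of under-prediction costs q, over-prediction costs 1 - q."""
    diff = y - pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# Synthetic, right-skewed delays in minutes (illustrative only, not real data).
rng = np.random.default_rng(0)
n = 1000
day = np.arange(n) // 10  # hypothetical day index, 10 flights per day
delay = rng.lognormal(mean=2.0, sigma=1.0, size=n)

# Train on days 0-79, validate on days 80-99, so no future data leaks backward.
train, valid = time_split(day, cutoff=80)

# Two constant baselines, both fitted on the training period only.
median_pred = np.full(valid.sum(), np.median(delay[train]))
p90_pred = np.full(valid.sum(), np.quantile(delay[train], 0.9))

y_valid = delay[valid]
for name, pred in [("median", median_pred), ("p90", p90_pred)]:
    print(name, mae(y_valid, pred), rmse(y_valid, pred),
          pinball_loss(y_valid, pred, q=0.9))
```

RMSE is never smaller than MAE on the same predictions, and on right-skewed data the gap is driven by the rare severe delays. Pinball loss at a high quantile such as 0.9 is one way to score a model whose output is meant to support conservative buffers rather than a typical-case estimate.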
- Watchlist face-recognition case
  A bank wants to use branch camera feeds to flag whether an entering customer matches a watchlist of known robbers. Describe how you would design the model and decision system. Address:
  - closed-set versus open-set recognition;
  - data collection and labeling;
  - low base rates and class imbalance;
  - false-positive versus false-negative costs;
  - threshold selection, calibration, and human review;
  - fairness, privacy, consent, and legal risk;
  - latency and on-device versus server inference;
  - monitoring for drift, spoofing, and adversarial attacks.
For both cases, explain not only the modeling approach but also the business and ethical tradeoffs.
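
For the watchlist case, the low-base-rate point can be worked through with Bayes' rule. The sketch below uses hypothetical numbers (one watchlist member per 100,000 entrants, a 95% true-positive rate, a 0.1% false-positive rate, 1,000 visitors per day); they are assumptions for illustration, not figures from the case.

```python
def alert_precision(base_rate, tpr, fpr):
    """P(true match | alert) from the base rate and the matcher's error rates."""
    true_alerts = base_rate * tpr
    false_alerts = (1.0 - base_rate) * fpr
    return true_alerts / (true_alerts + false_alerts)

def expected_daily_alerts(visitors_per_day, base_rate, tpr, fpr):
    """Expected number of (true, false) alerts per day at one branch."""
    true_alerts = visitors_per_day * base_rate * tpr
    false_alerts = visitors_per_day * (1.0 - base_rate) * fpr
    return true_alerts, false_alerts

# Hypothetical operating point, chosen only to illustrate the arithmetic.
BASE_RATE = 1e-5   # assumed share of entrants who are on the watchlist
TPR = 0.95         # assumed true-positive rate at the chosen threshold
FPR = 0.001        # assumed false-positive rate at the chosen threshold

precision = alert_precision(BASE_RATE, TPR, FPR)
true_per_day, false_per_day = expected_daily_alerts(1000, BASE_RATE, TPR, FPR)
print(f"precision={precision:.4f}")
print(f"true alerts/day={true_per_day:.4f}, false alerts/day={false_per_day:.4f}")
```

Under these assumed rates, fewer than 1 in 100 alerts would be a true match, which is why threshold selection and human review have to be discussed together with false-positive costs rather than accuracy alone.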