Diagnose Flight Delays and Burger Launch
Company: Capital One
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: Onsite
You are given two analytics case questions.
**Part A: Flight delay analysis**
An airline wants to understand and predict delays. You receive a flight-level dataset with these columns:
- `flight_id` STRING
- `airport_id` STRING
- `aircraft_type` STRING
- `seats` INT, but some rows contain negative values
- `gate_attendants` INT
- `ground_attendants` INT
- `passenger_count` INT
- `weather_score` FLOAT
- `day_of_week` STRING
- `delay_minutes` FLOAT
Some rows have missing values. Exploratory charts suggest that delays are higher on Mondays and Fridays, that more attendants are associated with lower delays, that delay patterns differ across airports, and that `gate_attendants`, `ground_attendants`, and `passenger_count` are highly correlated.
How would you:
1. audit and clean the data,
2. decide whether to model `delay_minutes` as a regression target or convert the problem to a binary `is_delayed` classification task,
3. handle multicollinearity and airport or aircraft heterogeneity,
4. interpret a low R-squared,
5. separate predictive conclusions from causal conclusions?
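As a starting point for steps 1 to 3, the audit and the collinearity check can be sketched in pandas and NumPy. This is a minimal sketch under stated assumptions, not a definitive pipeline: the helper names `audit_and_clean` and `vif`, the 15-minute `delay_threshold`, and the choice to drop rows with a missing target are all illustrative and are not given in the question. The variance inflation factor (VIF) is computed here as the diagonal of the inverse correlation matrix.

```python
import numpy as np
import pandas as pd


def audit_and_clean(df: pd.DataFrame, delay_threshold: float = 15.0) -> pd.DataFrame:
    """Return a cleaned copy of the flight table with audit flags added.

    delay_threshold is an assumed cutoff in minutes, not part of the question.
    """
    # One row per flight: drop repeated flight_id values, keeping the first.
    out = df.drop_duplicates(subset="flight_id").copy()

    # A model cannot learn from rows without a target, so drop them here
    # (an assumption; imputing the target would bias any evaluation).
    out = out.dropna(subset=["delay_minutes"]).copy()

    # Negative seat counts are impossible: flag them, then treat as missing.
    out["seats_was_invalid"] = out["seats"] < 0
    out["seats"] = out["seats"].where(out["seats"] >= 0)

    # Sanity check: more passengers than seats points to a data problem.
    out["over_capacity"] = out["passenger_count"] > out["seats"]

    # Binary target for the classification framing of the problem.
    out["is_delayed"] = (out["delay_minutes"] > delay_threshold).astype(int)
    return out


def vif(df: pd.DataFrame, cols: list[str]) -> pd.Series:
    """Variance inflation factor per column, from the inverse correlation matrix.

    Rows with missing values in cols are dropped. Perfectly collinear columns
    make the matrix singular, so expect an error or huge values in that case.
    """
    X = df[cols].dropna().to_numpy(dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=cols)


# Tiny made-up table, only to show the cleaning behaviour.
demo = pd.DataFrame(
    {
        "flight_id": ["F1", "F2", "F2", "F3"],
        "seats": [180, -1, -1, 150],
        "passenger_count": [170, 120, 120, 160],
        "delay_minutes": [5.0, 42.0, 42.0, None],
    }
)
cleaned = audit_and_clean(demo)
print(cleaned[["flight_id", "seats", "seats_was_invalid", "is_delayed"]])
```

On the real data, `vif(cleaned, ["gate_attendants", "ground_attendants", "passenger_count"])` would quantify the correlation seen in the exploratory charts. Large values would suggest combining the columns (for example into attendants per passenger) or using a regularized model rather than reading individual coefficients.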
**Part B: Vegan burger launch**
A restaurant chain is deciding whether to introduce a vegan burger. You are given a one-time launch cost, a recurring fixed cost, the vegan burger's unit cost and unit price, and the current unit economics of the regular burger. Management believes that after launch, the sales mix would be `vegan : regular = 2 : 3`.
Using the interviewer-provided numbers, the algebra implies that total burger volume would need to increase by about 60% to match current profit.
How would you:
1. set up the profit equation and break-even condition,
2. make an initial recommendation,
3. explain how your original decision could still have been reasonable if a competitor later launched a vegan burger successfully?
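The break-even condition in step 1 can be sketched as a small function. The interviewer's numbers are not reproduced in this write-up, so the inputs below are made-up placeholders and the 50% result is not the 60% figure from the question. The sketch assumes that existing fixed costs are unchanged by the launch (so they cancel out), that the recurring fixed cost is incremental, and that the one-time launch cost is either ignored or amortized per period through the hypothetical `amortized_launch_cost` parameter.

```python
def required_volume_growth(
    current_volume: float,
    regular_margin: float,
    vegan_margin: float,
    recurring_fixed_cost: float,
    vegan_share: float = 2 / 5,  # vegan : regular = 2 : 3
    amortized_launch_cost: float = 0.0,
) -> float:
    """Fractional growth in total burger volume needed to match current profit.

    Margins are per-unit contribution margins (unit price minus unit cost).
    """
    # Average margin per burger sold under the assumed post-launch mix.
    blended_margin = vegan_share * vegan_margin + (1 - vegan_share) * regular_margin
    if blended_margin <= 0:
        raise ValueError("Blended margin must be positive for break-even to exist.")

    # Profit today, ignoring fixed costs that do not change with the launch.
    current_profit = current_volume * regular_margin

    # Solve: volume * blended_margin - new fixed costs = current_profit.
    breakeven_volume = (
        current_profit + recurring_fixed_cost + amortized_launch_cost
    ) / blended_margin
    return breakeven_volume / current_volume - 1


# Placeholder inputs for illustration only (not the interviewer's numbers).
growth = required_volume_growth(
    current_volume=1000,
    regular_margin=4.0,
    vegan_margin=2.0,
    recurring_fixed_cost=800.0,
)
print(f"Required volume growth: {growth:.0%}")  # prints "Required volume growth: 50%"
```

Calling the function over a range of `vegan_share` and `vegan_margin` values gives a simple sensitivity analysis, which helps show whether the recommendation depends heavily on management's assumed 2 : 3 mix.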
Quick Answer: This question evaluates a data scientist's skills in data auditing and cleaning, feature engineering, choosing between regression and classification, handling multicollinearity and airport- or aircraft-level heterogeneity, interpreting low model fit, separating predictive from causal conclusions, and building break-even and sensitivity analyses for a product launch. Commonly asked in analytics and experimentation interviews, it tests statistical modeling, causal inference, and business-analytics reasoning, and it requires working through noisy real-world data, modeling trade-offs, and quantitative business impact.