Choose Models for Imbalanced Data and Time-Series Forecasting
Scenario
You must choose and tune models for (a) forecasting marketplace demand with seasonality and trend, and (b) detecting fraud where the positive class rate is only 0.2%.
Tasks
- Ordinary Least Squares (OLS): Explain how OLS linear regression works and list its key assumptions.
- Tree Ensembles: Compare gradient-boosted trees, random forests, and bagging. When would you prefer each?
- Class Imbalance (0.2% positive): How would you handle this imbalance during model training and evaluation?
- Time-Series Forecasting Workflow: Describe a full, practical workflow for modeling a series with trend and seasonality, including preprocessing, feature engineering, appropriate metrics, and time-aware cross-validation.
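As one illustration of the time-aware cross-validation mentioned in the last task, here is a minimal sketch using only the Python standard library. It assumes an expanding-window scheme, a seasonal-naive baseline, and MAE as the metric. The toy series, the function names, and the fold settings are hypothetical choices made for illustration, not part of the prompt.

```python
from typing import Iterator, List, Sequence, Tuple


def expanding_window_splits(
    n_samples: int, n_folds: int, horizon: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs where training data always precedes test data."""
    first_test_start = n_samples - n_folds * horizon
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds and horizon")
    for k in range(n_folds):
        test_start = first_test_start + k * horizon
        # Train on everything before the test block; test on the next `horizon` points.
        yield range(0, test_start), range(test_start, test_start + horizon)


def seasonal_naive_forecast(
    history: Sequence[float], season_length: int, horizon: int
) -> List[float]:
    """Baseline forecast: repeat the most recently observed season."""
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy series: linear trend (0.5 per step) plus a weekly pattern.
weekly_pattern = [0, 1, 2, 3, 2, 1, 0]
series = [0.5 * t + weekly_pattern[t % 7] for t in range(70)]

fold_errors = []
for train_idx, test_idx in expanding_window_splits(len(series), n_folds=3, horizon=7):
    train = [series[i] for i in train_idx]
    test = [series[i] for i in test_idx]
    forecast = seasonal_naive_forecast(train, season_length=7, horizon=7)
    fold_errors.append(mean_absolute_error(test, forecast))

print(fold_errors)  # [3.5, 3.5, 3.5]
```

On this toy series the baseline reproduces the weekly shape but lags the trend by one week (7 steps at 0.5 per step), which is why every fold reports an MAE of 3.5. A candidate model would need to beat that figure on the same folds to justify its added complexity.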
Hint
Address data preprocessing, feature engineering, resampling/weighting, proper metrics for imbalance, and cross-validation suited for temporal data.
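To make the weighting and imbalance-metric parts of the hint concrete, the following sketch computes inverse-frequency class weights and average precision (a common summary of the precision-recall curve) in plain Python. It assumes binary labels encoded as 0/1 and 1,000 examples with 2 positives to mirror the 0.2% rate. The scores are invented for illustration, and ties in scores are not given special handling.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {cls: n_samples / (len(counts) * count) for cls, count in counts.items()}


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision values taken at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_positive = sum(labels)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_positive


# Hypothetical data: 2 positives out of 1,000 examples (0.2% positive rate).
labels = [1, 1] + [0] * 998
# Invented scores: the positives rank 1st and 4th among all examples.
scores = [0.9, 0.6] + [0.8, 0.7] + [0.1] * 996

weights = balanced_class_weights(labels)
ap = average_precision(labels, scores)
# Accuracy of a model that always predicts the negative class.
always_negative_accuracy = labels.count(0) / len(labels)

print(weights[1], round(weights[0], 3))  # 250.0 0.501
print(ap)                                # 0.75
print(always_negative_accuracy)          # 0.998
```

The always-negative model scores 99.8% accuracy while catching no fraud, which is why accuracy is a poor metric here. Average precision instead rewards ranking the rare positives near the top, and the weights show how much more each positive example would count in a weighted loss.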
Constraints & Assumptions
- Preserve the scope, facts, inputs, and requested outputs from the prompt above.
- If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
- Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.
Clarifying Questions to Ask
- Clarify the task, data shape, labels, constraints, and evaluation metric.
- State assumptions behind the math or modeling technique you choose.
- Connect theory to practical training, debugging, and deployment implications.
What a Strong Answer Covers
- Correct definitions and formulas where the prompt requires them.
- A practical explanation of how the method behaves on real data.
- Trade-offs, failure modes, diagnostics, and mitigation strategies.
- Evaluation choices that match the product or modeling objective.
Follow-up Questions
- How would noisy labels, class imbalance, or distribution shift affect the answer?
- What would you monitor after deployment?
- Which baseline would you compare against first?