Scenario
You are asked to use machine learning to predict stock prices (or, more realistically, future returns or price direction) for a trading use case.
Questions
- Target definition: What exactly would you predict (e.g., next-day close, next-hour return, direction, volatility)? Why?
- Data: What data sources would you use (market data, fundamentals, news, alternative data)? What is the minimum viable dataset?
- Features: What features would you engineer from the data?
- Modeling: What model families would you consider and why (linear models, tree-based, deep learning, time-series models)?
- Training & validation: How would you split data over time to avoid leakage? How would you tune hyperparameters?
- Evaluation: What metrics would you use (ML metrics and trading metrics)?
- Pitfalls: How would you address non-stationarity, regime changes, data snooping, survivorship bias, and transaction costs?
- Production considerations: How would you deploy, monitor, and retrain the model?
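
As a concrete reference point for the training and validation question, one common way to split data over time is an expanding-window walk-forward scheme with a gap between the training and test periods. The sketch below is illustrative only and is not a prescribed answer: `walk_forward_splits` and its parameters (`n_folds`, `test_size`, `gap`) are hypothetical names, and it assumes rows are already sorted by time and that the label looks at most `gap` steps ahead.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int,
    n_folds: int,
    test_size: int,
    gap: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over time-ordered rows.

    Each test window lies strictly after its training window. The `gap`
    rows just before each test window are dropped from training so that
    labels computed from future prices cannot overlap the test period.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start - gap <= 0:
        raise ValueError("not enough samples for the requested folds and gap")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        train_end = test_start - gap  # exclusive upper bound of the training rows
        train_idx = list(range(train_end))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Example: 20 time-ordered rows, 3 folds of 4 test rows, 1-row gap.
splits = list(walk_forward_splits(n_samples=20, n_folds=3, test_size=4, gap=1))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Hyperparameters would then be tuned only on these folds, with a final, later period held out and untouched until the end. scikit-learn's `TimeSeriesSplit` provides a similar splitter with a `gap` argument if a library implementation is preferred.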