Explain Core ML Concepts
Company: J.P. Morgan
Role: Data Scientist
Category: Machine Learning
Difficulty: Medium
Interview Round: Technical Screen
You are interviewing for a senior AI/ML-oriented data science role at a financial institution. Answer the following foundational machine learning questions clearly and with enough technical depth for production modeling contexts such as credit risk, fraud detection, customer churn, or transaction classification.
1. Compare bagging and boosting.
- What problem does each method try to solve?
- Give examples of algorithms that use each approach.
- How do they affect bias and variance?
2. Explain the bias-variance tradeoff.
- What does high bias look like?
- What does high variance look like?
- How would you diagnose each using training and validation performance?
3. Describe methods to reduce model variance.
- Include regularization, cross-validation, model averaging, early stopping, pruning, and data-related approaches.
- Explain the differences between L1 and L2 regularization.
- Discuss when L1 may be preferred over L2.
4. Explain feature selection.
- Compare filter, wrapper, and embedded methods.
- Discuss how to avoid data leakage during feature selection.
- Explain how feature selection differs for linear models, tree-based models, and deep learning models.
5. Compare Transformers and RNNs.
- Why did Transformers largely replace RNNs for many sequence modeling tasks?
- Explain the attention mechanism at a high level and with the query-key-value formulation.
- Discuss computational tradeoffs, sequence length limitations, and interpretability caveats.
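For the query-key-value part of question 5, a minimal sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, may help anchor an answer. It is written in plain Python for readability rather than speed, and the function names and toy vectors are illustrative assumptions, not part of the question. Each query is scored against every key, so the score matrix grows with (number of queries) x (number of keys). This is the source of the quadratic cost in sequence length for self-attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries: list of n vectors of length d_k
    keys:    list of m vectors of length d_k
    values:  list of m vectors of length d_v
    Returns (outputs, weights): n output vectors and the n x m weight matrix.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)  # attention weights sum to 1 across keys
        # Output is the weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(d_v)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy data (hypothetical): one query that aligns with keys 0 and 2.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
queries = [[5.0, 0.0]]
outputs, weights = attention(queries, keys, values)
```

In this toy case the query puts almost all of its weight on the first and third keys, which score equally, and very little on the second. The weights show where the model looked, which is not the same as a causal explanation of the prediction. That distinction is the interpretability caveat the question asks about.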
Quick Answer: This question tests foundational machine learning knowledge: ensemble methods (bagging vs. boosting), the bias-variance tradeoff, variance reduction and regularization, feature selection and leakage prevention, and the contrast between Transformers and RNNs, all framed for production modeling contexts such as credit risk, fraud detection, churn prediction, and transaction classification. It is commonly asked in technical interviews to assess both conceptual understanding and the ability to reason about tradeoffs and implementation choices when designing robust models that generalize.
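For the L1 versus L2 part of question 3, a small worked sketch can make the sparsity difference concrete. It assumes the simplified one-coefficient case, which is also how the penalties behave per coefficient when features are orthonormal. There the L1-penalized solution is soft-thresholding and the L2-penalized solution is proportional shrinkage. The helper names, the penalty strength `lam`, and the starting coefficients are hypothetical values chosen for illustration.

```python
def l1_shrink(z, lam):
    """Minimizer of 0.5 * (w - z)**2 + lam * abs(w): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero


def l2_shrink(z, lam):
    """Minimizer of 0.5 * (w - z)**2 + 0.5 * lam * w**2: proportional shrinkage."""
    return z / (1.0 + lam)


# Hypothetical unpenalized coefficients and penalty strength.
unpenalized = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

l1_coefs = [l1_shrink(z, lam) for z in unpenalized]
l2_coefs = [l2_shrink(z, lam) for z in unpenalized]
```

Under these assumptions L1 zeroes out the two small coefficients while L2 only scales every coefficient toward zero. This is why L1 may be preferred when a sparse, more auditable feature set is the goal, and L2 when many correlated features each carry a little signal.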