Explain core ML concepts and diagnostics
Company: Amazon
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
You are in an ML breadth interview for a Senior Applied Scientist role. Answer the following conceptual questions clearly and practically (definitions + when/why + common pitfalls):
1. What is a **p-value**? How should it (and should it not) be interpreted?
2. What are **overfitting** and **underfitting**? How can you diagnose and mitigate each?
3. What is **causal inference**? Name and briefly describe common methods.
4. In ML, what do **encoding** and **decoding** mean? Give concrete examples.
5. Explain **gradient descent** and **backpropagation** at a high level.
6. What are **vanishing/exploding gradients**? How do you mitigate them?
7. How do you handle **highly imbalanced data**?
8. Describe a scenario where you see **99% accuracy** but the model is still performing poorly. How would you fix/evaluate it properly?
9. What is an **A/B test**? If an A/B test result looks abnormal or suspicious, what might be the causes and how would you investigate?
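As a concrete illustration of question 8, here is a minimal sketch (pure Python, hypothetical data) of how a trivial majority-class predictor can reach 99% accuracy on an imbalanced dataset while having zero recall on the minority class:

```python
# Hypothetical imbalanced dataset: 990 negatives, 10 positives (1% positive rate).
y_true = [0] * 990 + [1] * 10

# A degenerate "model" that always predicts the majority class.
y_pred = [0] * 1000

# Accuracy rewards the majority-class shortcut.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the positive (minority) class exposes the failure.
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(accuracy)  # 0.99 -- looks excellent
print(recall)    # 0.0  -- the model never identifies a positive case
```

This is why, under class imbalance, precision, recall, PR-AUC, or cost-weighted metrics are more informative than raw accuracy.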
Quick Answer: This question set evaluates mastery of core machine-learning concepts and diagnostics: statistical inference (p-values), overfitting/underfitting and the bias-variance trade-off, causal-inference methods, encoding/decoding, optimization and backpropagation, gradient stability, handling imbalanced data, evaluation metrics beyond accuracy, and A/B testing. It is commonly asked in ML breadth interviews to probe both conceptual understanding and practical application: theoretical knowledge, recognition of common failure modes, and the diagnostic reasoning used to validate models and experiments.
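For question 5, a minimal gradient-descent sketch (illustrative, not part of the prompt) on a one-dimensional objective makes the core loop concrete; the analytic gradient here stands in for what backpropagation would compute in a network:

```python
# Minimize f(x) = (x - 3)^2. Its gradient is f'(x) = 2 * (x - 3),
# so stepping opposite the gradient moves x toward the minimizer x = 3.

def grad(x):
    return 2.0 * (x - 3.0)

x = 0.0    # initial parameter value (arbitrary starting point)
lr = 0.1   # learning rate: too large can diverge, too small converges slowly

for _ in range(100):
    x -= lr * grad(x)  # the gradient-descent update rule

print(x)  # approaches 3.0
```

Each update contracts the error by a constant factor (here 1 - 2*lr = 0.8), which is why the iterate converges geometrically to the minimum; in deep networks the same update is applied to every weight, with backpropagation supplying the gradients via the chain rule.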