Model y from x and interpret distributions
Company: Reddit
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: medium
Interview Round: Onsite
## Scenario
You are given a dataset with one input feature **x** and a target **y**. The interviewer asks: “How would you model this?”
Later, you are shown a plot with **two distributions** (e.g., the distribution of one feature for two groups or classes, or for training vs. production data) and asked to interpret what it implies.
Finally, you are asked several **cold-start** questions.
## Tasks
1. Explain how you would decide whether this is **regression vs. classification**, which baseline models you would try first, and which evaluation metrics you would use.
2. Given a plot with two distributions, explain how you would:
   - Describe what you see (separation/overlap, shift, variance, multimodality)
   - Diagnose potential issues (label leakage, covariate shift, class imbalance, thresholding)
   - Decide next steps (feature engineering, calibration, sampling, monitoring)
3. Describe practical **cold-start** strategies for:
   - New users
   - New items (videos)
   - New regions/languages
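
For Task 3, one common ingredient of a cold-start answer is to back off from sparse entity-level statistics to a broader prior. The sketch below illustrates that idea with simple additive smoothing: a new item with no impressions inherits its group's rate, and a new group inherits the global rate. The function names, the `(successes, trials)` tuples, and the `prior_strength` value are all hypothetical choices made for illustration; they are not part of the question, and this covers only one of several possible strategies.

```python
def smoothed_rate(successes, trials, prior_rate, prior_strength=20.0):
    """Shrink an observed rate toward prior_rate.

    prior_strength acts like a count of pseudo-observations (an assumed
    tuning knob): with trials == 0 the result is exactly the prior, and
    with many trials the observed rate dominates.
    """
    return (successes + prior_strength * prior_rate) / (trials + prior_strength)


def cold_start_score(item_stats, group_stats, global_rate, prior_strength=20.0):
    """Back off item -> group -> global.

    item_stats and group_stats are hypothetical (successes, trials) tuples;
    "group" could be a category, region, or language.
    """
    group_rate = smoothed_rate(*group_stats, prior_rate=global_rate,
                               prior_strength=prior_strength)
    return smoothed_rate(*item_stats, prior_rate=group_rate,
                         prior_strength=prior_strength)


# A brand-new item in a brand-new region falls back to the global rate.
new_everything = cold_start_score((0, 0), (0, 0), global_rate=0.05)

# An item with plenty of history is driven mostly by its own data.
established = cold_start_score((900, 1000), (0, 0), global_rate=0.05)
```

The same back-off pattern applies to new users (user, then cohort, then global). In an interview it would normally be discussed alongside content-based features and some form of exploration, neither of which is shown here.
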
Assume you care about both predictive quality and production robustness.
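
For Task 1, a minimal sketch of the framing step and of trivial baselines might look like the following. It assumes the target is available as a plain Python list; `infer_task`, `max_classes`, and the two baseline helpers are hypothetical names introduced only for this example. The heuristic looks at the target values alone, so it can misjudge cases such as count or ordinal targets, where the business question should decide the framing.

```python
from collections import Counter
from statistics import median


def infer_task(y, max_classes=10):
    """Heuristic guess at the task type from the target values alone."""
    distinct = set(y)
    # Strings and booleans are treated as class labels.
    if all(isinstance(v, (str, bool)) for v in distinct):
        return "classification"
    # A small set of integer-valued targets is treated as class ids.
    integral = all(
        isinstance(v, int) or (isinstance(v, float) and v.is_integer())
        for v in distinct
    )
    if integral and len(distinct) <= max_classes:
        return "classification"
    return "regression"


def baseline_regression_mae(y_train, y_test):
    """MAE of always predicting the training median (the constant that minimizes MAE)."""
    pred = median(y_train)
    return sum(abs(v - pred) for v in y_test) / len(y_test)


def baseline_classification_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent training class."""
    majority = Counter(y_train).most_common(1)[0][0]
    return sum(v == majority for v in y_test) / len(y_test)
```

Any real model (for example, linear or logistic regression on x) should beat these constant predictors on held-out data. With imbalanced classes, majority-class accuracy can look deceptively high, which is one reason to also report metrics such as precision, recall, or AUC.
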
Quick Answer: This question evaluates modeling judgment in machine learning, covering problem framing (regression versus classification), baseline model and metric selection, interpretation of feature or class distribution differences, and cold-start strategies for users, items, and regions.
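
For Task 2, the visual reading of the plot can be backed by a few numbers. The sketch below, written under the assumption that the two samples are plain lists of floats, computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) along with a mean shift and a spread ratio. `ks_statistic` and `summarize_shift` are hypothetical helper names, and no significance test or alert threshold is implied.

```python
from bisect import bisect_right
from statistics import mean, stdev


def ks_statistic(a, b):
    """Two-sample KS statistic: max absolute gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    # Both ECDFs are step functions, so the largest gap occurs at a data point.
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


def summarize_shift(reference, current):
    """Summary numbers for comparing two samples of one feature.

    Assumes each sample has at least two points and that the reference
    sample is not constant (otherwise the spread ratio is undefined).
    """
    return {
        "mean_shift": mean(current) - mean(reference),
        "std_ratio": stdev(current) / stdev(reference),
        "ks": ks_statistic(reference, current),
    }


# Hypothetical example: the same shape, shifted far to the right.
report = summarize_shift([0, 1, 2], [10, 11, 12])
```

A KS value near 0 indicates heavily overlapping distributions and a value near 1 indicates almost complete separation. For class-conditional plots, near-perfect separation on a single feature is a reason to check for label leakage; for train vs. production plots, a large value points to covariate shift worth monitoring.
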