PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Machine Learning/Reddit

Model y from x and interpret distributions

Last updated: Mar 29, 2026

Quick Overview

This question evaluates modeling judgment in machine learning, covering problem framing (regression versus classification), baseline model and metric selection, interpretation of feature or class distribution differences, and cold-start strategies for users, items, and regions.

  • medium
  • Reddit
  • Machine Learning
  • Machine Learning Engineer

Model y from x and interpret distributions

Company: Reddit

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite

## Scenario You are given a dataset with one input feature **x** and a target **y**. The interviewer asks: “How would you model this?” Later, you are shown a plot with **two distributions** (e.g., distribution of a feature for two groups/classes, or train vs. production) and asked to interpret what it implies. Finally, you are asked several **cold-start** questions. ## Tasks 1. Explain how you decide whether this is **regression vs classification**, what baseline models you try first, and what evaluation metrics you use. 2. Given a plot with two distributions, explain how you would: - Describe what you see (separation/overlap, shift, variance, multimodality) - Diagnose potential issues (label leakage, covariate shift, class imbalance, thresholding) - Decide next steps (feature engineering, calibration, sampling, monitoring) 3. Describe practical **cold start** strategies for: - New users - New items (videos) - New regions/languages Assume you care about both predictive quality and production robustness.

Quick Answer: This question evaluates modeling judgment in machine learning, covering problem framing (regression versus classification), baseline model and metric selection, interpretation of feature or class distribution differences, and cold-start strategies for users, items, and regions.

Related Interview Questions

  • Analyze CTR Data and Train Model - Reddit (medium)
  • Build and evaluate click prediction models - Reddit (medium)
Reddit logo
Reddit
Feb 12, 2026, 12:00 AM
Machine Learning Engineer
Onsite
Machine Learning
7
0

Scenario

You are given a dataset with one input feature x and a target y. The interviewer asks: “How would you model this?”

Later, you are shown a plot with two distributions (e.g., distribution of a feature for two groups/classes, or train vs. production) and asked to interpret what it implies.

Finally, you are asked several cold-start questions.

Tasks

  1. Explain how you decide whether this is regression vs classification , what baseline models you try first, and what evaluation metrics you use.
  2. Given a plot with two distributions, explain how you would:
    • Describe what you see (separation/overlap, shift, variance, multimodality)
    • Diagnose potential issues (label leakage, covariate shift, class imbalance, thresholding)
    • Decide next steps (feature engineering, calibration, sampling, monitoring)
  3. Describe practical cold start strategies for:
    • New users
    • New items (videos)
    • New regions/languages

Assume you care about both predictive quality and production robustness.

Solution

Show

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Machine Learning•More Reddit•More Machine Learning Engineer•Reddit Machine Learning Engineer•Reddit Machine Learning•Machine Learning Engineer Machine Learning
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.