PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/Machine Learning/EvenUp

Build a model to predict wine quality

Last updated: Mar 29, 2026

Quick Overview

This question evaluates predictive modeling and data-science competencies—exploratory data analysis, feature assessment, problem framing (regression vs classification vs ordinal), model selection and validation, evaluation metric choice, and post-model feature-importance interpretation—in the Machine Learning domain, combining conceptual understanding with practical application. It is commonly asked in technical interviews for Data Scientist roles because it tests end-to-end modeling judgment, reasoning about data distributions and variable usefulness, selection of validation and evaluation strategies, and awareness of pitfalls such as collinearity and data leakage.

  • Medium
  • EvenUp
  • Machine Learning
  • Data Scientist

Build a model to predict wine quality

Company: EvenUp

Role: Data Scientist

Category: Machine Learning

Difficulty: Medium

Interview Round: Technical Screen

## Modeling task: Predict wine quality from a CSV You are given a clean CSV dataset about red wine. The target (dependent) variable is: - `quality` (integer): wine quality score on a **1–7** scale. There are ~10 input (independent) variables describing the wine’s chemical properties (all numeric), e.g.: - `fixed_acidity` (float) - `volatile_acidity` (float) - `citric_acid` (float) - `residual_sugar` (float) - `chlorides` (float) - `free_sulfur_dioxide` (float) - `total_sulfur_dioxide` (float) - `density` (float) - `pH` (float) - `sulphates` (float) - `alcohol` (float) Assume: - There are **no missing values**. - Each row is one wine sample; samples are i.i.d. (unless you discover evidence otherwise). ### Questions 1. **EDA:** What do you learn from exploring the dataset (distributions, outliers, correlations, target imbalance, non-linearities)? List at least 3 concrete findings and how they affect modeling choices. 2. **Feature usefulness (pre-model):** Which variables appear likely to be useful for predicting `quality`, and why? Mention at least two different ways to assess this (e.g., correlation, mutual information, monotonic trends, domain reasoning). 3. **Modeling:** Build a model to predict `quality`. You may choose any approach. Clearly specify: - whether you treat the task as **regression**, **classification**, or **ordinal classification**, and why - train/validation strategy (e.g., split or cross-validation) - evaluation metric(s) 4. **Feature importance (post-model):** How would you determine which variables are actually useful in your final model? Provide a method appropriate to your model choice and explain pitfalls (e.g., collinearity, leakage, bias in impurity-based importances). Deliverable: a brief write-up of your approach and results; optionally include pseudocode / a code outline in Python (pandas + scikit-learn).

Quick Answer: This question evaluates predictive modeling and data-science competencies—exploratory data analysis, feature assessment, problem framing (regression vs classification vs ordinal), model selection and validation, evaluation metric choice, and post-model feature-importance interpretation—in the Machine Learning domain, combining conceptual understanding with practical application. It is commonly asked in technical interviews for Data Scientist roles because it tests end-to-end modeling judgment, reasoning about data distributions and variable usefulness, selection of validation and evaluation strategies, and awareness of pitfalls such as collinearity and data leakage.

EvenUp logo
EvenUp
Sep 24, 2025, 12:00 AM
Data Scientist
Technical Screen
Machine Learning
7
0

Modeling task: Predict wine quality from a CSV

You are given a clean CSV dataset about red wine. The target (dependent) variable is:

  • quality (integer): wine quality score on a 1–7 scale.

There are ~10 input (independent) variables describing the wine’s chemical properties (all numeric), e.g.:

  • fixed_acidity (float)
  • volatile_acidity (float)
  • citric_acid (float)
  • residual_sugar (float)
  • chlorides (float)
  • free_sulfur_dioxide (float)
  • total_sulfur_dioxide (float)
  • density (float)
  • pH (float)
  • sulphates (float)
  • alcohol (float)

Assume:

  • There are no missing values .
  • Each row is one wine sample; samples are i.i.d. (unless you discover evidence otherwise).

Questions

  1. EDA: What do you learn from exploring the dataset (distributions, outliers, correlations, target imbalance, non-linearities)? List at least 3 concrete findings and how they affect modeling choices.
  2. Feature usefulness (pre-model): Which variables appear likely to be useful for predicting quality , and why? Mention at least two different ways to assess this (e.g., correlation, mutual information, monotonic trends, domain reasoning).
  3. Modeling: Build a model to predict quality . You may choose any approach. Clearly specify:
    • whether you treat the task as regression , classification , or ordinal classification , and why
    • train/validation strategy (e.g., split or cross-validation)
    • evaluation metric(s)
  4. Feature importance (post-model): How would you determine which variables are actually useful in your final model? Provide a method appropriate to your model choice and explain pitfalls (e.g., collinearity, leakage, bias in impurity-based importances).

Deliverable: a brief write-up of your approach and results; optionally include pseudocode / a code outline in Python (pandas + scikit-learn).

Solution

Show

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More Machine Learning•More EvenUp•More Data Scientist•EvenUp Data Scientist•EvenUp Machine Learning•Data Scientist Machine Learning
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.