Build a model to predict wine quality

This is a Machine Learning interview question from EvenUp for Data Scientist roles.

Question

Modeling task: Predict wine quality from a CSV

You are given a clean CSV dataset about red wine. The target (dependent) variable is:

quality (integer): wine quality score on a 1–7 scale.

There are 11 input (independent) variables describing the wine’s chemical properties (all numeric):

fixed_acidity (float)
volatile_acidity (float)
citric_acid (float)
residual_sugar (float)
chlorides (float)
free_sulfur_dioxide (float)
total_sulfur_dioxide (float)
density (float)
pH (float)
sulphates (float)
alcohol (float)

Assume:

There are no missing values.
Each row is one wine sample; samples are i.i.d. (unless you discover evidence otherwise).
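Before any EDA, the stated assumptions are cheap to verify in code. The following is a minimal sketch, assuming the CSV's column names match the list above exactly. The file name in the commented read_csv call is hypothetical because the source gives no path, and the demo runs on random synthetic rows, not wine data. Counting exact duplicate rows is one simple, inconclusive check for the "unless you discover evidence otherwise" clause.

```python
import numpy as np
import pandas as pd

# Columns as listed in the task description.
FEATURES = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET = "quality"


def check_assumptions(df: pd.DataFrame) -> dict:
    """Check the stated assumptions and report anything that contradicts them."""
    cols = FEATURES + [TARGET]
    absent = [c for c in cols if c not in df.columns]
    if absent:
        raise ValueError(f"columns not found: {absent}")
    return {
        # Assumption: no missing values.
        "n_missing": int(df[cols].isna().sum().sum()),
        # Every input variable should be numeric.
        "non_numeric": [c for c in FEATURES if not pd.api.types.is_numeric_dtype(df[c])],
        # Target: integer scores on the 1-7 scale.
        "target_is_integer": bool(pd.api.types.is_integer_dtype(df[TARGET])),
        "target_in_range": bool(df[TARGET].between(1, 7).all()),
        # Exact duplicate rows may be repeated samples; if present, keep copies
        # in the same CV fold (or drop them) to avoid optimistic scores.
        "n_duplicate_rows": int(df.duplicated().sum()),
    }


# df = pd.read_csv("red_wine.csv")  # hypothetical file name; no path is given
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, len(FEATURES))), columns=FEATURES)
df[TARGET] = rng.integers(1, 8, size=50)              # synthetic scores 1..7
df = pd.concat([df, df.iloc[:3]], ignore_index=True)  # add 3 exact duplicates
report = check_assumptions(df)
print(report)
```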

Questions

EDA: What do you learn from exploring the dataset (distributions, outliers, correlations, target imbalance, non-linearities)? List at least 3 concrete findings and how they affect modeling choices.
Feature usefulness (pre-model): Which variables appear likely to be useful for predicting quality, and why? Mention at least two different ways to assess this (e.g., correlation, mutual information, monotonic trends, domain reasoning).
Modeling: Build a model to predict quality. You may choose any approach. Clearly specify:
- whether you treat the task as regression, classification, or ordinal classification, and why
- train/validation strategy (e.g., split or cross-validation)
- evaluation metric(s)
Feature importance (post-model): How would you determine which variables are actually useful in your final model? Provide a method appropriate to your model choice and explain pitfalls (e.g., collinearity, leakage, bias in impurity-based importances).
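The numeric parts of the EDA and pre-model usefulness questions can be scripted. This minimal sketch assumes the data are already in a DataFrame with quality as the target column. eda_summary reports target class shares, skewness, 1.5×IQR outlier counts and highly correlated feature pairs. rank_features gives two separate views of usefulness: Pearson/Spearman correlation and scikit-learn's mutual_info_classif. The 0.6 collinearity threshold and the |skew| > 1 rule of thumb are arbitrary choices. The demo data are synthetic with hypothetical column names, so they illustrate the mechanics, not findings about wine.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif


def eda_summary(df: pd.DataFrame, target: str = "quality", corr_threshold: float = 0.6) -> dict:
    """Numeric EDA: target balance, skewness, IQR outliers, collinear pairs."""
    features = [c for c in df.columns if c != target]
    X, y = df[features], df[target]
    # Target imbalance: share of each score (rare extremes -> stratified CV,
    # ordinal-aware metrics rather than plain accuracy).
    class_share = y.value_counts(normalize=True).sort_index()
    # Skewness: |skew| > 1 (rule of thumb) suggests a log transform for linear models.
    skew = X.skew().sort_values(ascending=False)
    # Outliers: values outside 1.5 * IQR, counted per feature.
    q1, q3 = X.quantile(0.25), X.quantile(0.75)
    iqr = q3 - q1
    n_outliers = ((X < q1 - 1.5 * iqr) | (X > q3 + 1.5 * iqr)).sum()
    # Collinearity: feature pairs whose |Pearson r| exceeds the threshold.
    cm = X.corr()
    collinear = [
        (a, b, float(cm.loc[a, b]))
        for i, a in enumerate(features)
        for b in features[i + 1:]
        if abs(cm.loc[a, b]) > corr_threshold
    ]
    return {"class_share": class_share, "skew": skew,
            "n_outliers": n_outliers, "collinear_pairs": collinear}


def rank_features(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> pd.DataFrame:
    """Two pre-model views of usefulness: correlation and mutual information."""
    pearson = X.corrwith(y)                      # linear association
    spearman = X.corrwith(y, method="spearman")  # any monotonic association
    # MI treats quality as discrete labels and also detects non-monotonic
    # dependence; the k-NN estimator is noisy, so compare ranks, not values.
    mi = mutual_info_classif(X, y, random_state=seed)
    out = pd.DataFrame({"pearson": pearson, "spearman": spearman, "mutual_info": mi},
                       index=X.columns)
    return out.sort_values("mutual_info", ascending=False)


# Synthetic demo data (NOT the wine dataset); hypothetical columns chosen so
# that each check has something to find.
rng = np.random.default_rng(0)
n = 600
x_lin = rng.normal(size=n)
x_u = rng.normal(size=n)
demo = pd.DataFrame({
    "x_linear": x_lin,
    "x_linear_twin": x_lin + 0.1 * rng.normal(size=n),  # collinear with x_linear
    "x_ushape": x_u,                                      # non-monotonic effect
    "x_skewed": rng.lognormal(sigma=1.0, size=n),         # right-skewed, no effect
})
demo["quality"] = np.clip(np.rint(3 + x_lin + x_u ** 2), 1, 7).astype(int)

summary = eda_summary(demo)
ranking = rank_features(demo.drop(columns="quality"), demo["quality"])
print(summary["collinear_pairs"])
print(ranking)
```

The synthetic x_ushape column shows why two assessment methods help: its Spearman correlation with the target is near zero while its mutual information is high, because the dependence is U-shaped rather than monotonic.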

Deliverable: a brief write-up of your approach and results; optionally include pseudocode / a code outline in Python (pandas + scikit-learn).
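A possible code outline for the modeling and post-model importance questions follows. It is a sketch under stated assumptions, not a reference solution. It treats quality both as regression, with predictions rounded and clipped to 1-7, and as multiclass classification. Both are compared with a median baseline on the same stratified 5-fold splits and scored with MAE, quadratic weighted kappa (cohen_kappa_score with weights="quadratic", which penalizes distant scores more than near misses) and accuracy. For importance it uses held-out permutation importance rather than random-forest impurity importances, which tend to favor high-cardinality features. A hypothetical helper also permutes correlated columns jointly. Random forests, the fold count and the synthetic data are illustrative choices. An ordinal model, such as a set of cumulative binary classifiers, is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, cohen_kappa_score, mean_absolute_error
from sklearn.model_selection import StratifiedKFold, train_test_split

LO, HI = 1, 7  # valid quality scores per the task description


def to_scores(pred):
    """Round continuous predictions and clip them to the valid score range."""
    return np.clip(np.rint(pred), LO, HI).astype(int)


def cross_validate_models(X, y, n_splits=5, seed=0):
    """Score each model on identical stratified folds with ordinal-aware metrics."""
    models = {
        "baseline_median": DummyRegressor(strategy="median"),
        "rf_regression": RandomForestRegressor(n_estimators=100, random_state=seed),
        "rf_classifier": RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=seed),
    }
    # Stratify on quality so the rare extreme scores land in every fold.
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for name, proto in models.items():
        for tr, va in folds.split(X, y):
            model = clone(proto).fit(X.iloc[tr], y.iloc[tr])
            pred = to_scores(model.predict(X.iloc[va]))
            y_va = y.iloc[va]
            rows.append({
                "model": name,
                "mae": mean_absolute_error(y_va, pred),
                # Quadratic weights: an error of 3 points costs 9x an error of 1.
                "qwk": cohen_kappa_score(y_va, pred, weights="quadratic"),
                "accuracy": accuracy_score(y_va, pred),
            })
    return pd.DataFrame(rows).groupby("model").agg(["mean", "std"])


def heldout_importance(model, X_val, y_val, groups, n_repeats=10, seed=0):
    """Permutation importance on held-out data, as the increase in MAE.
    groups maps a label to columns shuffled together (same row order), so
    importance is not split across near-duplicate features."""
    r = permutation_importance(model, X_val, y_val, scoring="neg_mean_absolute_error",
                               n_repeats=n_repeats, random_state=seed)
    single = pd.Series(r.importances_mean, index=X_val.columns)
    rng = np.random.default_rng(seed)
    base = mean_absolute_error(y_val, model.predict(X_val))
    grouped = {}
    for label, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            X_perm[cols] = X_val[cols].to_numpy()[rng.permutation(len(X_val))]
            increases.append(mean_absolute_error(y_val, model.predict(X_perm)) - base)
        grouped[label] = float(np.mean(increases))
    return single, pd.Series(grouped)


# Synthetic stand-in for the wine CSV: hypothetical columns, NOT real results.
rng = np.random.default_rng(0)
n = 400
x0 = rng.normal(size=n)
X = pd.DataFrame({
    "x0": x0,
    "x0_twin": x0 + 0.05 * rng.normal(size=n),  # near-duplicate of x0
    "x1": rng.normal(size=n),
    "noise": rng.normal(size=n),                 # unrelated to the target
})
y = pd.Series(to_scores(4 + 1.2 * X["x0"] + 0.5 * X["x1"] + 0.5 * rng.normal(size=n)),
              name="quality")

cv_table = cross_validate_models(X, y)
print(cv_table)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
single, grouped = heldout_importance(
    rf, X_va, y_va, groups={"x0+x0_twin": ["x0", "x0_twin"], "x1": ["x1"], "noise": ["noise"]})
print(single.sort_values(ascending=False))
print(grouped.sort_values(ascending=False))
```

Comparing single["x0"] with grouped["x0+x0_twin"] illustrates the collinearity pitfall: shuffling only one of two near-duplicate columns understates how much the model relies on their shared signal, because the twin still carries it. Computing importances on held-out rows, never the training rows, keeps overfitted splits from inflating them.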
