You are given a clean CSV dataset about red wine. The target (dependent) variable is quality (integer): the wine quality score on a 1–7 scale.
There are ~10 input (independent) variables describing the wine's chemical properties (all numeric), e.g.:
fixed_acidity (float)
volatile_acidity (float)
citric_acid (float)
residual_sugar (float)
chlorides (float)
free_sulfur_dioxide (float)
total_sulfur_dioxide (float)
density (float)
pH (float)
sulphates (float)
alcohol (float)
Assume:
quality, and why? Mention at least two different ways to assess this (e.g., correlation, mutual information, monotonic trends, domain reasoning).
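As one possible illustration of two of the assessment methods named above, the sketch below ranks features by Spearman rank correlation (which captures monotonic trends) and by mutual information. It is not a reference solution. The data frame is a synthetic placeholder with an invented relationship between `alcohol`, `volatile_acidity`, and `quality`; the helper names `make_placeholder_frame` and `rank_features` are hypothetical, and the real CSV would be loaded with `pd.read_csv` instead.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

FEATURES = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def make_placeholder_frame(n_rows=300, seed=0):
    """Synthetic stand-in for the real CSV. Values are NOT real wine data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.normal(size=(n_rows, len(FEATURES))), columns=FEATURES)
    # Invented relationship (an assumption for illustration only):
    # quality rises with alcohol and falls with volatile_acidity.
    score = (4 + 1.2 * df["alcohol"] - 0.8 * df["volatile_acidity"]
             + rng.normal(scale=0.5, size=n_rows))
    df["quality"] = np.clip(np.round(score), 1, 7).astype(int)
    return df


def rank_features(df, target="quality", seed=0):
    """Return one row per feature with two relevance scores."""
    X = df.drop(columns=[target])
    y = df[target]
    # Spearman correlation = Pearson correlation computed on ranks.
    spearman = X.rank().corrwith(y.rank())
    # Mutual information can also pick up non-monotonic dependence.
    mi = pd.Series(mutual_info_regression(X, y, random_state=seed), index=X.columns)
    scores = pd.DataFrame({"spearman": spearman, "mutual_info": mi})
    return scores.sort_values("mutual_info", ascending=False)


ranking = rank_features(make_placeholder_frame())
print(ranking)
```

Features that score highly on both measures are the strongest candidates; a feature with high mutual information but near-zero Spearman correlation may have a non-monotonic relationship with quality and is worth plotting.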
quality. You may choose any approach. Clearly specify:
Deliverable: a brief write-up of your approach and results; optionally include pseudocode / a code outline in Python (pandas + scikit-learn).
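A code outline for the deliverable might look like the following minimal sketch. It assumes the task is treated as regression on the integer score and evaluated with mean absolute error against a mean-prediction baseline; the choice of `RandomForestRegressor`, the 80/20 split, and the 5-fold cross-validation are illustrative assumptions rather than requirements of the task. The data is again a synthetic placeholder (the helper `placeholder_wine_frame` is hypothetical), so the printed numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score, train_test_split


def placeholder_wine_frame(n_rows=400, seed=0):
    """Synthetic stand-in for the real CSV. Values are NOT real wine data."""
    rng = np.random.default_rng(seed)
    cols = ["volatile_acidity", "sulphates", "alcohol"]  # subset, for brevity
    df = pd.DataFrame(rng.normal(size=(n_rows, len(cols))), columns=cols)
    # Invented relationship, used only so the model has something to learn.
    score = (4 + 1.2 * df["alcohol"] - 0.8 * df["volatile_acidity"]
             + rng.normal(scale=0.5, size=n_rows))
    df["quality"] = np.clip(np.round(score), 1, 7).astype(int)
    return df


# With the real data this line would be a pd.read_csv call on the provided file.
df = placeholder_wine_frame()
X = df.drop(columns=["quality"])
y = df["quality"]

# Hold out a test set that is touched only once, at the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: always predict the mean training quality.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

# Candidate model, checked with 5-fold cross-validation on the training set.
model = RandomForestRegressor(n_estimators=200, random_state=42)
cv_mae = -cross_val_score(
    model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error"
)

model.fit(X_train, y_train)
baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
model_mae = mean_absolute_error(y_test, model.predict(X_test))

print(f"CV MAE:       {cv_mae.mean():.3f} +/- {cv_mae.std():.3f}")
print(f"Baseline MAE: {baseline_mae:.3f}")
print(f"Model MAE:    {model_mae:.3f}")
```

Reporting the model's error next to the baseline makes it clear how much of the performance comes from the features rather than from the distribution of quality scores alone.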