This question evaluates understanding of feature scaling, interpretation of logistic regression coefficients as feature importance, and awareness of how regularization and tie-breaking can affect coefficient-based rankings.
You are given a binary classification dataset:
X
: a 2D array of shape
(n_samples, n_features)
containing numeric features
y
: a 1D binary array of shape
(n_samples,)
with values in {0,1}
feature_names
: a list of length
n_features
with the name of each column in
X
X
using z-score standardization:
where and are the mean and standard deviation of feature computed on the training set.
Return a list of 3 strings: the names of the top-3 features.
Login required