Implement and Tune KNN Classifier
Company: Qube
Role: Data Scientist
Category: Machine Learning
Difficulty: Hard
Interview Round: Take-home Project
You are given two CSV files for a three-class leaf-classification task.
- `train.csv` contains three numeric feature columns: `feature_0`, `feature_1`, and `feature_2`, plus a class label column.
- `unlabeled.csv` contains the same three feature columns but no labels.
Complete the task in a Jupyter notebook:
1. Load and inspect the labeled dataset.
2. Split the labeled data into training and testing sets.
3. Implement a k-nearest neighbors (KNN) classifier from scratch. You may use standard data-processing libraries, but the KNN prediction logic should be your own implementation.
4. Train the classifier and evaluate its accuracy on the held-out test set.
5. Tune hyperparameters, especially the value of `k`, and try to improve accuracy. You may use cross-validation.
6. Use the final model to predict labels for `unlabeled.csv`.
7. Write the predictions to an output CSV file.
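Steps 2 to 4 might be sketched as follows. This is a minimal illustration using only the Python standard library, not a reference solution: the function names (`train_test_split`, `knn_predict`, `accuracy`), the Euclidean distance metric, the 80/20 split, and the tie-breaking rule are all assumptions, since the task statement does not fix them. Rows are assumed to be tuples of the three feature values.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx], [labels[i] for i in train_idx],
        [rows[i] for i in test_idx], [labels[i] for i in test_idx],
    )


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict one label by majority vote among the k nearest training rows."""
    # Sort the training rows by Euclidean distance to the query (assumed metric).
    neighbours = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tied vote, most_common returns the label encountered first,
    # which here is the tied label with the closest neighbour.
    return votes.most_common(1)[0][0]


def accuracy(train_rows, train_labels, test_rows, test_labels, k=3):
    """Fraction of held-out rows whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_rows, train_labels, row, k) == label
        for row, label in zip(test_rows, test_labels)
    )
    return correct / len(test_labels)
```

In the notebook, `rows` and `labels` would come from `train.csv`, and the same `knn_predict` call could then be applied to each row of `unlabeled.csv`. Because KNN is distance-based, features on very different scales may need standardising first; a vectorised NumPy version would also be a reasonable choice for larger datasets.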
Quick Answer: This question tests a candidate's ability to implement and tune a k-nearest neighbors classifier from scratch, covering data preprocessing, distance-based classification, model evaluation, and hyperparameter selection.
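The hyperparameter-selection part could be approached with k-fold cross-validation over a small grid of `k` values. The sketch below is one possible way to do that, written with the standard library only. The helper names (`predict_one`, `cross_val_accuracy`, `select_k`), the candidate grid, the five folds, and the Euclidean metric are assumptions chosen for illustration, not requirements from the task.

```python
import math
import random
from collections import Counter


def predict_one(train_rows, train_labels, query, k):
    """Majority vote among the k training rows closest to the query."""
    nearest = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(rows, labels, k, n_folds=5, seed=0):
    """Mean KNN accuracy over n_folds disjoint folds of shuffled indices."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # Every n_folds-th shuffled index goes into the same fold.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_rows = [rows[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        correct = sum(
            predict_one(train_rows, train_labels, rows[i], k) == labels[i]
            for i in fold
        )
        scores.append(correct / len(fold))
    return sum(scores) / n_folds


def select_k(rows, labels, candidates=(1, 3, 5, 7, 9), n_folds=5):
    """Return (best_k, scores), where scores maps each candidate k to CV accuracy."""
    scores = {k: cross_val_accuracy(rows, labels, k, n_folds) for k in candidates}
    # max() keeps the first maximum, so the earliest candidate wins a tie.
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Running the search only on the training split keeps the held-out test set untouched for the final accuracy estimate. Odd candidate values reduce, but with three classes do not eliminate, the chance of a tied vote.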