PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates a candidate's understanding and practical implementation skills for instance-based supervised learning, specifically the k-nearest neighbors algorithm, covering distance metrics, neighbor selection, majority voting and tie-breaking, feature scaling effects, time complexity, and model selection considerations like k and weighting. It is commonly asked to assess both conceptual grasp of non-parametric classification and practical coding ability to implement and analyze algorithms, and it falls under coding & algorithms and machine learning domains with a mix of conceptual understanding and practical application.

  • medium
  • J.P. Morgan
  • Coding & Algorithms
  • Data Scientist

Implement KNN from Scratch

Company: J.P. Morgan

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Implement a k-nearest neighbors (KNN) classifier from scratch in Python without using scikit-learn's KNN implementation. Given training features X_train with shape (n_samples, n_features), training labels y_train, a test set X_test, and an integer k, return the predicted class for each test example. In your explanation, address: - how distances are computed, using Euclidean distance by default - how the k nearest neighbors are selected - how majority voting works and how you would break ties deterministically - why feature scaling matters for KNN - the training-time and inference-time complexity - how you would choose k in practice and when weighted KNN may be preferable

Quick Answer: This question evaluates a candidate's understanding and practical implementation skills for instance-based supervised learning, specifically the k-nearest neighbors algorithm, covering distance metrics, neighbor selection, majority voting and tie-breaking, feature scaling effects, time complexity, and model selection considerations like k and weighting. It is commonly asked to assess both conceptual grasp of non-parametric classification and practical coding ability to implement and analyze algorithms, and it falls under coding & algorithms and machine learning domains with a mix of conceptual understanding and practical application.

Implement a k-nearest neighbors (KNN) classifier from scratch in Python without using scikit-learn's KNN implementation. You are given training features X_train, training labels y_train, a test set X_test, and an integer k. Return the predicted class label for each test example in the same order as X_test. Use Euclidean distance by default: for two feature vectors a and b, the distance is sqrt(sum((a_i - b_i)^2)). For each test example, compute its distance to every training example, select the k training points with the smallest distances, and predict the label by majority vote among those neighbors. To make the result deterministic, if multiple classes receive the same highest vote count, return the smallest class label among the tied classes. Important discussion points for this problem: - Feature scaling matters in KNN because Euclidean distance is sensitive to magnitude. A feature with a large numeric range can dominate the distance unless the data is standardized or normalized. - KNN has almost no real training phase beyond storing the dataset, so training time is minimal compared with many other models. However, inference is expensive because each test point must be compared to all training points. - In practice, k is usually chosen using a validation set or cross-validation. Small k can overfit noise, while large k can oversmooth class boundaries. - Weighted KNN may be preferable when closer neighbors should influence the prediction more than farther neighbors, even if both are inside the top k.

Constraints

  • 1 <= len(X_train) == len(y_train) <= 1000
  • 0 <= len(X_test) <= 1000
  • 1 <= len(X_train[0]) <= 50
  • 1 <= k <= len(X_train)
  • Feature values are integers or floats in the range [-10^4, 10^4]
  • All class labels are integers

Examples

Input: ([[1, 1], [2, 2], [3, 3], [6, 6]], [0, 0, 1, 1], [[2.1, 2.1], [5, 5]], 3)

Expected Output: [0, 1]

Explanation: For [2.1, 2.1], the 3 nearest labels are [0, 1, 0], so the prediction is 0. For [5, 5], the 3 nearest labels are [1, 1, 0], so the prediction is 1.

Input: ([[0, 0], [0, 2], [2, 0], [2, 2]], [0, 1, 1, 0], [[1, 1]], 4)

Expected Output: [0]

Explanation: All 4 training points are equally distant from [1, 1]. The vote counts are 0 -> 2 and 1 -> 2, so the deterministic tie-break returns the smaller label, 0.

Input: ([[1, 2], [1, 3], [4, 5], [5, 5], [5, 6]], [0, 0, 1, 1, 2], [[5, 5.2], [1, 2.4]], 3)

Expected Output: [1, 0]

Explanation: For [5, 5.2], the nearest labels are [1, 2, 1], so 1 wins. For [1, 2.4], the nearest labels are [0, 0, 1], so 0 wins.

Input: ([[0, 0], [1, 1]], [0, 1], [], 1)

Expected Output: []

Explanation: There are no test samples, so the result is an empty list.

Input: ([[3, 4]], [7], [[0, 0], [10, 10]], 1)

Expected Output: [7, 7]

Explanation: With only one training sample and k = 1, every test sample is classified as label 7.

Input: ([[-2, -1], [-1, -3], [2, 2], [3, 3]], [0, 0, 1, 1], [[-1.5, -2], [2.5, 2.5]], 3)

Expected Output: [0, 1]

Explanation: This checks negative coordinates. The first test point is closest to two class-0 points, while the second is closest to two class-1 points.

Hints

  1. For each test point, build a list of (distance, label) pairs by comparing it to every training point.
  2. After sorting by distance, only the first k labels matter. Use a frequency map to count votes, then apply the tie rule consistently.
Last updated: Apr 19, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement Integer Square Root - J.P. Morgan (medium)
  • Minimize Operations to Balance Shipments - J.P. Morgan (medium)
  • Solve scheduling and collision problems - J.P. Morgan (medium)
  • Design an autocomplete suggestions API - J.P. Morgan (easy)
  • Group strings that are anagrams - J.P. Morgan (easy)