Implement local maxima, bagging, and k-means
Company: TikTok
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Take-home Project
Solve the following programming tasks.
## 1) Find all “local maxima” in a streaming temperature array
You are cleaning sensor data for temperature-fluctuation anomaly modeling.
You are given:
- An integer array `rawData` of length `n`.
- An integer `localArea >= 0`.
Index `i` is a **local maximum** if:
- Looking left from `i`, the sequence `rawData[i], rawData[i-1], rawData[i-2], ...` is **strictly decreasing** for `L = min(localArea, i)` steps, i.e.
- `rawData[i] > rawData[i-1] > rawData[i-2] > ... > rawData[i-L]`.
- Looking right from `i`, the sequence `rawData[i], rawData[i+1], rawData[i+2], ...` is **strictly decreasing** for `R = min(localArea, n-1-i)` steps, i.e.
- `rawData[i] > rawData[i+1] > rawData[i+2] > ... > rawData[i+R]`.
If there are fewer than `localArea` elements to the left or right, use all available elements on that side.
**Task:** Return all indices `i` (0-based) that are local maxima, in increasing order.
**Constraints (typical for OA):** Aim for better than \(O(n\cdot localArea)\) when `n` is large.
---
## 2) Implement Bagging classification from scratch (code skeleton)
Implement missing functions in a provided code skeleton (you may not change existing code besides filling in the ellipses).
You are given:
- `x_train`: 2D array of floats with shape `(n_train, d)`.
- `y_train`: 1D array of integer class labels with shape `(n_train,)`.
- `x_test`: 2D array of floats with shape `(n_test, d)`.
- `base_classifiers`: an array/list of length `n_estimators`, where each classifier supports:
- `fit(X, y)`
- `predict(X)` returning integer labels.
**Task:** Build a Bagging classifier that:
1. Creates a **bootstrap sample** (sample `n_train` rows *with replacement*) for each base classifier.
2. Fits each base classifier on its bootstrap sample.
3. Predicts labels for `x_test` by **majority vote** across estimators.
**Output:** Return a 1D array of predicted integer labels of length `n_test`.
(If there is a voting tie, use the smallest class label as the tie-breaker unless the skeleton specifies otherwise.)
---
## 3) Implement k-means clustering from scratch (code skeleton)
Implement missing functions in a provided k-means skeleton.
You are given:
- `data`: 2D array of floats with shape `(n, d)`.
- `k`: number of clusters.
- `centroids`: initial centroids, 2D array of floats with shape `(k, d)`.
- `iterations`: number of update steps to perform.
k-means loop:
1. Compute distances from each point to each centroid (typically Euclidean).
2. Assign each point to the nearest centroid.
3. Update each centroid to the mean of points assigned to it.
4. Repeat for `iterations` iterations.
**Output:** Return a 1D integer array of length `n` with the predicted cluster index for each point.
(If a cluster gets zero points in an iteration, keep its centroid unchanged unless the skeleton specifies another rule.)
Quick Answer: This question evaluates skills in algorithm implementation and machine learning fundamentals—specifically streaming local maxima detection, bootstrap aggregation (bagging) for classification, and k-means clustering—testing competencies in algorithm design, data structures, sampling, and numerical computation within the Coding & Algorithms domain for a Machine Learning Engineer role. These tasks are commonly asked to assess practical implementation ability, algorithmic efficiency, and understanding of ensemble and clustering behavior in real-world data processing, operating primarily at the practical application level rather than as a purely conceptual exercise.