Part A — K-means (implementation + concepts)
You are given a dataset X ∈ R^{n×d} and an integer k.
- **Explain K-means**: what objective it optimizes and the alternating optimization procedure.
- **Implement K-means (Lloyd's algorithm)** (a reference sketch follows this list):
  - Initialize k centroids.
  - Repeat until convergence / max iterations:
    - Assign each point to its nearest centroid.
    - Recompute each centroid as the mean of points assigned to it.
  - Return final centroids and assignments.
- **Improve initialization**: describe and implement a better initialization strategy than random init (i.e., **K-means++**; a seeding sketch also follows this list).
Clarify how you would handle:
- **Empty clusters**
- **Stopping criteria**
- **Time complexity**
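A minimal NumPy sketch of Lloyd's algorithm, intended as a reference rather than a definitive answer; the random initialization, the `tol`-based stopping criterion, and the farthest-point re-seeding of empty clusters are assumptions layered on top of the prompt.

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, rng=None):
    """Lloyd's algorithm: minimize J = sum_i ||x_i - mu_{c(i)}||^2 by
    alternating an assignment step and a centroid-update step."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Random initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(n, size=k, replace=False)].copy()

    for _ in range(max_iters):
        # Assignment step: n x k distance matrix, each point goes to its nearest centroid.
        # This matrix dominates the cost: O(n * k * d) per iteration.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
            else:
                # Empty cluster: re-seed it from the point farthest from its centroid.
                farthest = dists[np.arange(n), labels].argmax()
                new_centroids[j] = X[farthest]

        # Stopping criterion: centroids moved less than tol (or max_iters reached).
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids

    return centroids, labels
```

Each assignment/update pair monotonically decreases the objective, so the procedure converges to a local optimum; the per-iteration cost is dominated by the O(n·k·d) distance computation.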
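A companion sketch of K-means++ seeding: the first centroid is uniform at random, and each subsequent one is drawn with probability proportional to the squared distance D(x)² to the nearest centroid chosen so far. The function name `init_kmeans_pp` is illustrative, and it could replace the random initialization in the Lloyd's sketch.

```python
import numpy as np

def init_kmeans_pp(X, k, rng=None):
    """K-means++ seeding: spread initial centroids out, sampling proportional to D(x)^2."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]  # first centroid: uniform at random
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen centroid.
        d2 = np.min(
            np.linalg.norm(X[:, None, :] - np.asarray(centroids)[None, :, :], axis=2) ** 2,
            axis=1,
        )
        probs = d2 / d2.sum()               # sample proportional to D(x)^2
        centroids.append(X[rng.choice(n, p=probs)])
    return np.asarray(centroids)
```

Compared with uniform random init, this seeding tends to reduce both the number of Lloyd iterations needed and the risk of landing in a poor local optimum.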
Part B — Multi-agent trajectory prediction (Waymo-like)
You are building a model to predict the positions of a target agent (e.g., another car near the ego vehicle) at the next 2 time steps. For each training example you have:
- Past trajectory history for the target agent for T steps: (x_t, y_t) for t = 1..T
- Past trajectories for nearby agents (variable number M)
- Map / environment context (e.g., lane polylines, traffic signals), optionally rasterized or vectorized
- Ground-truth future trajectory for the target agent for the next 2 steps
Questions
- **Modeling**: Propose an ML approach to predict the next 2 positions. Specify:
  - Input representation (agent features, relative coordinates, map encoding)
  - Architecture (e.g., RNN/Transformer, GNN over agents, encoder-decoder)
  - Output parameterization (deterministic points vs. probabilistic distribution; multimodal vs. unimodal)
  - Loss function(s) and evaluation metrics (e.g., ADE/FDE, NLL); a metrics/loss sketch follows these questions
- **Multi-head attention (MHA)**: Explain what MHA is doing in this setting and why it helps (an attention-over-agents sketch follows these questions).
- **Autoregressive training vs. inference mismatch**: Suppose you train an autoregressive decoder using **teacher forcing** / a **causal mask** (each step attends only to previous steps), so the model conditions on ground-truth previous positions during training. At inference, it must condition on its **own previous predictions**, causing compounding error.
  How would you modify training and/or the learning objective to better match inference-time behavior? Discuss at least two approaches and their tradeoffs (a scheduled-sampling sketch follows these questions).
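One way to make the metrics/loss bullet concrete is a hedged PyTorch sketch of ADE/FDE plus a winner-takes-all (min-over-K) loss for a multimodal output head; the tensor shapes, the `pred_modes`/`mode_logits` names, and the choice of cross-entropy for the mode classifier are assumptions, not part of the prompt.

```python
import torch
import torch.nn.functional as F

def ade_fde(pred, gt):
    """pred, gt: (B, T, 2) predicted and ground-truth future positions.
    ADE = mean displacement over all steps; FDE = displacement at the final step."""
    disp = torch.linalg.norm(pred - gt, dim=-1)   # (B, T)
    return disp.mean(), disp[:, -1].mean()

def min_of_k_loss(pred_modes, mode_logits, gt):
    """pred_modes: (B, K, T, 2) candidate futures, mode_logits: (B, K), gt: (B, T, 2).
    Winner-takes-all: regress only the closest mode, classify which mode was closest."""
    err = torch.linalg.norm(pred_modes - gt[:, None], dim=-1).mean(dim=-1)  # (B, K) per-mode ADE
    best = err.argmin(dim=-1)                                               # (B,)
    reg = err.gather(1, best[:, None]).mean()       # regression loss on the best mode only
    cls = F.cross_entropy(mode_logits, best)        # teach the model which mode fits
    return reg + cls
```

Here `ade_fde` plays the evaluation role and `min_of_k_loss` the training role; taking the min over modes lets each mode specialize on a plausible maneuver instead of all modes collapsing onto an implausible average.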
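For the MHA question, a minimal sketch of cross-attention in which the target agent's history encoding queries the neighbor agents' encodings, using torch.nn.MultiheadAttention; the embedding size, the padding-mask convention, and the assumption that per-agent encodings already exist are all illustrative.

```python
import torch
import torch.nn as nn

class AgentInteraction(nn.Module):
    """Target-agent query attends over neighbor-agent keys/values (cross-attention)."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, target_emb, neighbor_emb, neighbor_mask):
        # target_emb:    (B, 1, dim)  encoding of the target agent's history
        # neighbor_emb:  (B, M, dim)  encodings of up to M nearby agents (zero-padded)
        # neighbor_mask: (B, M) bool, True where the slot is padding (no real agent)
        out, attn = self.mha(
            query=target_emb, key=neighbor_emb, value=neighbor_emb,
            key_padding_mask=neighbor_mask,
        )
        # Each head has its own learned projections, so different heads can weight
        # different neighbors or feature subspaces (e.g., lead vehicle vs. cross traffic).
        return out, attn
```

Stacking such layers, together with self-attention among neighbors and attention over map polylines, is how the target's representation comes to encode interaction and scene context while remaining agnostic to the variable number M of neighbors.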
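For the training/inference mismatch, scheduled sampling is one standard approach worth discussing: with some probability, each decoding step is fed the model's own previous prediction instead of the ground truth, so training gradually resembles inference. A hedged sketch with an assumed GRU-cell decoder; the context dimension, the fixed two-step horizon, and the `detach` on fed-back predictions are illustrative choices, not the prompt's method.

```python
import torch
import torch.nn as nn

class SampledDecoder(nn.Module):
    """Two-step autoregressive position decoder trained with scheduled sampling."""
    def __init__(self, ctx_dim=128, hidden=128):
        super().__init__()
        self.cell = nn.GRUCell(2 + ctx_dim, hidden)   # input: previous (x, y) + scene context
        self.head = nn.Linear(hidden, 2)              # predict the next (x, y)

    def forward(self, ctx, last_pos, gt_future, sample_prob=0.0):
        # ctx: (B, ctx_dim) scene/agent encoding, last_pos: (B, 2) last observed position
        # gt_future: (B, 2, 2) ground-truth next two positions (used for teacher forcing)
        h = torch.zeros(last_pos.size(0), self.cell.hidden_size, device=ctx.device)
        prev, preds = last_pos, []
        for t in range(2):
            h = self.cell(torch.cat([prev, ctx], dim=-1), h)
            pred = self.head(h)
            preds.append(pred)
            # Scheduled sampling: with probability sample_prob feed our own prediction
            # to the next step (inference-like); otherwise feed the ground truth.
            use_own = torch.rand(()) < sample_prob
            prev = pred.detach() if use_own else gt_future[:, t]
        return torch.stack(preds, dim=1)              # (B, 2, 2)
```

Annealing `sample_prob` from 0 toward 1 over training is the usual schedule; note the `detach` on fed-back predictions, which is tied to the standard caveat that the sampling decision is not differentiated through, biasing gradients.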