Implement K-means and handle train-inference mismatch
Company: Waymo
Role: Data Scientist
Category: Machine Learning
Difficulty: easy
Interview Round: Technical Screen
## Part A — K-means (implementation + concepts)
You are given a dataset \(X \in \mathbb{R}^{n \times d}\) and an integer \(k\).
1. **Explain K-means**: what objective it optimizes and the alternating optimization procedure.
2. **Implement K-means (Lloyd’s algorithm)** (see the sketch after this list):
- Initialize \(k\) centroids.
- Repeat until convergence / max iterations:
- Assign each point to its nearest centroid.
- Recompute each centroid as the mean of points assigned to it.
- Return final centroids and assignments.
3. **Improve initialization**: describe and implement a better initialization strategy than random init (i.e., **K-means++**).
Also clarify:
- How you would handle empty clusters
- What stopping criteria you would use
- The time complexity of the algorithm
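As one way the pieces of items 2 and 3 could fit together, here is a minimal NumPy sketch; the function names, tolerance, and empty-cluster policy are illustrative choices rather than part of the prompt:

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """K-means++ seeding: pick each new centroid with probability proportional
    to the squared distance from the nearest centroid chosen so far."""
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.asarray(centroids)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centroids.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.asarray(centroids)

def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    """Lloyd's algorithm with K-means++ initialization.

    Stopping criteria: centroid shift below `tol`, or `max_iters` reached.
    Empty clusters: re-seeded at the point farthest from its assigned centroid.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(max_iters):
        # Assignment step: O(n * k * d) distance computations per iteration.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (n, k)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
            else:
                # Empty cluster: re-seed at the point farthest from its centroid.
                new_centroids[j] = X[dists.min(axis=1).argmax()]
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignments with the converged centroids.
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return centroids, labels
```

The assignment step dominates at roughly \(O(nkd)\) per iteration; K-means++ seeding costs about one extra assignment pass up front but typically reduces the number of iterations and the risk of converging to a poor local minimum.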
---
## Part B — Multi-agent trajectory prediction (Waymo-like)
You are building a model to predict the positions of a **target agent** (e.g., another car near the ego vehicle) over the **next 2 time steps**. For each training example you have:
- Past trajectory of the target agent over \(T\) steps: \((x_t, y_t)\) for \(t = 1, \dots, T\)
- Past trajectories for nearby agents (variable number \(M\))
- Map / environment context (e.g., lane polylines, traffic signals), optionally rasterized or vectorized
- Ground-truth future trajectory for the target agent for the next 2 steps
### Questions
1. **Modeling**: Propose an ML approach to predict the next 2 positions. Specify:
- Input representation (agent features, relative coordinates, map encoding)
- Architecture (e.g., RNN/Transformer, GNN over agents, encoder-decoder)
- Output parameterization (deterministic points vs probabilistic distribution; multimodal vs unimodal)
- Loss function(s) and evaluation metrics (e.g., ADE/FDE, NLL; see the metric sketch after these questions)
2. **Multi-head attention (MHA)**: Explain what MHA is doing in this setting and why it helps.
3. **Autoregressive training vs. inference mismatch**: Suppose you train an autoregressive decoder with **teacher forcing** and a **causal mask** (each step attends only to previous steps), so the model conditions on ground-truth previous positions during training. At inference it must condition on its **own previous predictions**, which causes compounding error.
How would you modify training and/or the learning objective to better match inference-time behavior? Discuss at least two approaches and their tradeoffs.
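For the evaluation side of question 1, ADE and FDE are straightforward to compute once predicted and ground-truth futures are available. A minimal sketch, assuming arrays of shape `(batch, horizon, 2)` with a horizon of 2 steps here:

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error.

    pred, gt: (batch, horizon, 2) arrays of predicted / ground-truth (x, y)
    positions; for this problem the horizon is 2 steps.
    """
    disp = np.linalg.norm(pred - gt, axis=-1)  # per-step Euclidean error, (batch, horizon)
    ade = disp.mean()                          # averaged over examples and steps
    fde = disp[:, -1].mean()                   # error at the final predicted step
    return ade, fde
```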
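For question 3, one standard answer is scheduled sampling: during training, feed the decoder its own previous prediction (instead of the ground truth) with some probability that is annealed upward over training. A minimal PyTorch sketch, where `decoder`, `context`, and `last_observed` are hypothetical stand-ins for your decoder step, encoded scene features, and the last observed position:

```python
import torch

def rollout_with_scheduled_sampling(decoder, context, last_observed, gt_future, sampling_prob):
    """Two-step autoregressive rollout with scheduled sampling.

    decoder(context, prev) -> predicted (x, y) for the next step  [hypothetical interface]
    gt_future: (batch, 2, 2) ground-truth future positions.
    sampling_prob: probability of feeding back the model's own prediction;
    anneal from 0 (pure teacher forcing) toward 1 over training.
    """
    prev = last_observed
    preds = []
    for t in range(gt_future.shape[1]):
        pred_t = decoder(context, prev)
        preds.append(pred_t)
        # With probability sampling_prob, condition the next step on the model's
        # own (detached) prediction so training looks more like inference.
        if torch.rand(()) < sampling_prob:
            prev = pred_t.detach()
        else:
            prev = gt_future[:, t]
    return torch.stack(preds, dim=1)  # (batch, 2, 2); train with ADE/NLL-style losses
```

The trade-off: scheduled sampling reduces exposure bias but trains on an input distribution that is neither fully ground truth nor fully model-generated, and the annealing schedule matters. An alternative, natural for a 2-step horizon, is to drop autoregression and predict both steps jointly in one shot (or train on full rollouts with a multi-step loss), which removes the mismatch at the cost of modeling step-to-step dependencies less explicitly. Whether to detach the fed-back prediction or backpropagate through it is a further design choice.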
Quick Answer: This question evaluates both conceptual understanding and hands-on competency in two areas. Part A covers unsupervised clustering: the K-means objective and its alternating optimization, initialization strategies (K-means++), empty-cluster handling, stopping criteria, and computational complexity. Part B covers sequence modeling for multi-agent trajectory prediction: input representation, attention/GNN architectures, deterministic vs. probabilistic outputs, evaluation metrics, and the exposure bias introduced by autoregressive training with teacher forcing. Together, the two parts test optimization and scalability trade-offs as well as model-design and train/inference-mismatch considerations, at both a conceptual and a practical level.