Part A — K-means (implementation + concepts)
You are given a dataset X ∈ R^{n×d} and an integer k.
- **Explain K-means**: what objective it optimizes and the alternating optimization procedure.
- **Implement K-means (Lloyd's algorithm)** (a reference sketch follows this list):
  - Initialize k centroids.
  - Repeat until convergence / max iterations:
    - Assign each point to its nearest centroid.
    - Recompute each centroid as the mean of points assigned to it.
  - Return final centroids and assignments.
- **Improve initialization**: describe and implement a better initialization strategy than random init (i.e., **K-means++**; a seeding sketch also follows this list).
Clarify how you would handle:
- **Empty clusters**
- **Stopping criteria**
- **Time complexity**
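A minimal NumPy sketch of Lloyd's algorithm, intended as a reference rather than a definitive answer; the random initialization, the `tol`-based stopping criterion, and the farthest-point re-seeding of empty clusters are assumptions layered on top of the prompt.

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, rng=None):
    """Lloyd's algorithm: minimize J = sum_i ||x_i - mu_{c(i)}||^2 by
    alternating an assignment step and a centroid-update step."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Random initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(n, size=k, replace=False)].copy()

    for _ in range(max_iters):
        # Assignment step: n x k distance matrix, each point goes to its nearest centroid.
        # This matrix dominates the cost: O(n * k * d) per iteration.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
            else:
                # Empty cluster: re-seed it from the point farthest from its centroid.
                farthest = dists[np.arange(n), labels].argmax()
                new_centroids[j] = X[farthest]

        # Stopping criterion: centroids moved less than tol (or max_iters reached).
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids

    return centroids, labels
```

Each assignment/update pair monotonically decreases the objective, so the procedure converges to a local optimum; the per-iteration cost is dominated by the O(n·k·d) distance computation.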
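A companion sketch of K-means++ seeding: the first centroid is uniform at random, and each subsequent one is drawn with probability proportional to the squared distance D(x)² to the nearest centroid chosen so far. The function name `init_kmeans_pp` is illustrative, and it could replace the random initialization in the Lloyd's sketch.

```python
import numpy as np

def init_kmeans_pp(X, k, rng=None):
    """K-means++ seeding: spread initial centroids out, sampling proportional to D(x)^2."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]  # first centroid: uniform at random
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen centroid.
        d2 = np.min(
            np.linalg.norm(X[:, None, :] - np.asarray(centroids)[None, :, :], axis=2) ** 2,
            axis=1,
        )
        probs = d2 / d2.sum()               # sample proportional to D(x)^2
        centroids.append(X[rng.choice(n, p=probs)])
    return np.asarray(centroids)
```

Compared with uniform random init, this seeding tends to reduce both the number of Lloyd iterations needed and the risk of landing in a poor local optimum.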
Part B — Multi-agent trajectory prediction (Waymo-like)
You are building a model to predict the positions of a target agent (e.g., another car near the ego vehicle) at the next 2 time steps. For each training example you have:
- Past trajectory history for the target agent for T steps: (x_t, y_t) for t = 1..T
- Past trajectories for nearby agents (variable number M)
- Map / environment context (e.g., lane polylines, traffic signals), optionally rasterized or vectorized
- Ground-truth future trajectory for the target agent for the next 2 steps
Questions
- **Modeling**: Propose an ML approach to predict the next 2 positions. Specify:
  - Input representation (agent features, relative coordinates, map encoding)
  - Architecture (e.g., RNN/Transformer, GNN over agents, encoder-decoder)
  - Output parameterization (deterministic points vs. probabilistic distribution; multimodal vs. unimodal)
  - Loss function(s) and evaluation metrics (e.g., ADE/FDE, NLL); a metrics/loss sketch follows these questions
- **Multi-head attention (MHA)**: Explain what MHA is doing in this setting and why it helps (an attention-over-agents sketch follows these questions).
- **Autoregressive training vs. inference mismatch**: Suppose you train an autoregressive decoder using **teacher forcing** / a **causal mask** (each step attends only to previous steps), so the model conditions on ground-truth previous positions during training. At inference, it must condition on its **own previous predictions**, causing compounding error.
  How would you modify training and/or the learning objective to better match inference-time behavior? Discuss at least two approaches and their tradeoffs (a scheduled-sampling sketch follows these questions).
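One way to make the metrics/loss bullet concrete is a hedged PyTorch sketch of ADE/FDE plus a winner-takes-all (min-over-K) loss for a multimodal output head; the tensor shapes, the `pred_modes`/`mode_logits` names, and the choice of cross-entropy for the mode classifier are assumptions, not part of the prompt.

```python
import torch
import torch.nn.functional as F

def ade_fde(pred, gt):
    """pred, gt: (B, T, 2) predicted and ground-truth future positions.
    ADE = mean displacement over all steps; FDE = displacement at the final step."""
    disp = torch.linalg.norm(pred - gt, dim=-1)   # (B, T)
    return disp.mean(), disp[:, -1].mean()

def min_of_k_loss(pred_modes, mode_logits, gt):
    """pred_modes: (B, K, T, 2) candidate futures, mode_logits: (B, K), gt: (B, T, 2).
    Winner-takes-all: regress only the closest mode, classify which mode was closest."""
    err = torch.linalg.norm(pred_modes - gt[:, None], dim=-1).mean(dim=-1)  # (B, K) per-mode ADE
    best = err.argmin(dim=-1)                                               # (B,)
    reg = err.gather(1, best[:, None]).mean()       # regression loss on the best mode only
    cls = F.cross_entropy(mode_logits, best)        # teach the model which mode fits
    return reg + cls
```

Here `ade_fde` plays the evaluation role and `min_of_k_loss` the training role; taking the min over modes lets each mode specialize on a plausible maneuver instead of all modes collapsing onto an implausible average.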
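For the MHA question, a minimal sketch of cross-attention in which the target agent's history encoding queries the neighbor agents' encodings, using torch.nn.MultiheadAttention; the embedding size, the padding-mask convention, and the assumption that per-agent encodings already exist are all illustrative.

```python
import torch
import torch.nn as nn

class AgentInteraction(nn.Module):
    """Target-agent query attends over neighbor-agent keys/values (cross-attention)."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, target_emb, neighbor_emb, neighbor_mask):
        # target_emb:    (B, 1, dim)  encoding of the target agent's history
        # neighbor_emb:  (B, M, dim)  encodings of up to M nearby agents (zero-padded)
        # neighbor_mask: (B, M) bool, True where the slot is padding (no real agent)
        out, attn = self.mha(
            query=target_emb, key=neighbor_emb, value=neighbor_emb,
            key_padding_mask=neighbor_mask,
        )
        # Each head has its own learned projections, so different heads can weight
        # different neighbors or feature subspaces (e.g., lead vehicle vs. cross traffic).
        return out, attn
```

Stacking such layers, together with self-attention among neighbors and attention over map polylines, is how the target's representation comes to encode interaction and scene context while remaining agnostic to the variable number M of neighbors.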
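For the training/inference mismatch, scheduled sampling is one standard approach worth discussing: with some probability, each decoding step is fed the model's own previous prediction instead of the ground truth, so training gradually resembles inference. A hedged sketch with an assumed GRU-cell decoder; the context dimension, the fixed two-step horizon, and the `detach` on fed-back predictions are illustrative choices, not the prompt's method.

```python
import torch
import torch.nn as nn

class SampledDecoder(nn.Module):
    """Two-step autoregressive position decoder trained with scheduled sampling."""
    def __init__(self, ctx_dim=128, hidden=128):
        super().__init__()
        self.cell = nn.GRUCell(2 + ctx_dim, hidden)   # input: previous (x, y) + scene context
        self.head = nn.Linear(hidden, 2)              # predict the next (x, y)

    def forward(self, ctx, last_pos, gt_future, sample_prob=0.0):
        # ctx: (B, ctx_dim) scene/agent encoding, last_pos: (B, 2) last observed position
        # gt_future: (B, 2, 2) ground-truth next two positions (used for teacher forcing)
        h = torch.zeros(last_pos.size(0), self.cell.hidden_size, device=ctx.device)
        prev, preds = last_pos, []
        for t in range(2):
            h = self.cell(torch.cat([prev, ctx], dim=-1), h)
            pred = self.head(h)
            preds.append(pred)
            # Scheduled sampling: with probability sample_prob feed our own prediction
            # to the next step (inference-like); otherwise feed the ground truth.
            use_own = torch.rand(()) < sample_prob
            prev = pred.detach() if use_own else gt_future[:, t]
        return torch.stack(preds, dim=1)              # (B, 2, 2)
```

Annealing `sample_prob` from 0 toward 1 over training is the usual schedule; note the `detach` on fed-back predictions, which is tied to the standard caveat that the sampling decision is not differentiated through, biasing gradients.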