
Implement K-means and handle train-inference mismatch

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding and hands-on competency in two areas: unsupervised clustering (the K-means objective, alternating optimization, initialization strategies, empty-cluster handling, stopping criteria, and computational complexity) and sequence modeling for multi-agent trajectory prediction (input representation, attention/GNN architectures, deterministic vs. probabilistic outputs, evaluation metrics, and the exposure bias that arises from autoregressive training). It is commonly asked to probe both conceptual understanding and practical implementation skills in the Machine Learning domain, testing optimization and scalability trade-offs as well as model design and the train-inference mismatch, at a mix of conceptual and practical levels of abstraction.

  • easy
  • Waymo
  • Machine Learning
  • Data Scientist

Implement K-means and handle train-inference mismatch

Company: Waymo

Role: Data Scientist

Category: Machine Learning

Difficulty: easy

Interview Round: Technical Screen



Related Interview Questions

  • Design an Online Experiment - Waymo (medium)
  • How predict vehicles’ turn direction at intersection? - Waymo (easy)
  • Compare two rare-event detection models statistically - Waymo (easy)
Waymo · Dec 6, 2025 · Data Scientist · Technical Screen · Machine Learning

Part A — K-means (implementation + concepts)

You are given a dataset \(X \in \mathbb{R}^{n \times d}\) and an integer \(k\).

  1. Explain K-means: what objective it optimizes and the alternating optimization procedure.
  2. Implement K-means (Lloyd’s algorithm):
    • Initialize \(k\) centroids.
    • Repeat until convergence / max iterations:
      • Assign each point to its nearest centroid.
      • Recompute each centroid as the mean of points assigned to it.
    • Return final centroids and assignments.
  3. Improve initialization: describe and implement a better initialization strategy than random init (i.e., K-means++).

Clarify how you would handle:

  • Empty clusters
  • Stopping criteria
  • Time complexity
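The algorithm above can be sketched in NumPy as follows. This is a minimal reference implementation, not the canonical answer: the function names, the tolerance-based stopping rule, and the empty-cluster reseeding heuristic (reassigning the point farthest from its nearest centroid) are illustrative choices. Each iteration costs \(O(nkd)\) for the distance computation.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """K-means++ seeding: choose each new centroid with probability
    proportional to its squared distance from the nearest chosen one."""
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.array(centroids)[None]) ** 2).sum(-1).min(axis=1)
        centroids.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(max_iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None]) ** 2).sum(-1)  # shape (n, k)
        labels = d2.argmin(axis=1)
        # Update step: mean of assigned points; reseed any empty cluster
        # with the point farthest from its current nearest centroid.
        new_centroids = centroids.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():
                new_centroids[j] = X[mask].mean(axis=0)
            else:
                new_centroids[j] = X[d2.min(axis=1).argmax()]
        # Stopping criterion: total centroid shift below tolerance.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Recompute final assignments against the converged centroids.
    labels = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(axis=1)
    return centroids, labels
```

Swapping `kmeans_pp_init` for uniform random sampling of \(k\) rows recovers plain random initialization, which makes the benefit of K-means++ easy to compare empirically.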

Part B — Multi-agent trajectory prediction (Waymo-like)

You are building a model to predict the next 2 timestamps of a target agent (e.g., another car near the ego vehicle). For each training example you have:

  • Past trajectory history for the target agent for \(T\) steps: \((x_t, y_t)\) for \(t = 1..T\)
  • Past trajectories for nearby agents (variable number \(M\))
  • Map / environment context (e.g., lane polylines, traffic signals), optionally rasterized or vectorized
  • Ground-truth future trajectory for the target agent for the next 2 steps
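A common preprocessing step for inputs like these is to express all coordinates in an agent-centric frame: translate so the target agent's last observed position is the origin and rotate so its heading aligns with the x-axis, which makes the model translation- and rotation-invariant. A sketch (the helper name and heading estimate are illustrative assumptions, not part of the question):

```python
import numpy as np

def to_agent_frame(points, origin, heading):
    """Rotate/translate (N, 2) world coordinates into a frame centered at
    `origin` with the x-axis aligned to `heading` (radians)."""
    c, s = np.cos(-heading), np.sin(-heading)
    R = np.array([[c, -s], [s, c]])  # rotation by -heading
    return (points - origin) @ R.T

# Example: the target agent's last observed state defines the frame; the
# same transform would be applied to neighbor trajectories and map polylines.
target_hist = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # (T, 2)
origin = target_hist[-1]
delta = target_hist[-1] - target_hist[-2]
heading = np.arctan2(delta[1], delta[0])  # crude heading from last step
local = to_agent_frame(target_hist, origin, heading)
```

In the local frame the last point sits at the origin and the recent motion lies along the x-axis, so the network never has to learn equivalences between rotated copies of the same scene.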

Questions

  1. Modeling: Propose an ML approach to predict the next 2 positions. Specify:
    • Input representation (agent features, relative coordinates, map encoding)
    • Architecture (e.g., RNN/Transformer, GNN over agents, encoder-decoder)
    • Output parameterization (deterministic points vs probabilistic distribution; multimodal vs unimodal)
    • Loss function(s) and evaluation metrics (e.g., ADE/FDE, NLL)
  2. Multi-head attention (MHA): Explain what MHA is doing in this setting and why it helps.
  3. Autoregressive training vs inference mismatch: Suppose you train an autoregressive decoder using teacher forcing / a causal mask (each step attends only to previous steps), where the model conditions on ground-truth previous positions during training. At inference, it must condition on its own previous predictions, causing compounding error.

How would you modify training and/or the learning objective to better match inference-time behavior? Discuss at least two approaches and their tradeoffs.
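One standard remedy is scheduled sampling: during training, at each decode step feed the model its own previous prediction (rather than the ground truth) with some probability, and anneal that probability from 0 toward 1 over training. A minimal sketch of the rollout logic, with a toy stand-in decoder (`decode_step` is a hypothetical callable, not a real API):

```python
import numpy as np

def scheduled_sampling_rollout(decode_step, history, targets, p_model, rng):
    """Roll out a decoder over len(targets) steps. At each step, the next
    input is the model's own output with probability p_model, else the
    ground-truth target. p_model=0 is pure teacher forcing; p_model=1
    matches inference exactly (free running)."""
    prev = history[-1]
    preds = []
    for t in range(len(targets)):
        pred = decode_step(prev)
        preds.append(pred)
        prev = pred if rng.random() < p_model else targets[t]
    return np.array(preds)

# Toy decoder with a small systematic bias, so compounding error is visible.
def decode_step(prev):
    return prev + np.array([1.0, 0.1])

history = np.array([[0.0, 0.0]])
targets = np.array([[1.0, 0.0], [2.0, 0.0]])
preds = scheduled_sampling_rollout(
    decode_step, history, targets, p_model=0.5, rng=np.random.default_rng(0)
)
```

The tradeoff: exposing the model to its own predictions reduces exposure bias but makes the training signal noisier and non-stationary. Alternatives with different tradeoffs include predicting both future steps jointly in one shot (no autoregression, but no step-wise conditioning) and training with a multi-step rollout loss that backpropagates through the model's own predictions.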

