Implement correct attention masking
Company: Applied Intuition
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question tests a candidate's grasp of attention mechanisms in autoregressive Transformers: implementing causal and padding masks, combining them correctly in multi-head attention, and diagnosing masking-related training anomalies.
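A minimal sketch of what a strong answer covers, using NumPy rather than a specific framework (the function names and shapes here are illustrative assumptions, not a reference implementation): a lower-triangular causal mask, a per-sequence padding mask over key positions, and their combination via logical AND before the softmax, with masked logits set to -inf so they receive zero attention weight.

```python
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular boolean mask: query position i may attend
    # only to key positions j <= i (no peeking at future tokens).
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def padding_mask(lengths, seq_len):
    # True where a key position holds a real token, False where it
    # is padding. Shape: (batch, seq_len).
    positions = np.arange(seq_len)
    return positions[None, :] < np.asarray(lengths)[:, None]

def masked_attention_weights(scores, lengths):
    # scores: raw attention logits, shape (batch, heads, q_len, k_len).
    # The two masks broadcast together: causal over (q_len, k_len),
    # padding over (batch, k_len), applied identically to every head.
    batch, heads, seq_len, _ = scores.shape
    mask = (causal_mask(seq_len)[None, None, :, :]
            & padding_mask(lengths, seq_len)[:, None, None, :])
    # Masked positions get -inf so softmax assigns them zero weight.
    masked = np.where(mask, scores, -np.inf)
    masked = masked - masked.max(axis=-1, keepdims=True)  # stability
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)
```

For example, with uniform logits, a batch of one sequence of true length 3 padded to length 4 gives row 0 attending only to itself and row 2 attending uniformly over positions 0-2, with the padded key position 3 always receiving zero weight. A common diagnostic point: adding the masks as floats instead of AND-ing booleans, or masking after the softmax, silently leaks probability mass to future or padded tokens.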