Implement multi-head attention and LLM sampling | Scale AI Coding Question