Implement multi-head attention and LLM sampling | Scale AI Interview Question
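
The question names two tasks, so a minimal sketch of both may help frame an answer. It uses only the Python standard library and plain lists so the shapes stay visible. All function names, the parameter layout (square `d_model x d_model` projection matrices, no biases, no dropout), the causal mask default, and the convention that a non-positive temperature means greedy decoding are assumptions made for illustration, not something the original question specifies.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads, causal=True):
    """x: (seq_len, d_model); each weight matrix: (d_model, d_model)."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    seq_len = len(x)
    out = [[] for _ in range(seq_len)]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head  # column slice owned by head h
        for i in range(seq_len):
            # A causal mask lets position i attend only to positions <= i.
            last = i + 1 if causal else seq_len
            scores = [
                sum(q[i][c] * k[j][c] for c in range(lo, hi)) / math.sqrt(d_head)
                for j in range(last)
            ]
            weights = softmax(scores)
            # Weighted sum of this head's values, concatenated across heads.
            out[i].extend(
                sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(lo, hi)
            )
    return matmul(out, w_o)  # final output projection


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick a token id from raw logits using temperature, top-k, and top-p."""
    if temperature <= 0:
        # Assumed convention: non-positive temperature means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax([l / temperature for l in logits])
    # Token ids sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:max(1, top_k)]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break  # smallest prefix whose mass reaches top_p
        order = kept
    # random.choices renormalizes the surviving weights.
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]
```

A production answer would normally vectorize the attention with a tensor library and batch dimension; the loops here trade speed for readability. In the sampler, top-k is applied before top-p, which is one common ordering rather than the only valid one.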