Implement multi-head self-attention correctly | Apple
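The snippet below is a minimal, framework-free sketch of multi-head self-attention. It is not the implementation from the original page. It assumes the standard scaled dot-product formulation with learned projection matrices `w_q`, `w_k`, `w_v`, and `w_o`. The function and helper names are hypothetical and chosen only for illustration. Several steps are easy to get wrong, and each is commented in the code:

- scaling scores by the per-head width `sqrt(d_k)`
- applying the softmax over the key axis, stabilized by subtracting the row maximum
- splitting the projected features into contiguous per-head slices
- concatenating the heads before the output projection

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax; subtracting each row's max keeps exp() from overflowing."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def columns(a: Matrix, start: int, stop: int) -> Matrix:
    """Take columns [start, stop) from every row."""
    return [row[start:stop] for row in a]


def multi_head_self_attention(
    x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix, w_o: Matrix, num_heads: int
) -> Matrix:
    """x is (seq_len x d_model); all weight matrices are (d_model x d_model)."""
    d_model = len(w_q[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_k = d_model // num_heads

    # Project the whole sequence once, then slice the feature axis into heads.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

    head_outputs = []
    for h in range(num_heads):
        lo, hi = h * d_k, (h + 1) * d_k
        q_h, k_h, v_h = columns(q, lo, hi), columns(k, lo, hi), columns(v, lo, hi)
        # Scale by sqrt(d_k), the per-head width, not sqrt(d_model).
        raw = matmul(q_h, transpose(k_h))  # (seq_len x seq_len)
        scores = [[s / math.sqrt(d_k) for s in row] for row in raw]
        # Softmax over keys: each query's attention weights sum to 1.
        weights = softmax_rows(scores)
        head_outputs.append(matmul(weights, v_h))

    # Concatenate heads along the feature axis, then apply the output projection.
    concat = [
        [val for head in head_outputs for val in head[i]] for i in range(len(x))
    ]
    return matmul(concat, w_o)
```

With a single token, every head attends only to itself, so identity value and output projections return the input unchanged. In that case `multi_head_self_attention([[1.0, 2.0]], I, I, I, I, 2)` gives `[[1.0, 2.0]]`, where `I` is the 2x2 identity. Batching, causal or padding masks, and dropout are deliberately omitted. A production version would use a vectorized framework and reshape the projections into a `(heads, seq_len, d_k)` layout instead of looping over heads.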