Implement attention and nucleus sampling; compare to top-k | TikTok
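The title names three techniques but the page carries no implementation, so here is a minimal sketch of what each could look like, using only the Python standard library. Every function name below (`attention`, `top_k_filter`, `top_p_filter`, `sample`) is hypothetical and chosen for illustration. The sketch assumes scaled dot-product attention over plain lists of row vectors and a nucleus rule that keeps the smallest set of tokens whose cumulative probability reaches `p`.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention; Q, K, V are lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


def _renormalize(probs, keep):
    """Zero out tokens not in `keep` and rescale the rest to sum to 1."""
    keep_set = set(keep)
    total = sum(probs[i] for i in keep_set)
    return [probs[i] / total if i in keep_set else 0.0
            for i in range(len(probs))]


def top_k_filter(probs, k):
    """Top-k: keep a fixed number of the most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])


def top_p_filter(probs, p):
    """Nucleus (top-p): keep the smallest prefix of the sorted tokens
    whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)


def sample(probs, rng):
    """Draw one token index from a (filtered) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

The comparison comes down to how the candidate set is sized: top-k always keeps exactly `k` tokens regardless of how the probability mass is spread, while top-p keeps few tokens when the distribution is peaked and more when it is flat. For example, with the distribution `[0.5, 0.3, 0.15, 0.05]`, `top_k_filter(probs, 2)` keeps two tokens, whereas `top_p_filter(probs, 0.9)` keeps three.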