Explain GRPO-style training for diffusion models | Google Interview Question