Blog de Simon🫣
Contents
1. Proximal Policy Optimization (PPO) Loss
1.1 Policy Loss (Actor Loss)
1.2 Value Loss (Critic Loss)
1.3 Entropy Loss (Entropy Bonus)
2. Group Relative Policy Optimization (GRPO) Loss
3. Group Sequence Policy Optimization (GSPO) Loss
4. REINFORCE Leave-One-Out (RLOO) Loss
5. REINFORCE++ Loss