Blog de Simon🫣
SimonSun
Contents
1. Proximal Policy Optimization (PPO) Loss
  1.1 Policy Loss (Actor Loss)
  1.2 Value Loss (Critic Loss)
  1.3 Entropy Loss (Entropy Bonus)
2. Group Relative Policy Optimization (GRPO) Loss
3. Group Sequence Policy Optimization (GSPO) Loss
4. REINFORCE Leave-One-Out (RLOO) Loss
5. REINFORCE++ Loss
2023-2026 SimonSun.

Blog de Simon🫣 | Internet Malou, LLM Rookie, Bug Maker🤧
