Difference between revisions of "2026-04-13"
From cslt Wiki
Lishuailong (talk | contribs) |
| Line 107: | Line 107: | ||
|Yu Zhang | |Yu Zhang | ||
|| | || | ||
| − | * | + | * GPU Util: [https://z1et6d3xtb.feishu.cn/wiki/XX4NwX3tJiBDcgkMi0hcFUtInHh] |
| + | * Chain level experiments: | ||
| + | ** With the Metric Reward, the weights of correct edges converge faster than under pure reinforcement learning alone. | ||
| + | ** The lower the weights of critical edges at the point the Metric Reward is introduced (i.e., the worse the starting situation), the larger the difference from training without it. | ||
|| | || | ||
* | * | ||
| Line 118: | Line 121: | ||
|Junhui Chen | |Junhui Chen | ||
|| | || | ||
| − | * | + | * Conducting additional experiments to strengthen the robustness of the conclusions: |
| + | ** Introduce a new baseline (AgentPrune). | ||
| + | ** Add experiments on a new dataset (GSM8K). | ||
| + | ** Reproduce the results on other LLM base models. | ||
| + | * Paper writing | ||
|| | || | ||
* | * | ||
Revision as of 10:32, 13 April 2026 (Mon)
| People | This Week | Next Week | Task Tracking (DeadLine) |
|---|---|---|---|
| Dong Wang | | | |
| Lantian Li | | | |
| Wenqiang Du | | | |
| Yang Wei | | | |
| Ying Shi | | | |
| Yue Gu | | | |
| Lily | | | |
| Pengqi Li | | | |
| Junming Yuan | | | |
| Yu Zhang | | | |
| Junhui Chen | | | |
| Jiaying Wang | | | |
| Bochao Hu | | | |
| Hongcheng Zhang | | | |
| Weiman Sun | | | |
| Ge Gao | | | |
| Shuailong Li | | | |