Awesome_MCTS_LLM
Agent
Mathematical Reasoning
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
MCTS-DPO: Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Summary of Method Innovations