Awesome_MCTS_LLM
Agent
Mathematical Reasoning
ReST-MCTS∗: LLM Self-Training via Process Reward Guided Tree Search
MCTS-DPO: Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Summary of Methodological Innovations