format: - [title](paper link) [links] - Publication date: - Submitted to / accepted at: - Affiliation: - Contribution: - Tasks: - Code:
Awesome_MCTS_LLM
Mathematical Reasoning
RAP: Reasoning with Language Model is Planning with World Model
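Publication date: August 2023
Submitted to / accepted at: accepted at EMNLP 2023
Affiliation: University of California, San Diego; University of Florida
Contribution: the LLM serves as a world model, predicting the state transitions used in the rollout simulation of MCTS
Tasks: Blocksworld, GSM8K, PrOntoQA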
Code: Official
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
MCTS-DPO: Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
AlphaMath Almost Zero: Process Supervision without Process
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
ALPHALLM-CPL: Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
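The RAP entry above describes using a world model to supply state transitions during MCTS rollouts. The sketch below illustrates that loop on a toy problem and is not the RAP implementation. Everything in it is an assumption made for illustration: `world_model` is a hypothetical stand-in for an LLM call (here it just adds integers), and `ACTIONS`, `TARGET`, `MAX_DEPTH`, and `reward` define a made-up task. In a real system the transition, the action proposals, and the reward would each come from model queries.

```python
import math
import random

ACTIONS = (1, 2, 3)   # toy action space (assumption): add 1, 2, or 3
TARGET = 10           # toy goal state (assumption)
MAX_DEPTH = 6         # maximum number of steps in a plan


def world_model(state, action):
    """Hypothetical stand-in for an LLM that predicts the next state."""
    return state + action


def reward(state):
    """Toy reward: 1 if the goal is hit exactly, else 0."""
    return 1.0 if state == TARGET else 0.0


def is_terminal(state, depth):
    return state >= TARGET or depth >= MAX_DEPTH


class Node:
    def __init__(self, state, depth=0, parent=None, action=None):
        self.state, self.depth = state, depth
        self.parent, self.action = parent, action
        self.children = []
        self.visits = 0
        self.value = 0.0

    def untried_actions(self):
        tried = {child.action for child in self.children}
        return [a for a in ACTIONS if a not in tried]


def uct(child, c=1.4):
    """Upper confidence bound for trees: mean value plus exploration bonus."""
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return exploit + explore


def rollout(state, depth, rng):
    """Simulate random actions, asking the world model for each transition."""
    while not is_terminal(state, depth):
        state = world_model(state, rng.choice(ACTIONS))
        depth += 1
    return reward(state)


def mcts(root_state, iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded, non-terminal nodes.
        while not is_terminal(node.state, node.depth) and not node.untried_actions():
            node = max(node.children, key=uct)
        # Expansion: add one child via a world-model transition.
        if not is_terminal(node.state, node.depth):
            action = rng.choice(node.untried_actions())
            child = Node(world_model(node.state, action), node.depth + 1, node, action)
            node.children.append(child)
            node = child
        # Simulation: estimate the new node's value with a rollout.
        value = rollout(node.state, node.depth, rng)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += value
            node = node.parent
    return root


def best_plan(root):
    """Follow the most-visited child at each level."""
    plan, node = [], root
    while node.children:
        node = max(node.children, key=lambda child: child.visits)
        plan.append(node.action)
    return plan
```

Calling `best_plan(mcts(0))` returns a list of actions drawn from `ACTIONS`. The structural point is that `world_model` is the only place where transitions are computed, so it could be replaced by a prompt to a language model without changing the search code.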
Agent
Method Summaries (Chronological Order)
Points Worth Noting
TODO list ⏳✅