Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
🤗 Dataset
Official repo for Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models.
We propose training a Q-value model to guide action selection for LLM agents at each decision-making step. Our method comprises a training stage and an inference stage. During training, we first use Monte Carlo Tree Search (MCTS) to explore high-quality trajectories, annotating the actions at each step with Q-values. We then construct preference data and train the Q-value model using step-level Direct Preference Optimization (DPO). During inference, the trained Q-value model guides action selection at each decision-making step.
Our method has the following features:

| Approach | Step Level | Applicable to API-based LLMs | Single Trial | Task Experience Accumulation |
|-----------------------------------------------|:----------:|:----------------------------:|:------------:|:----------------------------:|
| Prompt Strategies: Reflection, Reflexion | ❌ | ✔ | ✔ or ❌ | ❌ |
| Tree Search: LATS, Search-agent | ✔ | ✔ | ❌ | ❌ |
| Fine-tuning: Agent-FLAN, AgentEvol, ETO | ❌ | ❌ | ✔ | ✔ |
| Q-value model enhanced (Ours) | ✔ | ✔ | ✔ | ✔ |
🛠️ Environments Setup
WebShop
Move to the WebShop directory:
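The directory-change command appears to have been dropped here; assuming the task code lives in a `WebShop/` subdirectory of this repo (name assumed):

```bash
cd WebShop
```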
Install WebShop from source and run an environment instance locally, following the instructions at https://github.com/princeton-nlp/WebShop.
Install the module dependencies into your environment:
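A minimal sketch, assuming the dependencies are listed in a `requirements.txt` in that directory (file name assumed):

```bash
pip install -r requirements.txt
```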
HotPotQA
Move to the HotPotQA directory:
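As above, the command itself seems to be missing; assuming the task code lives in a `hotpotqa/` subdirectory (name assumed):

```bash
cd hotpotqa
```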
Install the module dependencies into your environment:
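Assuming a `requirements.txt` in the HotPotQA directory (file name assumed):

```bash
pip install -r requirements.txt
```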
🎎 Multi-type Agent Support
API-based LLM agents
Set the `OPENAI_API_KEY` environment variable to your OpenAI API key:
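For example, in a POSIX shell (the key value below is a placeholder):

```bash
export OPENAI_API_KEY="your-api-key-here"  # placeholder; substitute your real key
```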
Open-source LLM agents

For open-source LLM agents, we adopt the OpenAI-compatible APIs provided by FastChat.
Move to the fastchat directory:
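The command was likely omitted here:

```bash
cd fastchat
```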
Launch the controller of FastChat:
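FastChat's controller has a standard entry point (see the FastChat documentation; host/port flags are optional and omitted here):

```bash
python3 -m fastchat.serve.controller
```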
Launch the model worker of FastChat:

```bash
bash start_multiple_vllm_server_from0_Phi3.sh
bash start_multiple_vllm_server_from0_Llama31.sh
```
🚀 Training
We use the HotPotQA task as an example; the same procedure transfers directly to the WebShop task.
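As a rough illustration of the preference-data step described in the overview (this is not the repo's actual code; the function and field names are hypothetical), step-level preference pairs can be formed by pairing the highest- and lowest-Q actions explored at each step:

```python
def build_preference_pairs(steps):
    """Build step-level DPO preference pairs from MCTS Q-value annotations.

    steps: list of dicts, each {"state": str, "actions": [(action, q_value), ...]}
    Returns a list of {"prompt", "chosen", "rejected"} dicts.
    """
    pairs = []
    for step in steps:
        # Rank the candidate actions at this step by their annotated Q-values.
        ranked = sorted(step["actions"], key=lambda aq: aq[1], reverse=True)
        # Only keep a pair when there is a clear Q-value gap between best and worst.
        if ranked[0][1] > ranked[-1][1]:
            pairs.append({
                "prompt": step["state"],
                "chosen": ranked[0][0],
                "rejected": ranked[-1][0],
            })
    return pairs
```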
🎮 Inference
Finally, evaluate the agent:
- `--algorithm`: `simple` refers to greedy decision-making, while `beam` refers to guiding action selection with the Q-value model.
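A minimal sketch of the idea behind the `beam` mode (hypothetical names, not the repo's implementation): at each step the trained Q-value model scores every candidate action, and the agent executes the highest-scoring one:

```python
def select_action(candidates, q_model, state):
    """Pick the candidate action with the highest Q-value in the given state.

    q_model is assumed to be a callable (state, action) -> float.
    """
    scores = [q_model(state, action) for action in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]
```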