Use generated results to eval Match Rate and Pass Rate
bash scripts/category/eval/eval_match_pass_rate.sh api name2 <output_path>
Example
bash scripts/category/eval/eval_match_pass_rate.sh api name2 data/category/inference/plan_1107_G3_gensample_RRHF_Cate_1122_level_23
bash scripts/category/eval/eval_match_pass_rate.sh api name2 data/category/inference/plan_1107_G3_gensample_RRHF_Tool_1122_level_23
bash scripts/category/eval/eval_match_pass_rate.sh api name2 data/category/inference/plan_1107_G3_gensample_RRHF_API_1122_level_23
bash scripts/category/eval/eval_match_pass_rate.sh api name2 data/category/inference/plan_1107_G3_gensample_RRHF_Desc_1122_level_23
Script
Use generated results to eval Win Rate
Change generate(prompt, name) function in "ToolPlanner/toolbench/tooleval/new_eval_win_rate_cut_list.py" to your own ChatGPT API.
bash scripts/category/eval/eval_match_pass_rate.sh api name2 <output_path>
@misc{wu2024toolplannertoolaugmentedllm,
title={ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback},
author={Qinzhuo Wu and Wei Liu and Jian Luan and Bin Wang},
year={2024},
eprint={2409.14826},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.14826},
}
The source code of the this is licensed under the Apache 2.0 license.
Summary of Terms
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
NonCommercial: You may not use the material for commercial purposes.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
License Badge
5. Citation
If you’d like to use our benchmark or cite this paper, please kindly use the reference below:
@inproceedings{wu2024toolplanner,
title={ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback},
author={Wu, Qinzhuo and Liu, Wei and Luan, Jian and Wang, Bin},
booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
pages={18315--18339},
year={2024}
}
ToolPlanner
Paper Link
ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback
目录
Requirement
Data
Download Data and Checkpoints
download these data and unzip them. |path|data description|data name|url| |—-|—–|—–|—–| |[/data/category/answer]|MGToolBench: sft training dataset|G3_plan_gen_train_1020_G3_3tag_whole_prefixTagTraceAll.json|https://huggingface.co/datasets/wuqinzhuo/ToolPlanner| |[/data/category/dataset]|MGToolBench: pairwise_responses|G3_1107_gensample_Reward_pair.json|https://huggingface.co/datasets/wuqinzhuo/ToolPlanner| |[/data/category/toolenv]|Tool Environment: Tools, APIs, and their documentation.|toolenv.zip|https://huggingface.co/datasets/wuqinzhuo/ToolPlanner| |[/data/category/inference]|Output: solution trees path|inference.zip|https://huggingface.co/datasets/wuqinzhuo/ToolPlanner| |[/data/retrieval/G3_clear]|Training dataset for Retrivel model|train.json|https://huggingface.co/datasets/wuqinzhuo/ToolPlanner| |[/data/retrieval/G3_clear]|Training dataset for Retrivel mode|corpus.tsv|https://huggingface.co/datasets/wuqinzhuo/ToolPlanner|
Model
Install
Train ToolPlanner, Stage 1 SFT
Script
Code
Train ToolPlanner, Stage 2 Reinforcement Learning
Script
Code
Inference, Generate Solution Tree
Script
ToolBench Key
Go to ToolBench to apply for a ToolBench Key.
Decode_Method
Full ModelMix_Whole3Tag_MixWhole3TagTrace_3TagRepla_PureRepla_MixWhole3Retri_MixWhole3TagTraceGen_DFS_woFilter_w2Seperate RetrieverMix_Whole3Tag_MixWhole3TagTrace_MixWhole3Retri_MixWhole3TagTraceGen_DFS_woFilter_w2Without Solution PlanningMix_Whole3Tag_MixWhole3TagTrace_MixWhole3Retri_MixWhole3Gen_DFS_woFilter_w2Without Tag ExtractionMix_Whole3Tag_MixWhole3TagTrace_MixTagTraceRetri_MixTagTraceGen_DFS_woFilter_w2Without Tag & SolutionMix_Whole3Tag_MixWhole3TagTrace_MixRetri_MixGen_DFS_woFilter_w2Chain-based MethodMix_Whole3Tag_MixWhole3TagTrace_3TagRepla_PureRepla_MixWhole3Retri_MixWhole3TagTraceGen_CoT@5Example
Eval
Script
Use generated results to eval Match Rate and Pass Rate
Example
Script
Use generated results to eval Win Rate
Example
Citation
License
The dataset of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
The source code of the this is licensed under the Apache 2.0 license.
Summary of Terms
License Badge
5. Citation
If you’d like to use our benchmark or cite this paper, please kindly use the reference below: