RoBERTa (from Facebook): released with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer and Veselin Stoyanov.
Chinese RoBERTa (from HFL): the Chinese version of RoBERTa.
Please refer to this readme for the usage of these models in EasyNLP.
Meanwhile, EasyNLP supports loading pretrained models from Huggingface/Transformers; please refer to this tutorial for details.
P-Tuning (from Tsinghua University, Beijing Academy of AI, MIT and Recurrent AI, Ltd.): released with the paper GPT Understands, Too by Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang and Jie Tang. We have made some slight modifications to make the algorithm suitable for the Chinese language.
Changelog
EasyNLP v0.0.3 was released on 01/04/2022. Please refer to tag_v0.0.3 for more details and history.
@article{easynlp,
  doi = {10.48550/ARXIV.2205.00258},
  url = {https://arxiv.org/abs/2205.00258},
  author = {Wang, Chengyu and Qiu, Minghui and Zhang, Taolin and Liu, Tingting and Li, Lei and Wang, Jianing and Wang, Ming and Huang, Jun and Lin, Wei},
  title = {EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing},
  publisher = {arXiv},
  year = {2022}
}
EasyNLP is a Comprehensive and Easy-to-use NLP Toolkit
About EasyNLP
With pretrained models such as BERT, Megatron, and GPT-3 achieving remarkable results in NLP, more and more teams are moving into ultra-large-scale training, pushing model sizes from hundreds of millions of parameters to hundreds of billions or even trillions. However, applying such ultra-large models in real-world scenarios still faces challenges. First, the sheer number of parameters makes training and inference slow and deployment extremely costly. Second, in many practical scenarios the shortage of data still limits the use of large models, and improving their generalization in few-shot settings remains an open challenge. To address these problems, the PAI team has launched EasyNLP, a Chinese NLP algorithm framework that helps bring large models into production quickly and efficiently.
Key Features
Installation
Requirements: Python 3.6, PyTorch >= 1.8.
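For reference, a typical installation might look like the commands below. This is a sketch: the PyPI package name and repository URL follow the EasyNLP GitHub project, and should be verified against the official installation instructions.

```shell
# Option 1: install the released package from PyPI
# (package name as published by the EasyNLP project)
pip install pai-easynlp

# Option 2: install the latest version from source
git clone https://github.com/alibaba/EasyNLP.git
cd EasyNLP
python setup.py install
```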
Quick Start
Below is an example of BERT-based text classification; a BERT model can be trained with just a few lines of code:
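As a rough sketch of what those few lines can look like (the class and argument names here follow the quick-start example in the EasyNLP repository and should be treated as assumptions; running this requires easynlp to be installed and the usual command-line arguments to be supplied):

```python
# Hypothetical sketch of the EasyNLP quick-start API; verify names
# against the official documentation before use.
from easynlp.appzoo import ClassificationDataset, SequenceClassification
from easynlp.core import Trainer
from easynlp.utils import initialize_easynlp

# Parse command-line arguments (model name, schema, sequence length, ...).
args = initialize_easynlp()

# Build the training dataset from a tab-separated file.
train_dataset = ClassificationDataset(
    pretrained_model_name_or_path=args.pretrained_model_name_or_path,
    data_file="train.tsv",
    max_seq_length=args.sequence_length,
    input_schema=args.input_schema,
    first_sequence=args.first_sequence,
    label_name=args.label_name,
    label_enumerate_values=args.label_enumerate_values,
    is_training=True)

# Initialize a BERT-based sequence classification model and train it.
model = SequenceClassification(
    pretrained_model_name_or_path=args.pretrained_model_name_or_path)
Trainer(model=model, train_dataset=train_dataset).train()
```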
We also provide the AppZoo command line for training models, which only requires a simple parameter configuration. First, download the training set train.tsv and the development set dev.tsv, then start training:
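A training invocation might look like the following. The flag names mirror the text-classification example in the AppZoo documentation and are assumptions; adapt the input schema and model name to your data.

```shell
# Illustrative AppZoo training command; check flag names against the docs.
easynlp \
  --mode=train \
  --worker_gpu=1 \
  --tables=train.tsv,dev.tsv \
  --input_schema=label:str:1,sid1:str:1,sent1:str:1 \
  --first_sequence=sent1 \
  --label_name=label \
  --label_enumerate_values=0,1 \
  --checkpoint_dir=./classification_model \
  --epoch_num=1 \
  --sequence_length=128 \
  --app_name=text_classify \
  --user_defined_parameters='pretrain_model_name_or_path=bert-base-uncased'
```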
The model prediction command is as follows:
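A prediction invocation might look like the following sketch, reusing the checkpoint produced by training; again, the flag names follow the AppZoo examples and are assumptions to verify against the documentation.

```shell
# Illustrative AppZoo prediction command; check flag names against the docs.
easynlp \
  --mode=predict \
  --tables=dev.tsv \
  --outputs=dev.pred.tsv \
  --input_schema=label:str:1,sid1:str:1,sent1:str:1 \
  --output_schema=predictions,probabilities \
  --first_sequence=sent1 \
  --checkpoint_path=./classification_model \
  --app_name=text_classify
```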
For more AppZoo examples, see the AppZoo documentation.
ModelZoo
The ModelZoo of EasyNLP currently supports the following pretrained models.
Deploying Large Pretrained Models
EasyNLP provides few-shot learning and knowledge distillation to help users put very large pretrained models into practical use.
CLUE Benchmark
EasyNLP provides CLUE evaluation code so that users can quickly evaluate model performance on CLUE datasets.
With our scripts, you can obtain the evaluation results (on dev data) of models such as BERT and RoBERTa:
(1) bert-base-chinese:
(2) chinese-roberta-wwm-ext:
For detailed examples, please refer to the CLUE evaluation example.
Tutorials
License
This project is licensed under the Apache License (Version 2.0). This toolkit also contains some code modified from other repos under other open-source licenses. See the NOTICE file for more information.
Contact Us
Scan the QR code below to join our DingTalk group; feel free to report any issues there.
References
For a more detailed introduction, please refer to our arXiv paper.