天功/GongBU

🚀 GongBU is a comprehensive, all-in-one, no-code platform for finetuning large language models locally. Once it is deployed, no code is needed to finetune a model, and anyone with a web browser can use the platform. It is built on top of libraries such as Transformers and PEFT and is designed to be easy for non-technical users.

📈 You can easily finetune, evaluate, and deploy models with the platform.

📦 By default, the platform is started with docker-compose on a Linux system with nvidia-docker. It can also run natively if you install the required dependencies and configure them correctly.

The recommended installation uses the docker-compose file provided in the repository, on a Linux system with nvidia-docker. This is the easiest way to install the platform.

The process is simple: clone the repository, then run download.py (which requires the inquirer and rich libraries) to download the files that are not included in the repository, namely micromamba and a BERT model.

You also need Docker and nvidia-docker installed.
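A quick pre-flight check can confirm the prerequisites mentioned above before anything is downloaded. The sketch below is not part of the repository: `missing_prerequisites` is a hypothetical helper that looks for the `docker` executable on PATH and for the `inquirer` and `rich` Python modules. It does not verify nvidia-docker, since how that is exposed depends on how it was installed.

```python
import importlib.util
import shutil


def missing_prerequisites(commands=("docker",), modules=("inquirer", "rich")):
    """Return a list of prerequisites that could not be found.

    Hypothetical helper, not shipped with GongBU.
    """
    missing = []
    for command in commands:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(command) is None:
            missing.append(f"command: {command}")
    for module in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(f"python module: {module}")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing prerequisites:")
        for item in problems:
            print(f"  - {item}")
    else:
        print("All checked prerequisites were found.")
```

An empty result means everything that was checked is present; any entries name what still needs to be installed.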

Then run the following command:

docker compose -f docker-compose.prod.yaml up

The platform should then be running on localhost:5173 behind an nginx reverse proxy. Open localhost:5173/home in a browser to access it.
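Because the containers can take a while to start, it may help to poll the address above until it answers. The following is a minimal sketch and not part of GongBU: the helper names are hypothetical, and the `http://` scheme is an assumption, since the address in the text gives no scheme.

```python
import http.client
import time
import urllib.error
import urllib.request

# An empty ProxyHandler disables any system proxy, which is what we
# want when probing a service on localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP request to `url` gets any HTTP response."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed response, and similar.
        return False


def wait_until_reachable(url, attempts=30, delay=2.0):
    """Poll `url` until it responds or the attempts run out."""
    for _ in range(attempts):
        if is_reachable(url):
            return True
        time.sleep(delay)
    return False
```

Under these assumptions, `wait_until_reachable("http://localhost:5173/home")` would return True once the nginx proxy starts answering and False if it never does within the allotted attempts.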

🔧 Usage

After accessing the platform, you will be greeted by the login page. By default, the platform ships with no user accounts, so you need to create one by clicking the sign up button.

By default, signing up requires a token, which is stored in the sign_up_token.txt file in the root directory of the project. You can change the token by editing that file. Alternatively, you can allow anyone to sign up by setting the environment variable NO_SIGNUP_TOKEN in the docker compose file.
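If you would rather not invent a token by hand, one option is to generate a random one. The sketch below is an illustration only: `write_signup_token` is a hypothetical helper, and it assumes that sign_up_token.txt holds nothing but the token itself. The text does not say whether the platform reads the file at startup or on every sign-up, so a restart may be needed after changing it.

```python
import secrets
from pathlib import Path


def write_signup_token(path="sign_up_token.txt", nbytes=24):
    """Generate a random token, write it to `path`, and return it.

    Hypothetical helper; assumes the file contains only the token.
    """
    # token_urlsafe yields a URL-safe string built from `nbytes` random bytes.
    token = secrets.token_urlsafe(nbytes)
    # Overwrites any existing token in the file.
    Path(path).write_text(token, encoding="utf-8")
    return token
```

Calling `write_signup_token()` from the project root would overwrite the existing token, so share the returned value with the users who are allowed to register.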

After logging in, you will find everything empty. Only two kinds of assets need to be imported into the platform: datasets and models.

  • To upload a dataset, go to the data page. Datasets are arranged in pools, with each pool containing several datasets. You can create a new pool by clicking the create button and following the instructions to upload a dataset. You can also click the details button of an existing pool to upload a dataset to that pool.

  • To download models, go to the model download page.

    Type in the relevant information in the form and click the add button. A list below the form shows the models you have added.

After you have uploaded the datasets and downloaded the models, you can finetune, evaluate, and deploy the models on their respective pages.

✅ Current Status

The platform is now fully functional, but it is under constant development and polishing, especially the frontend, so breaking changes may occur.

🧾 Trivia

  • We use bun instead of node. Bun is a fast, all-in-one JavaScript bundler, runtime, test runner, and package manager. It is a relatively new project, but its performance attracted us, and it is currently stable for our use. If you want to switch to node for better stability, change all the bun-related commands in the Dockerfile to their node or npm equivalents.

⚠️ Disclaimer

THE DEVELOPER IS NOT RESPONSIBLE FOR ANY LOSS OR DAMAGE, AS STATED IN THE MIT LICENSE.

🔒 Security Issues

This project transfers data between the client and the server in plain text, including usernames, passwords, and datasets. Passwords are hashed before being stored in the database, but datasets are stored in plain text.

If you need to use the platform on open networks, please secure the database and secure the connections with SSL yourself; we do not provide this. SSL can be enabled by changing the nginx config under the proxy folder, since all requests in production go through nginx. With SSL enabled and the database secured, together with the other measures we have implemented, the platform should be secure.
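To illustrate what hashing a password before storage means, here is a generic sketch using salted PBKDF2 from the Python standard library. It is an assumption made for illustration only: the text does not say which hashing scheme GongBU uses, and the function names and iteration count below are hypothetical.

```python
import hashlib
import hmac
import secrets


def hash_password(password, iterations=600_000):
    """Return (salt, iterations, digest) to store instead of the password.

    Generic illustration; not necessarily the scheme GongBU uses.
    """
    # A fresh random salt makes identical passwords hash differently.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest


def verify_password(password, salt, iterations, digest):
    """Recompute the hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, digest)
```

Hashing of this kind only protects the stored value. Without SSL the password still crosses the network in plain text, which is why the connection itself needs to be secured as well.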

Citation

@inproceedings{zhang2024gongbu,
  title={GongBu: Easily Fine-tuning LLMs for Domain-specific Adaptation},
  author={Zhang, Bolin and Tian, Yimin and Wang, Shengwei and Tu, Zhiying and Chu, Dianhui and Shen, Zhiqi},
  booktitle={Proceedings of the 33rd ACM International Conference on Information and Knowledge Management},
  pages={5309--5313},
  year={2024}
}
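About

A one-stop platform for large-model development that integrates model acquisition, customized training, performance evaluation, efficient deployment, and full-lifecycle data management. The 天功 (GongBU) platform aims to lower the barrier to applying large models: users can complete the whole workflow, from model selection to deployment, through a visual interface, which improves the efficiency of large-model development and the convenience of applying models.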
