🕹️ UI-Guided Token Selection
Try test.ipynb, which seamlessly supports Qwen2VL models.
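For readers who want a feel for the idea before opening the notebook, here is a minimal sketch of UI-guided token selection. It is not the code in test.ipynb. It assumes the screenshot has already been split into patches, that each patch is summarised by one RGB colour, and that adjacent patches with (nearly) the same colour are treated as redundant and merged into one component. The names `group_patches`, `select_tokens`, `tol`, and `keep_ratio` are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


def _find(parent: List[int], i: int) -> int:
    """Return the root of i, compressing the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def group_patches(patch_colors: List[List[Color]], tol: int = 0) -> List[List[int]]:
    """Label each patch with the id of its connected component.

    Two 4-neighbouring patches are merged when every RGB channel differs
    by at most `tol` (an assumed similarity rule for this sketch).
    """
    rows, cols = len(patch_colors), len(patch_colors[0])
    parent = list(range(rows * cols))

    def similar(a: Color, b: Color) -> bool:
        return all(abs(x - y) <= tol for x, y in zip(a, b))

    for r in range(rows):
        for c in range(cols):
            idx = r * cols + c
            # Compare with the right and bottom neighbours only;
            # that covers every adjacent pair exactly once.
            if c + 1 < cols and similar(patch_colors[r][c], patch_colors[r][c + 1]):
                parent[_find(parent, idx + 1)] = _find(parent, idx)
            if r + 1 < rows and similar(patch_colors[r][c], patch_colors[r + 1][c]):
                parent[_find(parent, idx + cols)] = _find(parent, idx)

    # Relabel roots as 0, 1, 2, ... in row-major first-seen order.
    ids: Dict[int, int] = {}
    return [
        [ids.setdefault(_find(parent, r * cols + c), len(ids)) for c in range(cols)]
        for r in range(rows)
    ]


def select_tokens(labels: List[List[int]], keep_ratio: float = 0.5, seed: int = 0):
    """Randomly keep a fraction of patches per component (at least one each)."""
    rng = random.Random(seed)
    groups: Dict[int, List[Tuple[int, int]]] = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            groups.setdefault(lab, []).append((r, c))
    kept = []
    for members in groups.values():
        k = max(1, math.ceil(len(members) * keep_ratio))
        kept.extend(rng.sample(members, k))
    return sorted(kept)


if __name__ == "__main__":
    W, B, R = (255, 255, 255), (0, 0, 255), (255, 0, 0)
    grid = [[W, W, B], [W, W, B], [R, W, W]]
    labels = group_patches(grid)
    print(labels)  # [[0, 0, 1], [0, 0, 1], [2, 0, 0]]
    print(select_tokens(labels, keep_ratio=0.5))
```

Large uniform regions (backgrounds, blank panels) collapse into a few big components, so sampling per component drops many of their patches while small, visually distinct elements keep at least one token each.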
✍️ Annotate your own data
Try recaption.ipynb, where we provide instructions on how to recaption the original annotations using GPT-4o.
❤ Acknowledgement
We extend our gratitude to SeeClick for providing their code and datasets.
Special thanks to Siyuan for assistance with the Gradio demo and OOTB support.
🎓 BibTeX
If you find our work helpful, please kindly consider citing our paper.
@misc{lin2024showui,
title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent},
author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
year={2024},
eprint={2411.17465},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.17465},
}
If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.
ShowUI
An open-source, end-to-end, lightweight vision-language-action model for GUI agents and computer use.
📑 Paper | 🤗 Hugging Models | 🤗 Spaces Demo | 📝 Slides | 🕹️ OpenBayes Demo
🤗 Datasets | 💬 X (Twitter) | 🖥️ Computer Use | 📖 GUI Paper List | 🤖 ModelScope
🔥 Update
python3 api.py
ShowUI-web dataset.
showui for UI-guided token selection implementation.
ShowUI-desktop.
showlab/ShowUI-2B is available at Hugging Face.
🤖 vllm Inference
See inference_vllm.ipynb for vllm inference.
⚡ API Calling
Run python3 api.py by providing a screenshot and a query.
🖥️ Computer Use
See Computer Use OOTB for using ShowUI to control your PC.
https://github.com/user-attachments/assets/f50b7611-2350-4712-af9e-3d31e30020ee
⭐ Quick Start
See Quick Start for local model usage.
🤗 Local Gradio
See Gradio for installation.
🚀 Training
Our training codebase supports:
See Train for training setup.