vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin for running vLLM on the Kunlun XPU. It is the recommended way to integrate the Kunlun backend within the vLLM community and follows the principles of the "Hardware pluggable" RFC. The plugin provides a hardware-pluggable interface that decouples the Kunlun XPU integration from vLLM itself.
With the vLLM Kunlun plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, embedding, and multimodal LLMs, can run on the Kunlun XPU.
Prerequisites
Hardware: Kunlun3 P800
OS: Ubuntu 20.04
Software:
Python ≥ 3.10
PyTorch ≥ 2.5.1
vLLM (same version as vllm-kunlun)
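The software requirements above can be checked with a short script. This is a minimal sketch using only the standard library. It assumes the distributions are installed under the names `torch`, `vllm`, and `vllm-kunlun` (the last one is a guess based on the project name), and it compares only the leading numeric part of each version string.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 5, 1)


def parse_version(text: str) -> tuple[int, ...]:
    """Keep the leading numeric release, e.g. '2.5.1+cu121' -> (2, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_prerequisites() -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version_info[0]}.{sys.version_info[1]} is older than 3.10")

    torch_version = installed_version("torch")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} is older than 2.5.1")

    vllm_version = installed_version("vllm")
    plugin_version = installed_version("vllm-kunlun")  # assumed distribution name
    if vllm_version is None or plugin_version is None:
        problems.append("vllm and/or vllm-kunlun is not installed")
    elif parse_version(vllm_version) != parse_version(plugin_version):
        problems.append(f"vllm {vllm_version} does not match vllm-kunlun {plugin_version}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The script does not check the hardware or operating system requirements; those still need to be verified on the host.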
Supported Models
Generative Models
Model
Support
Quantization
LoRA
Piecewise Kunlun Graph
Note
Qwen2
✅
✅
✅
Qwen2.5
✅
✅
✅
Qwen3
✅
✅
✅
Qwen3-Moe
✅
✅
✅
✅
Qwen3-Next
✅
✅
✅
✅
MiMo-V2-Flash
✅
✅
Llama2
✅
✅
Llama3
✅
✅
Llama3.1
✅
✅
gpt-oss
✅
DeepSeek-R1
✅
✅
✅
DeepSeek-V3
✅
✅
✅
DeepSeek-V3.2
✅
✅
✅
Kimi-K2
✅
✅
✅
Multimodal Language Models
Model
Support
Quantization
LoRA
Piecewise Kunlun Graph
Note
Qwen3-VL
✅
✅
Performance Visualization 🚀
High-performance computing at work: How different models perform on the Kunlun3 P800.
Current environment: 16-way concurrency, input/output size 2048.
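To show how this benchmark setting translates into token counts, here is a rough sketch. It assumes that "size 2048" means 2048 tokens for both the prompt and the generated output of each request, which the text does not spell out. `BenchmarkSetting` is a hypothetical helper, and no measured numbers are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkSetting:
    concurrency: int = 16      # from the text: 16-way concurrency
    input_tokens: int = 2048   # assumption: "size" means tokens per request
    output_tokens: int = 2048  # assumption: same length for generated output

    def prompt_tokens_per_wave(self) -> int:
        """Prompt tokens processed when all concurrent requests run once."""
        return self.concurrency * self.input_tokens

    def generated_tokens_per_wave(self) -> int:
        """Tokens generated when all concurrent requests run once."""
        return self.concurrency * self.output_tokens

    def output_throughput(self, wall_seconds: float) -> float:
        """Generated tokens per second for one wave that took wall_seconds."""
        if wall_seconds <= 0:
            raise ValueError("wall_seconds must be positive")
        return self.generated_tokens_per_wave() / wall_seconds


if __name__ == "__main__":
    setting = BenchmarkSetting()
    print("prompt tokens per wave:   ", setting.prompt_tokens_per_wave())
    print("generated tokens per wave:", setting.generated_tokens_per_wave())
```

Under these assumptions, one wave of 16 concurrent requests covers 32768 prompt tokens and 32768 generated tokens; dividing the latter by a measured wall-clock time gives output throughput in tokens per second.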
Getting Started
Please use the following recommended versions to get started quickly:
Contribute to vLLM Kunlun
If you’re interested in contributing to this project, please read Contributing to vLLM Kunlun.
Star History 🔥
We opened the project on Dec 8, 2025. We love open source and collaboration ❤️
Sponsors 👋
We sincerely thank the KunLunXin team for providing XPU resources, which enabled efficient model adaptation and debugging, comprehensive end-to-end testing, and broader model compatibility.
License
Apache License 2.0, as found in the LICENSE file.