
vLLM MetaX Plugin


| About MetaX | Documentation | #sig-maca |


Latest News 🔥

  • [2026/3] Released vllm-metax v0.15.0 🦐 — aligned with vLLM v0.15.0, more models and more features!
  • [2026/3] Released vllm-metax v0.14.0 🚀 — aligned with vLLM v0.14.0, same as usual.
  • [2026/2] Released vllm-metax v0.13.0 🧨 — aligned with vLLM v0.13.0, brings you the latest features and models in v0.13.0!
  • [2026/1] Released vllm-metax v0.12.0 😎 — aligned with vLLM v0.12.0, supported more models and improved performance.
  • [2026/1] Released vllm-metax v0.11.2 👻 — aligned with vLLM v0.11.2, supported more models and improved performance.
  • [2025/11] Released vllm-metax v0.10.2 🎉 — aligned with vLLM v0.10.2, improved model performance, and fixed key decoding bugs.
  • [2025/11] We hosted vLLM Beijing Meetup focusing on distributed inference and diverse accelerator support with vLLM! Please find the meetup slides here.
  • [2025/08] We hosted vLLM Shanghai Meetup focusing on building, developing, and integrating with vLLM! Please find the meetup slides here.

About

vLLM MetaX is a hardware plugin that enables vLLM to run seamlessly on MetaX GPUs. MetaX provides a CUDA-like backend through MACA, delivering a near-native CUDA experience on MetaX hardware.

It is the recommended approach for supporting the MetaX backend within the vLLM community.

The plugin is implemented in accordance with the vLLM plugin RFCs, which help ensure proper feature and functionality support when integrating MetaX GPUs with vLLM.

Prerequisites

  • Hardware: MetaX C-series
  • OS: Linux
  • Software:
    • Python >= 3.10, <= 3.12
    • vLLM (the same version as vllm-metax)
    • Docker support

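The Python requirement above can be checked programmatically before installing. This is a minimal, illustrative helper (the function name is ours, not part of vllm-metax) encoding the stated bound of Python >= 3.10 and <= 3.12:

```python
import sys


def python_version_supported(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """True if the given (major, minor) Python version satisfies the
    vllm-metax prerequisite: >= 3.10 and <= 3.12."""
    return (3, 10) <= version <= (3, 12)
```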
Getting Started

vLLM MetaX currently supports deployment only via Docker images released by the MetaX developer community; these images work out of the box.

The Dockerfile for other OS environments is still under testing.

If you want to develop, debug, or test the latest features in vllm-metax, you may need to build it from source. Please follow this source build tutorial.

Branch

vllm-metax has three kinds of branches.

  • master: the main branch, which tracks the upstream vLLM main branch.
  • vX.Y.Z-dev: development branches created after a vLLM release.

    For example, v0.1x.0-dev is the development branch corresponding to a newly cut release branch such as releases/v0.1x.0.

  • releases/vX.Y.Z: release branches created from the corresponding vX.Y.Z-dev branch, indicating that that vllm-metax development branch has been fully tested and released.

    For example, vllm-metax’s releases/v0.1x.0 corresponds to vLLM’s releases/v0.1x.0. The same naming rule applies to tags.
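
The three naming rules above can be expressed as a small classifier. This is an illustrative sketch, not a tool shipped with vllm-metax:

```python
import re

# Patterns for the two versioned branch kinds described above.
DEV_RE = re.compile(r"^v\d+\.\d+\.\d+-dev$")
RELEASE_RE = re.compile(r"^releases/v\d+\.\d+\.\d+$")


def classify_branch(name: str) -> str:
    """Classify a vllm-metax branch name per the repo's naming rules."""
    if name == "master":
        return "main"  # tracks the upstream vLLM main branch
    if DEV_RE.fullmatch(name):
        return "development"  # e.g. v0.15.0-dev
    if RELEASE_RE.fullmatch(name):
        return "release"  # e.g. releases/v0.15.0
    return "unknown"
```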

Below are the maintained branches:

| Branch           | Status   | Note                                              |
| ---------------- | -------- | ------------------------------------------------- |
| master           | N/A      | Tracks vLLM main; functionality is not guaranteed |
| v0.19.0-dev      | N/A      | WIP                                               |
| v0.18.0-dev      | N/A      | WIP                                               |
| v0.17.0-dev      | N/A      | Planned for release in April                      |
| v0.16.0          | N/A      | Skipped                                           |
| releases/v0.15.0 | Released | Corresponds to vLLM release v0.15.0               |
| releases/v0.14.0 | Released | Corresponds to vLLM release v0.14.0               |
| releases/v0.13.0 | Released | Corresponds to vLLM release v0.13.0               |
| releases/v0.12.0 | Released | Corresponds to vLLM release v0.12.0               |
| releases/v0.11.2 | Released | Corresponds to vLLM release v0.11.2               |
| releases/v0.10.2 | Released | Corresponds to vLLM release v0.10.2               |

For more details, please check the Quickstart Guide.

License

Apache License 2.0, as found in the LICENSE file.

About

vLLM MetaX is a hardware adaptation plugin built for the vLLM inference framework. Developed against the vLLM plugin specification, it enables vLLM to run seamlessly and efficiently on MetaX GPUs with a near-native CUDA experience, and it is the recommended way to support the MetaX backend in the vLLM community. The project is compatible with mainstream vLLM versions, supports single-node multi-GPU inference deployment, and is compatible with the native LLM, Engine, and Kernel APIs as well as the OpenAI server interface.
