Expert Kit: A Distributed, Expert-Centric Framework for MoE LLM Inference
Caution: Early work in progress. This project is currently a proof-of-concept demo under active development. It is not intended for production use and may contain significant bugs, security vulnerabilities, and unexpected behavior. We welcome community feedback and contributions as we continue to build and refine it.
About
Expert Kit (EK) is a high-performance framework for scalable Mixture-of-Experts (MoE) LLM inference. EK aims to provide an efficient foundation for Expert Parallelism (EP) on heterogeneous hardware (e.g., CPUs and GPUs) over commodity interconnects (e.g., PCIe, TCP, RDMA), enabling easy deployment and fine-grained, expert-level scaling.
EK features an Expert-Attention (E/A) separation architecture, which allows MoE LLMs to be deployed efficiently in a distributed environment composed of any number of CPUs and GPUs.
The motivation for E/A separation is our observation that, in modern MoE LLMs, expert parameters account for the vast majority of the model size (e.g., over 90% in DeepSeek-V3).
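The "over 90%" figure can be sanity-checked with a back-of-the-envelope count. This sketch assumes the publicly released DeepSeek-V3 configuration (hidden size 7168, expert intermediate size 2048, 61 layers of which the first 3 use a dense FFN, 256 routed plus 1 shared expert per MoE layer, and 671B total parameters) and counts only the expert FFN weights, so it is an estimate rather than an exact breakdown.

```python
# Back-of-the-envelope estimate of the expert share of DeepSeek-V3's parameters.
# The configuration values below are assumptions taken from the public
# DeepSeek-V3 release, not values read from Expert Kit.
HIDDEN_SIZE = 7168            # model hidden dimension
MOE_INTERMEDIATE_SIZE = 2048  # intermediate dimension of each expert FFN
NUM_LAYERS = 61               # total transformer layers
NUM_DENSE_LAYERS = 3          # leading layers with a dense FFN instead of MoE
ROUTED_EXPERTS = 256          # routed experts per MoE layer
SHARED_EXPERTS = 1            # always-active shared experts per MoE layer
TOTAL_PARAMS = 671e9          # reported total parameter count (main model)


def expert_params() -> int:
    # Each expert is a gated FFN with gate, up, and down projections,
    # i.e. three hidden x intermediate weight matrices (no biases).
    per_expert = 3 * HIDDEN_SIZE * MOE_INTERMEDIATE_SIZE
    moe_layers = NUM_LAYERS - NUM_DENSE_LAYERS
    return per_expert * (ROUTED_EXPERTS + SHARED_EXPERTS) * moe_layers


share = expert_params() / TOTAL_PARAMS
print(f"expert parameters: {expert_params() / 1e9:.1f}B ({share:.1%} of total)")
# → expert parameters: 656.5B (97.8% of total)
```

Under these assumptions, experts hold roughly 98% of the weights; attention, embeddings, the dense FFN layers, and the routers share the remaining few percent.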
By decoupling expert modules and deploying them across distributed GPUs and CPUs, EK leverages the high bandwidth and large capacity of distributed memory and storage systems.
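The E/A split can be illustrated with a minimal, self-contained Python sketch: the attention side scores experts, picks the top-k, and asks whichever worker hosts each expert to compute its output. It assumes a simple softmax top-k gate and toy ReLU experts, and every name here (`ExpertWorker`, `moe_forward`, the `placement` map) is hypothetical rather than Expert Kit's API; the cross-node transport is replaced by an in-process method call.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


class ExpertWorker:
    """Hypothetical expert-side worker that owns the weights of some experts.

    In an E/A-separated deployment this would run on a separate CPU or GPU
    node behind a network transport; here it is a plain in-process object.
    """

    def __init__(self, expert_ids: List[int], dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Toy expert: a dim x dim linear layer followed by ReLU.
        self.weights: Dict[int, List[Vector]] = {
            e: [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for e in expert_ids
        }

    def forward(self, expert_id: int, x: Vector) -> Vector:
        w = self.weights[expert_id]
        return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate: List[Vector],
                placement: Dict[int, ExpertWorker], top_k: int) -> Vector:
    """Attention-side MoE step: pick the top-k experts for x and combine
    their outputs, regardless of which worker hosts each expert."""
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in gate]
    top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:top_k]
    weights = softmax([scores[e] for e in top])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for w, e in zip(weights, top):
        y = placement[e].forward(e, x)  # would be a remote call in E/A separation
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Usage: 8 experts split across two workers (e.g., one CPU node, one GPU node).
DIM, NUM_EXPERTS = 4, 8
workers = [ExpertWorker([0, 1, 2, 3], DIM, seed=1),
           ExpertWorker([4, 5, 6, 7], DIM, seed=2)]
placement = {e: w for w in workers for e in w.weights}
rng = random.Random(42)
gate = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
x = [0.5, -1.0, 0.25, 2.0]
out = moe_forward(x, gate, placement, top_k=2)
print(out)
```

In this sketch the attention side depends only on the expert-to-worker `placement` map, so experts can be moved between CPU and GPU workers without changing the attention-side code.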
Quick Start
Here are some tutorials to help you get started with Expert Kit:
DeepSeek-tiny: A custom MoE model that uses the DeepSeek-V3 architecture with a small parameter count, designed for quick evaluation and testing of the Expert Kit framework.
DeepSeek-V3: A demo for running the DeepSeek-V3 model with Expert Kit, showcasing the framework’s capabilities in handling large-scale MoE models.
Qwen3-30B-A3B: A demo for running the Qwen3-30B-A3B model with Expert Kit, showcasing the framework’s capabilities in handling real-world MoE models.
Key Features
Low-Cost Deployment: runs on distributed clusters that mix GPUs and CPUs.
Fine-Grained Expert-Level Scalability: scales attention and experts independently, and dynamically scales hot experts on demand.
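As a rough illustration of what on-demand scaling of hot experts could involve, the sketch below turns a window of routing decisions into a per-expert replica count. The proportional, largest-remainder policy and the `plan_replicas` name are assumptions for illustration, not Expert Kit's actual scaling algorithm.

```python
from collections import Counter
from typing import Dict, Iterable


def plan_replicas(routed_expert_ids: Iterable[int], num_experts: int,
                  total_replicas: int) -> Dict[int, int]:
    """Hypothetical hot-expert scaling policy.

    Every expert keeps one replica; the remaining replica budget is split in
    proportion to each expert's observed routing load (largest-remainder
    rounding, in integer arithmetic to avoid floating-point ties).
    """
    if total_replicas < num_experts:
        raise ValueError("need at least one replica per expert")
    load = Counter(routed_expert_ids)  # routing hits per expert in the window
    total_load = sum(load[e] for e in range(num_experts))
    plan = {e: 1 for e in range(num_experts)}
    spare = total_replicas - num_experts
    if spare == 0 or total_load == 0:
        return plan
    # Integer part of each expert's proportional share of the spare replicas.
    remainders = {}
    for e in range(num_experts):
        quota, remainders[e] = divmod(spare * load[e], total_load)
        plan[e] += quota
    # Give replicas lost to rounding to the largest remainders (then the hottest).
    leftover = total_replicas - sum(plan.values())
    ranked = sorted(range(num_experts),
                    key=lambda e: (remainders[e], load[e]), reverse=True)
    for e in ranked[:leftover]:
        plan[e] += 1
    return plan


# Usage: a window of 100 routing decisions in which expert 0 is hot.
window = [0] * 60 + [1] * 30 + [2] * 10
print(plan_replicas(window, num_experts=4, total_replicas=8))
# → {0: 4, 1: 2, 2: 1, 3: 1}
```

A real scaler would presumably also need to place the extra replicas on concrete CPU or GPU workers and balance tokens across replicas of the same expert; the sketch covers only the sizing decision.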
License Agreement
Primary License: This project as a whole is licensed under the GNU GPL 3.0.
Third-Party Components:
Licenses and copyright notices for third-party components are located alongside the component code directory.
The following components are included:
DeepSeek-V3 (Code/Complementary Material): Located in ek-integration/expertkit-torch/expertkit-torch/models/deepseek_v3/. This code is licensed under the DeepSeek License Agreement v1.0 and the MIT License. Please be aware that use of the associated DeepSeek Model is subject to the use restrictions detailed in Attachment A of the DeepSeek License Agreement v1.0.
Qwen3-MoE: Located in ek-integration/expertkit-torch/expertkit-torch/models/. This code is licensed under Apache License Version 2.0.
Compliance: All third-party components are used in compliance with their original license terms.
Contact Us
If you have any questions, please join the discussion at https://expert-kit.zulipchat.com/ or open a new issue.