
Expert Kit: A Distributed, Expert-Centric Framework for MoE LLM Inference

[!CAUTION] Early Work-in-Progress. This project is currently a proof-of-concept demo and is under active development. It is not intended for production use and may contain significant bugs, security vulnerabilities, and unexpected behavior. We appreciate community feedback and contributions as we continue to build and refine this project.

About

Expert Kit (EK) is a high-performance framework for scalable MoE (Mixture of Experts) LLM inference. EK aims to provide an efficient foundation for Expert Parallelism (EP) on heterogeneous hardware (e.g., CPUs and GPUs) connected by commodity interconnects (e.g., PCIe, TCP, RDMA). This enables easy deployment and fine-grained, expert-level scaling.
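To make the idea of expert parallelism concrete, here is a minimal sketch. The router picks the top-k experts for each token, and the (token, expert) pairs are then grouped by the node that hosts each expert, so every node receives one batched request. The placement table, the worker names, and the `dispatch` function are hypothetical illustrations, not Expert Kit's API.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical placement of layer-0 experts onto heterogeneous workers.
# In a real deployment this mapping would be maintained by a controller.
PLACEMENT = {
    (0, 0): "gpu-0",
    (0, 1): "cpu-node-a",
    (0, 2): "cpu-node-b",
    (0, 3): "cpu-node-a",
}


def dispatch(layer: int, routing: list[list[int]]) -> dict[str, list[tuple[int, int]]]:
    """Group (token_index, expert_id) pairs by the worker hosting each expert.

    routing[t] holds the expert ids the router selected (top-k) for token t.
    """
    batches: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for token, experts in enumerate(routing):
        for expert in experts:
            batches[PLACEMENT[(layer, expert)]].append((token, expert))
    return dict(batches)


if __name__ == "__main__":
    # Three tokens, each routed to two experts in layer 0.
    plan = dispatch(0, [[0, 1], [1, 3], [2, 0]])
    for worker, items in sorted(plan.items()):
        print(worker, items)
```

Batching per worker amortizes one network round trip over many tokens. This matters when experts sit behind commodity links such as TCP.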

EK uses an Expert-Attention (E/A) separation architecture. This lets MoE LLMs be deployed efficiently in a distributed environment made up of any number of CPUs and GPUs. The motivation for E/A separation is our observation that, in modern MoE LLMs, expert parameters account for the vast majority of the model size (e.g., over 90% in DeepSeek-V3). By decoupling the expert modules from attention and deploying them across distributed GPUs and CPUs, EK takes advantage of the high aggregate bandwidth and large capacity of distributed memory and storage.
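A quick back-of-the-envelope check of the "over 90%" figure follows. It assumes the hyperparameters from the publicly released DeepSeek-V3 `config.json` and treats each expert as a gated FFN with gate, up, and down projections. Small terms such as router weights are ignored, and the total is the 671B figure quoted in this README.

```python
# Values assumed from the public DeepSeek-V3 config.json.
HIDDEN = 7168          # hidden_size
MOE_INTER = 2048       # moe_intermediate_size (width of each expert's FFN)
LAYERS = 61            # num_hidden_layers
DENSE_LAYERS = 3       # first_k_dense_replace: leading layers use a dense FFN
ROUTED = 256           # n_routed_experts per MoE layer
SHARED = 1             # n_shared_experts per MoE layer
TOTAL_PARAMS = 671e9   # headline model size

# Gate, up, and down projections: three HIDDEN x MOE_INTER matrices per expert.
per_expert = 3 * HIDDEN * MOE_INTER
moe_layers = LAYERS - DENSE_LAYERS
expert_params = moe_layers * (ROUTED + SHARED) * per_expert
share = expert_params / TOTAL_PARAMS

print(f"params per expert: {per_expert:,}")
print(f"expert params: {expert_params / 1e9:.1f}B ({share:.1%} of 671B)")
```

Under these assumptions, experts hold roughly 656B of the 671B parameters (about 98%). The attention side is small enough to fit on a single GPU.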

https://github.com/user-attachments/assets/9f1f5b23-28fe-44cf-b592-2f6ad0ad4dad

Quick Start

The following tutorials will help you get started with Expert Kit:

- DeepSeek-tiny: a tailored MoE model with the DeepSeek-V3 architecture and a small parameter count, designed for quick evaluation and testing of the Expert Kit framework.
- DeepSeek-V3: a demo of running the DeepSeek-V3 model with Expert Kit, showcasing the framework’s ability to handle large-scale MoE models.
- Qwen3-30B-A3B: a demo of running the Qwen3-30B-A3B model with Expert Kit, showcasing the framework’s ability to handle real-world MoE models.

Key Features

- Low-Cost Deployment: supports deployment across a distributed mix of GPUs and CPUs.
- Fine-Grained Expert-Level Scalability: scales attention and experts independently, and scales hot experts dynamically on demand.
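One simple way to picture "dynamic scaling of hot experts" is the hypothetical policy sketched below. It counts recent expert activations, gives the most frequently activated experts an extra replica, and spreads requests across replicas round-robin. The function and class names are illustrative assumptions, not Expert Kit's actual scaling mechanism.

```python
from __future__ import annotations

from collections import Counter
from itertools import cycle


def plan_replicas(activations: list[int], budget: int) -> dict[int, int]:
    """Return a replica count per expert id.

    Every activated expert gets one replica; up to `budget` of the most
    frequently activated ("hot") experts each get one extra replica.
    """
    counts = Counter(activations)
    replicas = {expert: 1 for expert in counts}
    # most_common breaks ties by first occurrence, so the plan is deterministic.
    for expert, _ in counts.most_common(budget):
        replicas[expert] += 1
    return replicas


class RoundRobinRouter:
    """Spread requests for each expert evenly across its replicas."""

    def __init__(self, replicas: dict[int, int]) -> None:
        self._cycles = {e: cycle(range(n)) for e, n in replicas.items()}

    def pick(self, expert: int) -> int:
        return next(self._cycles[expert])


if __name__ == "__main__":
    plan = plan_replicas([5, 5, 5, 2, 2, 7], budget=1)
    router = RoundRobinRouter(plan)
    print(plan, [router.pick(5) for _ in range(3)])
```

Because experts are independent units, a hot expert can be replicated without touching the attention side or the cold experts.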

Performance

| Model | Throughput (tokens/s) | Environment |
|---|---|---|
| DeepSeek-V3 671B W8A16 | 14.26 | 1x NVIDIA RTX 4090 (24 GB) + 5x AMD EPYC 7302 |
| Qwen3-MoE-30B FP16 | 36.38 | 1x NVIDIA A10 (24 GB) + 1x AMD EPYC 7302 + 1x Kunpeng 920 |

Repository Map

- ek-computation: performs scheduling (frontend) and computation (backend) tasks.
- ek-db: supports registering and loading expert weights at fine granularity.
- ek-benchmark: contains micro-benchmarks for measuring performance.
- ek-solution: contains recipes for quickly setting up a running cluster.
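To illustrate what "fine-grained" registration means, the toy store below addresses weights per (model, layer, expert). A node that serves only a few experts can then load just those weights instead of whole checkpoint shards. The key layout, `ExpertKey`, and `ExpertStore` are hypothetical and do not describe ek-db's real schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertKey:
    """Hypothetical address of one expert's weights in one model."""

    model: str
    layer: int
    expert: int

    def path(self) -> str:
        return f"{self.model}/layers/{self.layer}/experts/{self.expert}"


class ExpertStore:
    """Toy in-memory registry: register each expert once, load any one on demand."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def register(self, key: ExpertKey, weights: bytes) -> None:
        self._blobs[key.path()] = weights

    def load(self, key: ExpertKey) -> bytes:
        # Only the requested expert's blob is returned, not the whole layer.
        return self._blobs[key.path()]
```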

Roadmap

Core Features

Frontend for request scheduling
- Simple Executor
- Extensible Executor
- Schedule Interface
Backend compute engine for expert computation
- PyTorch
- ONNX Runtime
- candle
Integration with existing frameworks for attention computation
- PyTorch
- vLLM
Transport channels between frontend and backend
- gRPC
- RDMA
- DSM
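Each backend engine listed above ultimately evaluates the same per-expert function. The sketch below assumes the SwiGLU-style gated FFN used by DeepSeek-V3 and Qwen3-MoE experts and writes it in pure Python for readability. A real backend would run the same math with PyTorch, ONNX Runtime, or candle tensors. The function names are illustrative, not Expert Kit's API.

```python
from __future__ import annotations

import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), arranged so exp() never overflows."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a row-major matrix w (shape [out][in]) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def expert_forward(x, w_gate, w_up, w_down):
    """Gated FFN expert: down(silu(gate(x)) * up(x))."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def combine(outputs, weights):
    """Weighted sum of the selected experts' outputs for one token."""
    return [sum(w * out[i] for out, w in zip(outputs, weights)) for i in range(len(outputs[0]))]
```

Since each expert is a pure function of its input and its own weights, it can run on whichever device hosts it. The frontend only has to ship activations out and weighted outputs back.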

Contact Us

If you have any questions, please join the discussion at https://expert-kit.zulipchat.com/ or open an issue.

License Agreement

Primary License: This project as a whole is licensed under the GNU GPL 3.0.
Third-Party Components:
- Licenses and copyright notices for third-party components are located in each component's code directory.
- The following components are included:
  - DeepSeek-V3 (Code/Complementary Material): Located in ek-integration/expertkit-torch/expertkit-torch/models/deepseek_v3/. This code is licensed under the DeepSeek License Agreement v1.0 and the MIT License. Please be aware that use of the associated DeepSeek Model is subject to the use restrictions detailed in Attachment A of the DeepSeek License Agreement v1.0.
  - Qwen3-MoE: Located in ek-integration/expertkit-torch/expertkit-torch/models/. This code is licensed under the Apache License, Version 2.0.
Compliance: All third-party components are used in compliance with their original license terms.
