Optimized GPU kernels for LLM operations, built with TileLang. TileLang is a domain-specific language for expressing high-performance GPU kernels in Python, featuring easy migration, agile development, and automatic optimization.
Most kernels in this project approach the hardware limits of compute throughput and memory bandwidth. Some have already been used in internal training and inference workloads. They do not yet represent best practices, however, and we are actively improving the code quality and documentation.
Features
Gating — Top-k expert selection and scoring for Mixture of Experts routing
MoE Routing — Token-to-expert mapping, fused expansion/reduction and weight normalization
Quantization — Per-token, per-block, and per-channel FP8/FP4/E5M6 casting with fused SwiGLU+quantization ops
Transpose — Batched transpose operations
Engram — Engram gating kernels with fused RMSNorm, forward/backward passes and weight gradient reduction
Manifold HyperConnection — Hyper-connection kernels including Sinkhorn normalization and mix splitting/application
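To make the Gating and MoE Routing entries above concrete, here is a minimal plain-Python sketch of top-k expert selection followed by weight normalization for a single token. It is only an illustration: `topk_gating` is a hypothetical helper, not part of this project's API, and softmax scoring is an assumption (the actual kernels may score experts differently and operate on batched GPU tensors).

```python
import math


def topk_gating(logits, k):
    """Select the k highest-scoring experts for one token.

    Hypothetical reference helper for illustration only.
    Assumes softmax scoring over the expert logits.
    """
    # Numerically stable softmax over the expert logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k largest scores, highest first.
    # sorted() is stable, so ties keep the lower expert index.
    top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]

    # Weight normalization: rescale the selected scores to sum to 1.
    selected = sum(probs[i] for i in top)
    return top, [probs[i] / selected for i in top]


# Four experts, route the token to the best two.
experts, weights = topk_gating([0.1, 2.0, -1.0, 1.5], k=2)
print(experts, weights)
```

A fused GPU kernel would perform the scoring, selection, and normalization in one pass over many tokens; the scalar version above is useful mainly as a correctness reference when testing such a kernel.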
Tile Kernels
torch.autograd.Function wrappers composing low-level kernels into trainable layers (engram gate, mHC pipeline)
Requirements
Installation
Install a local development version
Install a release version
Testing
Run the tests with pytest:
Run a single test file
Pressure test
Project Structure
Acknowledgement
This project is built on TileLang. Thanks and respect to the developers!
License
This code repository is released under the MIT License.
Citation