flagos-ai/KernelGenBench

Overview

KernelGenBench is a component of FlagOS, a unified, open-source AI system software stack that fosters an open technology ecosystem by integrating various models, systems, and chips. Following the principle of “develop once, migrate across various chips”, FlagOS aims to unlock the full computational potential of hardware, break down barriers between different chip software stacks, and reduce migration costs.

KernelGenBench is a benchmark framework for evaluating LLM-based and agent-based Triton kernel generation across multiple hardware platforms.

Paper: KernelGenBench: A Multi-Source and Multi-Chip Benchmark for LLM-based Kernel Generation (Under Review)

Features

210 operators across three sources: ATen (110), vLLM (50), cuBLAS (50)
Multi-chip support: NVIDIA, Ascend NPU, MUSA, Hygon DCU, Iluvatar, MetaX
Two evaluation tracks: LLM Track (Pass@K) and Agent Track (iterative generation)
Multiple agent methods: Claude Code, OpenCode, AutoKernel, AKO4ALL, cuda-optimized-skill
Automatic verification: accuracy testing with tolerance-based comparison
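
This README does not spell out how Pass@K or the tolerance check is computed, so the following is only a minimal sketch under stated assumptions. It uses the widely used unbiased pass@k estimator (n samples per operator, c of them correct) and an allclose-style comparison. The function names and the default `rtol`/`atol` values are hypothetical and are not taken from the KernelGenBench code.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: number of generated kernels for one operator
    c: number of those kernels that passed verification
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def within_tolerance(actual, expected, rtol=1e-3, atol=1e-3):
    """Allclose-style check: |a - e| <= atol + rtol * |e| for every element.

    The default tolerances are illustrative assumptions, not the
    benchmark's actual settings.
    """
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(actual, expected))
```

With these definitions, an operator with 3 passing kernels out of 10 samples has a pass@1 of about 0.3, and pass@k grows toward 1 as k increases.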

Quick Start

# NVIDIA platform
pip install -r requirements/requirements_nvidia.txt
pip install -e .

# Test single operator
python scripts/generate_kernel_and_verify.py \
    --op-name aten::add \
    --single-test \
    --server-type openai

👉 For detailed setup, see Getting Started.
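
To run the single-operator test for several operators in a row, a small wrapper can assemble the same command shown above. This is a hedged sketch: it uses only the script path and flags that appear in the Quick Start, the helper name `build_command` is hypothetical, and `aten::mul` is an illustrative operator name that is not confirmed to be part of the benchmark's operator list.

```python
import shlex
import sys

# Script path as shown in the Quick Start (relative to the repository root).
SCRIPT = "scripts/generate_kernel_and_verify.py"


def build_command(op_name, server_type="openai"):
    """Return the argv list for one single-operator test run.

    Only flags that appear in the Quick Start are used here.
    """
    return [
        sys.executable, SCRIPT,
        "--op-name", op_name,
        "--single-test",
        "--server-type", server_type,
    ]


# "aten::add" comes from the Quick Start; "aten::mul" is a hypothetical example.
for op in ["aten::add", "aten::mul"]:
    # Print the shell-quoted command instead of executing it.
    print(shlex.join(build_command(op)))
```

The sketch only prints the commands. Passing each list to `subprocess.run(cmd, check=True)` from the repository root would execute them, assuming the environment and model server are already configured as described in Getting Started.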

Documentation

📚 Full documentation: docs/source/

Section	Description
Overview	What is KernelGenBench and why use it
Getting Started	Installation for all platforms
LLM Track	Pass@K evaluation guide
Agent Track	Agent-based evaluation guide
Reference	Datasets, operators, hardware
Development	Contributing and extending
FAQ	Common questions

Project	Description
awesome-LLM-driven-kernel-generation	Survey of AI-driven kernel generation
KernelGen	High-performance platform for automated Triton kernel generation

Citation

@inproceedings{kernelgenbench2026,
  title     = {KernelGenBench: A Multi-Source and Multi-Chip Benchmark for LLM-based Kernel Generation},
  author    = {Anonymous Author(s)},
  booktitle = {Under review},
  year      = {2026},
  note      = {Preprint PDF and repository available at \url{https://github.com/flagos-ai/KernelGenBench}},
  url       = {https://github.com/flagos-ai/KernelGenBench}
}

License

Apache 2.0 License