KernelGenBench is a component of FlagOS — a unified, open-source AI system software stack that fosters an open technology ecosystem by seamlessly integrating various models, systems, and chips. Following the principle of "develop once, migrate across various chips", FlagOS aims to unlock the full computational potential of hardware, break down barriers between different chip software stacks, and effectively reduce migration costs.
KernelGenBench is a benchmark framework for evaluating LLM and agent-based Triton kernel generation across multiple hardware platforms.
- 210 operators across three sources: ATen (110), vLLM (50), cuBLAS (50)
- Multi-chip support: NVIDIA, Ascend NPU, MUSA, Hygon DCU, Iluvatar, MetaX
- Two evaluation tracks: LLM Track (Pass@K) and Agent Track (iterative generation)
- Multiple agent methods: Claude Code, OpenCode, AutoKernel, AKO4ALL, cuda-optimized-skill
- Automatic verification: accuracy testing with tolerance-based comparison
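
The Pass@K metric and the tolerance-based comparison named in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not KernelGenBench's own implementation: the function names `pass_at_k` and `allclose`, the default tolerances, and the choice of the commonly used unbiased pass@k estimator are all hypothetical here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate (assumed estimator, not confirmed by the source).

    n: number of kernels sampled for one operator
    c: number of those samples that passed verification
    k: budget of attempts being scored
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    # 1 - P(all k drawn samples are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)


def allclose(actual, expected, rtol=1e-3, atol=1e-5) -> bool:
    """Elementwise check |a - e| <= atol + rtol * |e| over two flat sequences.

    The default tolerances are illustrative assumptions only.
    """
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


if __name__ == "__main__":
    # 10 generated kernels, 3 of which passed verification.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
    # Compare a generated kernel's output against a reference output.
    print(allclose([1.0, 2.0], [1.0005, 2.0]))
```

With `k = 1` the estimate reduces to the plain pass rate `c / n`; larger `k` rewards a model that succeeds at least once within a bigger sampling budget.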

Quick start:

    # NVIDIA platform
    pip install -r requirements/requirements_nvidia.txt
    pip install -e .

    # Test single operator
    python scripts/generate_kernel_and_verify.py \
        --op-name aten::add \
        --single-test \
        --server-type openai

👉 For detailed setup, see Getting Started.

📚 Full documentation: docs/source/

| Section | Description |
|---|---|
| Overview | What is KernelGenBench and why use it |
| Getting Started | Installation for all platforms |
| LLM Track | Pass@K evaluation guide |
| Agent Track | Agent-based evaluation guide |
| Reference | Datasets, operators, hardware |
| Development | Contributing and extending |
| FAQ | Common questions |

Related projects:

| Project | Description |
|---|---|
| awesome-LLM-driven-kernel-generation | Survey of AI-driven kernel generation |
| KernelGen | High-performance platform for automated Triton kernel generation |

Citation:

    @software{kernelgenbench2026,
      title={KernelGenBench: A Benchmark for LLM and Agent-Based Triton Kernel Generation},
      author={KernelGen Team},
      url={https://github.com/flagos-ai/KernelGenBench},
      year={2026}
    }

Apache 2.0 License

