GitHub - flagos-ai/FlagTensor

About

FlagTensor is part of FlagOS, a fully open-source system software stack designed to unify the model–system–chip layers and foster an open and collaborative ecosystem. It enables a "develop once, run anywhere" workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among AI chipset-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads.

FlagTensor is a high-performance tensor-primitive library implemented in the Triton language. It provides optimized implementations of common tensor primitives (unary, binary, and tensor contraction operations), benchmarked against cuTensor baselines, and aims for reference-level correctness with competitive performance across diverse GPU architectures.

Built on FlagTree (a FlagOS-maintained Triton fork supporting multiple hardware backends), FlagTensor offers a vendor-agnostic operator interface with pluggable backend support.

Features

Comprehensive collection of tensor primitives: unary (28 ops), binary (4 ops), contraction (6 ops)
Hand-optimized Triton kernels with per-architecture autotune (Ampere, Hopper)
Correctness validated against CPU-FP64 golden reference
Performance benchmarked against cuTensor baselines
Vendor-agnostic backend abstraction (15 vendors registered)
Architecture-specific kernel specialization (e.g., _nvidia/hopper/, _nvidia/ampere/)
Per-operator test infrastructure with pytest marks and JSON result recording
Multi-GPU parallel test runner with live progress display
CI-ready: quality gates (lint/format), correctness & performance pipelines

For a complete list of operators and their maturity stages, see conf/operators.yaml.
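
The "CPU-FP64 golden reference" idea from the feature list can be illustrated without a GPU. The sketch below is not FlagTensor's test code: `to_fp32`, `sigmoid_fp32`, and `max_relative_error` are hypothetical helpers, and the FP32 "kernel" is only emulated by rounding after every step. It shows the general pattern of evaluating an operator in low precision and bounding its deviation from a double-precision reference.

```python
import math
import struct


def to_fp32(value: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def sigmoid_fp64(x: float) -> float:
    """Golden reference: sigmoid evaluated in double precision."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_fp32(x: float) -> float:
    """Stand-in for a device kernel: round after every step to mimic FP32."""
    x = to_fp32(x)
    e = to_fp32(math.exp(-x))
    return to_fp32(1.0 / to_fp32(1.0 + e))


def max_relative_error(inputs, candidate, reference) -> float:
    """Largest relative deviation of candidate from reference over inputs."""
    worst = 0.0
    for x in inputs:
        ref = reference(x)
        err = abs(candidate(x) - ref) / max(abs(ref), 1e-30)
        worst = max(worst, err)
    return worst


# Sample points from -8.0 to 8.0 in steps of 1/8.
inputs = [i / 8.0 for i in range(-64, 65)]
error = max_relative_error(inputs, sigmoid_fp32, sigmoid_fp64)

# The tolerance is an illustrative choice, not a value taken from the project.
assert error < 1e-6, error
```

Comparing against FP64 rather than against another FP32 implementation keeps the reference's own rounding error negligible relative to the tolerance being checked.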

Getting Started

Refer to the Environment Setup Guide for a complete installation walkthrough.

Quick start on NVIDIA A100:

# 1. Install PyTorch
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124

# 2. Install cuTensor
pip install cutensor-cu12
ln -sf $(python3 -c "import cutensor; print(cutensor.__path__[0])")/lib/libcutensor.so.2 \
  /usr/lib/x86_64-linux-gnu/libcutensor.so

# 3. Install FlagTree (Triton fork)
pip install --no-cache-dir \
  --index-url=https://resource.flagos.net/repository/flagos-pypi-hosted/simple \
  --trusted-host=resource.flagos.net \
  "flagtree==0.4.0+3.3" --no-deps

# 4. Install FlagTensor
pip install -e . --no-deps
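
After the four steps above, a quick import check can confirm that the packages are visible to the interpreter. This is a minimal sketch using only the standard library; `check_packages` is a hypothetical helper, and the assumption that FlagTree is importable under the module name `triton` is not confirmed by this README.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed module names: "triton" for FlagTree, "flagtensor" for this project.
    for name, found in check_packages(["torch", "triton", "flagtensor"]).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

`find_spec` only locates a package without importing it, so the check stays fast and does not initialize CUDA.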

Usage

import torch
import flagtensor

# Element-wise operations
x = torch.randn(1024, device="cuda", dtype=torch.float32)
y = flagtensor.abs(x)
z = flagtensor.relu(x)
w = flagtensor.sigmoid(x)

# Binary operations
a = torch.randn(1024, device="cuda")
b = torch.randn(1024, device="cuda")
c = flagtensor.add(a, b)

# Tensor contraction
m = torch.randn(64, 32, device="cuda")
n = torch.randn(32, 48, device="cuda")
r = flagtensor.gett(m, n)
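
For intuition about the contraction call above, the following pure-Python sketch spells out a contraction over one shared index on nested lists. It assumes that `gett` on two 2-D operands of shapes (64, 32) and (32, 48) contracts the shared dimension like a matrix product and returns shape (64, 48); the README does not state this, and `contract_2d` is a hypothetical reference helper, not part of FlagTensor.

```python
def contract_2d(a, b):
    """Compute C[i][j] = sum over k of A[i][k] * B[k][j] on nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


# Small worked case: a (2, 2) operand contracted with a (2, 2) operand.
c = contract_2d([[1, 2], [3, 4]], [[5, 6], [7, 8]])
assert c == [[19, 22], [43, 50]]

# Shape check mirroring the usage example: (64, 32) x (32, 48) -> (64, 48).
m = [[1.0] * 32 for _ in range(64)]
n = [[1.0] * 48 for _ in range(32)]
r = contract_2d(m, n)
assert (len(r), len(r[0])) == (64, 48)
```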

Running Tests

# Single operator correctness test
pytest tests/unary/test_abs.py -v

# Record test results as JSON (using CPU-FP64 reference)
pytest tests/unary/test_abs.py --ref cpu --record json --output results.json

# Multi-GPU test runner (from YAML registry)
python tools/run_tests.py --stages stable --gpus 0,1

# Extract operator marks
python tools/get_marks.py --stage stable --output ops.txt

# Benchmark with recording
pytest benchmark/test_unary_perf.py -m abs \
  --mode kernel --level core --record log

# Parse benchmark summary
python tools/summary_for_plot.py result-*.log
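
One common way to run per-operator tests on several GPUs is to pin each pytest process to a single device through `CUDA_VISIBLE_DEVICES`. The sketch below illustrates that pattern only; it is not the implementation of `tools/run_tests.py`, and `assign_round_robin` and `build_jobs` are hypothetical names. The test file paths follow the `tests/<category>/test_<op>.py` layout.

```python
import os


def assign_round_robin(test_files, gpu_ids):
    """Distribute test files across GPU ids in round-robin order."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    plan = {gpu: [] for gpu in gpu_ids}
    for index, path in enumerate(test_files):
        plan[gpu_ids[index % len(gpu_ids)]].append(path)
    return plan


def build_jobs(plan):
    """Turn the plan into (env, argv) pairs, one pytest process per GPU."""
    jobs = []
    for gpu, files in plan.items():
        if not files:
            continue  # Skip GPUs that received no work.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        jobs.append((env, ["pytest", "-v", *files]))
    return jobs


plan = assign_round_robin(
    [
        "tests/unary/test_abs.py",
        "tests/unary/test_relu.py",
        "tests/binary/test_add.py",
    ],
    [0, 1],
)
jobs = build_jobs(plan)
```

Each `(env, argv)` pair could then be passed to `subprocess.Popen(argv, env=env)` so the processes run concurrently, one per device.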

Project Structure

FlagTensor
├── src/flagtensor/            # Python source
│   ├── ops/                   # Operator implementations (CUTENSOR_OP_*.py)
│   ├── utils/                 # Utility functions & kernel builders
│   ├── runtime/               # Runtime support
│   │   ├── backend/           # Vendor & architecture backends (_nvidia/, _ascend/, ...)
│   │   └── common.py          # Vendor enumeration & capability constants
│   ├── testing/               # Testing utilities (assertions, shapes, dtypes)
│   ├── fused/                 # Fused operators
│   └── modules/               # Module implementations
├── tests/                     # Per-operator correctness tests
│   ├── unary/test_<op>.py     # 28 unary operator tests
│   ├── binary/test_<op>.py    # 4 binary operator tests
│   ├── contraction/           # Contraction operator tests
│   └── sparse/                # Sparse operator tests
├── benchmark/                 # Performance tests
│   ├── consts.py              # Dtypes, shapes, metrics definitions
│   └── test_<category>_perf.py
├── tools/                     # CLI tooling
│   ├── run_tests.py           # Multi-GPU test runner
│   ├── get_marks.py           # Extract pytest marks from YAML
│   └── summary_for_plot.py    # Parse & aggregate benchmark logs
├── conf/
│   └── operators.yaml         # Operator registry (authoritative test entry point)
├── docs/                      # Documentation
├── .github/workflows/         # CI/CD pipelines
├── LICENSE
├── README.md
└── pyproject.toml

Supported Operators

Category	Operators	Status
Unary	abs, acos, acosh, asin, asinh, atan, atanh, ceil, conj, cos, cosh, exp, floor, identity, log, mish, neg, rcp, relu, sigmoid, sin, sinh, soft_plus, soft_sign, sqrt, swish, tan, tanh	stable
Binary	add, max, min, mul	stable
Contraction	gett, tgett, ttgt, tensor_contraction_trinary, trinary_generic	stable
Sparse	block_sparse_tensor_contraction	experimental
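
Several unary operator names in the table are less self-explanatory than `abs` or `sqrt`. The scalar sketch below gives their conventional mathematical definitions as a reading aid. It assumes FlagTensor follows these textbook formulas, which the README does not state; the actual kernels may use numerically stabilized variants.

```python
import math


def rcp(x: float) -> float:
    """Reciprocal: 1 / x."""
    return 1.0 / x


def sigmoid(x: float) -> float:
    """Logistic function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_plus(x: float) -> float:
    """Softplus: log(1 + exp(x)). This naive form overflows for very large x."""
    return math.log1p(math.exp(x))


def soft_sign(x: float) -> float:
    """Softsign: x / (1 + |x|)."""
    return x / (1.0 + abs(x))


def swish(x: float) -> float:
    """Swish (SiLU): x * sigmoid(x)."""
    return x * sigmoid(x)


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(soft_plus(x))


assert rcp(4.0) == 0.25
assert soft_sign(1.0) == 0.5
assert swish(0.0) == 0.0
assert abs(soft_plus(0.0) - math.log(2.0)) < 1e-12
```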

Contribution

If you are interested in contributing to the FlagTensor project, please refer to the contribution guide. Any contributions would be highly appreciated.
Please file an issue for feature requests or bug reports.
Email us at contact@flagos.io if you have questions or suggestions to share.

Citation

If you find our work useful, please consider citing our project:

@misc{flagtensor2025,
    title={FlagOS/FlagTensor: A high-performance tensor-primitive library benchmarked against cuTensor},
    url={https://github.com/flagos-ai/FlagTensor},
    journal={GitHub},
    author={The FlagOS contributors},
    year={2025}
}

Related Projects

FlagGems — General-purpose Triton operator library (500+ operators)
FlagTree — Multi-backend Triton fork maintained by FlagOS

License

The FlagTensor project is licensed under the Apache License (Version 2.0).

