Onyxia

Onyxia is a GPU compute-shader runtime for ONNX models. It uses a dispatch-based execution model: operators compile their WGSL shaders when the model is compiled, then compute output shapes at run time from the actual input tensors.

Architecture

ONNX Model (.onnx)
     │
     ▼
┌─────────────┐     ┌──────────────┐     ┌──────────────────┐     ┌───────────────┐
│ onyxia-onnx │────▶│ onyxia-core  │◀────│ onyxia-operators │     │onyxia-runtime │
│ Parse ONNX  │     │ IR, dispatch │     │ 3 core           │     │ Register-based│
│ protobuf    │     │ traits       │     │ operators        │     │ GPU execution │
└─────────────┘     └──────┬───────┘     └──────────────────┘     └───────────────┘
                           │
                    ┌──────┴───────┐
                    │ onyxia-      │
                    │ compiler     │──────▶ CompiledModel ──────▶ GPU Execution
                    │Build dispatch│
                    └──────────────┘

| Crate | Purpose |
|-------|---------|
| `onyxia-onnx` | Parse ONNX protobuf into a structured `Graph` API |
| `onyxia-core` | IR graph, operator/dispatch traits, compiled model types, operator registry |
| `onyxia-operators` | 3 core ONNX operator implementations (Add, Mul, Reshape) |
| `onyxia-compiler` | Simplified pipeline: initialize constants → build dispatch model |
| `onyxia-runtime` | Register-based GPU execution engine via `wgpu` |
| `onyxia-cli` | CLI for model inspection, validation, and DOT visualization |

See ARCHITECTURE.md for detailed design documentation.

Features

ONNX parsing — stable Graph API independent of protobuf schema
Dispatch-based execution — operators compute shapes at runtime from actual input tensors
3 core operators — Add, Mul, Reshape (minimal proof-of-concept set)
Extensible operator system — add custom operators via the Operator trait
Shader compilation — WGSL → naga::Module via naga_oil at compile time
Register-based GPU execution — efficient tensor routing via indexed register file
CLI tools — model inspection, node tracing, DOT visualization, validation
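
To make the register idea concrete, here is a minimal CPU-only sketch of an indexed register file. It is not Onyxia's implementation: `Tensor`, `Step`, `execute`, and `run_example` are hypothetical names, and a `Vec<f32>` stands in for a GPU buffer. The point it illustrates is that each step reads its inputs from numbered registers and writes its result to another one, so passing a tensor between operators is a plain index lookup.

```rust
/// Hypothetical tensor slot; a real runtime would hold a GPU buffer here.
#[derive(Clone, Debug, PartialEq)]
struct Tensor {
    shape: Vec<usize>,
    data: Vec<f32>,
}

/// One step of a compiled model: read `inputs` registers, write `output`.
struct Step {
    op: fn(&[&Tensor]) -> Tensor,
    inputs: Vec<usize>,
    output: usize,
}

/// Elementwise helper for equal-shaped tensors (no broadcasting in this sketch).
fn zip_with(inputs: &[&Tensor], f: fn(f32, f32) -> f32) -> Tensor {
    let (a, b) = (inputs[0], inputs[1]);
    assert_eq!(a.shape, b.shape, "sketch only handles equal shapes");
    let data = a.data.iter().zip(&b.data).map(|(x, y)| f(*x, *y)).collect();
    Tensor { shape: a.shape.clone(), data }
}

fn add(inputs: &[&Tensor]) -> Tensor {
    zip_with(inputs, |x, y| x + y)
}

fn mul(inputs: &[&Tensor]) -> Tensor {
    zip_with(inputs, |x, y| x * y)
}

/// Run every step in order against the register file.
fn execute(steps: &[Step], registers: &mut [Option<Tensor>]) {
    for step in steps {
        let out = {
            let ins: Vec<&Tensor> = step
                .inputs
                .iter()
                .map(|&i| registers[i].as_ref().expect("register not yet written"))
                .collect();
            (step.op)(&ins)
        };
        registers[step.output] = Some(out);
    }
}

/// r0 and r1 hold the inputs; r2 = r0 + r1; r3 = r2 * r0.
fn run_example() -> Tensor {
    let t = |v: Vec<f32>| Tensor { shape: vec![v.len()], data: v };
    let mut regs = vec![Some(t(vec![1.0, 2.0])), Some(t(vec![3.0, 4.0])), None, None];
    let steps = [
        Step { op: add, inputs: vec![0, 1], output: 2 },
        Step { op: mul, inputs: vec![2, 0], output: 3 },
    ];
    execute(&steps, &mut regs);
    regs[3].take().unwrap()
}
```

In this sketch the register indices are fixed before execution, which is one plausible reason a register file suits a compiled dispatch list: the executor never has to resolve tensor names while running.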

Built-in Operators (3)

| Category | Operators |
|----------|-----------|
| Binary elementwise | Add, Mul |
| Shape manipulation | Reshape |
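
Runtime shape computation for these three operators can be illustrated in plain Rust. The sketch assumes the standard ONNX semantics (NumPy-style broadcasting for Add and Mul; for Reshape, `0` copies the input dimension at the same index and a single `-1` is inferred from the remaining element count). The function names `broadcast_shape` and `resolve_reshape` are hypothetical and are not taken from the Onyxia source.

```rust
/// Broadcast two shapes, aligning from the trailing dimension.
/// Returns None when the shapes are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![0; rank];
    for i in 0..rank {
        // Missing leading dimensions count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        };
    }
    Some(out)
}

/// Resolve a Reshape target against the actual input shape.
/// Returns None for an invalid target (element count mismatch, two -1s, ...).
fn resolve_reshape(input: &[usize], target: &[i64]) -> Option<Vec<usize>> {
    let total: usize = input.iter().product();
    let mut out = Vec::with_capacity(target.len());
    let mut infer_at = None;
    let mut known = 1usize;
    for (i, &d) in target.iter().enumerate() {
        let dim = match d {
            0 => *input.get(i)?, // copy the input dimension
            -1 if infer_at.is_none() => {
                infer_at = Some(i);
                1 // placeholder, filled in after the loop
            }
            d if d > 0 => d as usize,
            _ => return None,
        };
        if d != -1 {
            known *= dim;
        }
        out.push(dim);
    }
    match infer_at {
        Some(i) if known != 0 && total % known == 0 => out[i] = total / known,
        Some(_) => return None,
        None if known != total => return None,
        None => {}
    }
    Some(out)
}
```

For example, `broadcast_shape(&[2, 3, 4], &[3, 1])` yields `[2, 3, 4]`, and `resolve_reshape(&[2, 3, 4], &[0, -1])` yields `[2, 12]`. Because both functions take concrete shapes, they can only run once the input tensors are known, which is the situation a dispatch-time shape computation is in.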

Usage

Running a Model

use onyxia_onnx::load_model;
use onyxia_operators::core_operator_registry;
use onyxia_runtime::{Runtime, Tensor};

#[pollster::main]
async fn main() -> anyhow::Result<()> {
    // 1. Parse ONNX model
    let model = load_model("model.onnx")?;
    let graph = onyxia_onnx::parse_model(&model)?;

    // 2. Compile to dispatch model
    let registry = core_operator_registry();
    let mut pipeline = onyxia_compiler::CompilerPipeline::new();
    let compiled = pipeline.compile(&graph, &registry)?;

    // 3. Execute on GPU
    let runtime = Runtime::new().await?;
    let mut executor = runtime.load_model(compiled).await?;

    let input = Tensor::from_vec(vec![1.0f32, 2.0, 3.0, 4.0], &[1, 4]);
    let outputs = executor.run(&[("input", input)])?;

    println!("Output: {:?}", outputs["output"].to_vec::<f32>()?);
    Ok(())
}

Adding Custom Operators

use onyxia_core::{Operator, CompileCtx, OpDispatch, DispatchCtx, RuntimeTensor, Result};
use std::collections::HashMap;

struct MyCustomOperator;

impl Operator for MyCustomOperator {
    fn name(&self) -> &str { "MyCustomOp" }

    fn create_dispatch(&self, ctx: &mut CompileCtx) -> Result<Box<dyn OpDispatch>> {
        // Compile WGSL shader and create dispatch object
        let module = ctx.compile_shader(
            "my_custom_op",
            include_str!("shader.wgsl"),
            &HashMap::new(),
        )?;
        
        Ok(Box::new(MyCustomDispatch { module }))
    }
}

struct MyCustomDispatch {
    module: naga::Module,
}

impl OpDispatch for MyCustomDispatch {
    fn dispatch(
        &self,
        inputs: Vec<RuntimeTensor>,
        ctx: &mut DispatchCtx,
    ) -> Result<Vec<RuntimeTensor>> {
        // Compute output shape from input shapes
        let output_shape = inputs[0].shape.clone();
        
        // Allocate output buffer and dispatch GPU work
        // ... implementation ...
        
        todo!()
    }
}

// Register alongside built-in operators
let mut registry = onyxia_operators::core_operator_registry();
registry.register("MyCustomOp", MyCustomOperator);
let mut pipeline = onyxia_compiler::CompilerPipeline::new();
let compiled = pipeline.compile(&graph, &registry)?;
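
The registration step above amounts to a lookup table from ONNX op type to an operator object. The following self-contained sketch shows that pattern with the standard library only. Every name in it (`ShapeOp`, `PassThrough`, `SketchRegistry`, `sketch_registry`) is hypothetical, and the trait is deliberately much smaller than the real `Operator` trait; it only illustrates how an unregistered op type can surface as an error.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified operator: maps input shapes to an output shape.
trait ShapeOp {
    fn output_shape(&self, inputs: &[Vec<usize>]) -> Result<Vec<usize>, String>;
}

/// Example operator whose output has the same shape as its first input.
struct PassThrough;

impl ShapeOp for PassThrough {
    fn output_shape(&self, inputs: &[Vec<usize>]) -> Result<Vec<usize>, String> {
        inputs
            .first()
            .cloned()
            .ok_or_else(|| "expected at least one input".to_string())
    }
}

/// Hypothetical registry keyed by ONNX op type.
#[derive(Default)]
struct SketchRegistry {
    ops: HashMap<String, Box<dyn ShapeOp>>,
}

impl SketchRegistry {
    /// Register (or replace) the operator for `op_type`.
    fn register(&mut self, op_type: &str, op: impl ShapeOp + 'static) {
        self.ops.insert(op_type.to_string(), Box::new(op));
    }

    /// Look up an operator, reporting unknown op types as an error.
    fn lookup(&self, op_type: &str) -> Result<&dyn ShapeOp, String> {
        self.ops
            .get(op_type)
            .map(|op| op.as_ref())
            .ok_or_else(|| format!("unsupported operator: {op_type}"))
    }
}

/// Build a registry containing only the custom operator.
fn sketch_registry() -> SketchRegistry {
    let mut registry = SketchRegistry::default();
    registry.register("MyCustomOp", PassThrough);
    registry
}
```

With this shape, a graph node whose op type is missing from the table fails at lookup time with a readable message instead of failing later during execution.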

CLI

# Inspect model structure
cargo run --bin onyxia -- inspect model.onnx

# Inspect specific nodes
cargo run --bin onyxia -- inspect-node model.onnx --name "/layer0/attention/query"

# List nodes filtered by op type
cargo run --bin onyxia -- list-nodes model.onnx --op-type MatMul --show-shapes

# Trace data flow around a node
cargo run --bin onyxia -- trace-node model.onnx --name "/layer0/ffn/add" --depth 2

# Validate model compilation
cargo run --bin onyxia -- validate model.onnx

# Generate DOT visualization
cargo run --bin onyxia -- dot model.onnx -o model.dot -s summary
dot -Tpng model.dot -o model.png   # requires Graphviz
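
For readers unfamiliar with the DOT format that the `dot` subcommand emits, here is a minimal sketch of turning a node list into DOT text. It is an illustration only: the `Node` struct and `to_dot` function are hypothetical, the output layout is not claimed to match what `onyxia dot` writes, and node names are assumed to contain no double quotes.

```rust
/// Hypothetical node record: a name, an op type, and the names of its inputs.
struct Node {
    name: &'static str,
    op_type: &'static str,
    inputs: Vec<&'static str>,
}

/// Render nodes as a Graphviz digraph with one edge per input.
fn to_dot(nodes: &[Node]) -> String {
    let mut out = String::from("digraph model {\n");
    for n in nodes {
        // The literal `\n` inside the label is a line break for Graphviz.
        out.push_str(&format!(
            "  \"{}\" [label=\"{}\\n{}\"];\n",
            n.name, n.op_type, n.name
        ));
        for src in &n.inputs {
            out.push_str(&format!("  \"{}\" -> \"{}\";\n", src, n.name));
        }
    }
    out.push_str("}\n");
    out
}
```

Quoting every identifier keeps ONNX-style names such as `/layer0/ffn/add` valid in DOT, since an unquoted ID may not contain slashes.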

Prerequisites

Protocol Buffers Compiler (`protoc`)

Required for building the ONNX parser (onyxia-onnx uses prost-build). Install via your package manager:

macOS: brew install protobuf
Linux (apt): apt install protobuf-compiler
Linux (dnf): dnf install protobuf-compiler
Windows (winget): winget install protobuf
Windows (Chocolatey): choco install protoc

See protobuf installation guide for more options.

Building

cargo build

Testing

Tests are run with nextest:

cargo nextest run                                   # Non-GPU tests
cargo nextest run --run-ignored=all --no-fail-fast  # All tests including GPU

GPU-dependent tests are marked #[ignore], so the default run skips them. Pass --run-ignored=all on a machine with a GPU to include them.
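
A GPU-gated test uses the standard `#[ignore]` attribute. The sketch below shows the pattern with a hypothetical `cpu_add` reference function and hypothetical test names; the body of the GPU test is a placeholder, not code from the Onyxia test suite.

```rust
/// CPU reference implementation used to check results; runs anywhere.
fn cpu_add(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

// Runs in the default `cargo nextest run`.
#[test]
fn cpu_add_is_elementwise() {
    assert_eq!(cpu_add(&[1.0, 2.0], &[3.0, 4.0]), vec![4.0, 6.0]);
}

// Skipped by default; included with `--run-ignored=all`.
#[test]
#[ignore = "requires a GPU"]
fn gpu_add_matches_cpu() {
    // Placeholder: a real test would execute the model on the GPU here
    // and compare its output against `cpu_add`.
    let expected = cpu_add(&[1.0, 2.0], &[3.0, 4.0]);
    assert_eq!(expected.len(), 2);
}
```

Giving `#[ignore]` a reason string makes the skip visible in test output, which helps when a CI machine without a GPU reports fewer tests than a developer workstation.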

Profiling

Onyxia includes built-in support for performance profiling with Tracy. The runtime and all major operators are instrumented with tracing spans.

# Build with Tracy profiling enabled
cargo build --release -p onyxia-cli --features tracy

# Run with profiling (Tracy GUI must be running)
cargo run --release -p onyxia-cli --features tracy -- run-model [args]

See PROFILING.md for detailed setup instructions and usage guide.

Example Models

The models/ directory contains sample ONNX models for testing:

Gemma 3 270m (quantized LLM): models/gemma-3-270m-it-ONNX/onnx/model_q4.onnx — 18 transformer layers, 4 attention heads, vocab size 262K. Uses MatMulNBits, GroupQueryAttention, RotaryEmbedding.
Gemma 3 1B (larger model): models/gemma-3-1b-it-ONNX/onnx/

License

MIT OR Apache-2.0

Logo color palette: https://lospec.com/palette-list/technogarten
