[WIP] implement all benchmarks on triton by dshaaban01 · Pull Request #43 · spcl/npbench

dshaaban01 · 2025-12-17T17:35:49Z

This PR implements all benchmarks on triton. It should be stacked on #42.

54 triton/torch kernels with @triton.autotune across the npbench suite, plus TritonFramework registration. Will be followed by an autotune-shrink commit and numeric-failure triage.

Source: github.com/spcl/pull/43 (dshaaban01:pr2)

# Conflicts:
#	requirements.txt

…sweep

PR spcl#43's per-kernel get_configs() helpers expand via itertools.product into 32-60 triton.Config entries, e.g. gemm sweeps [32,64]^3 × [1,2,4,8] num_warps. Running the full sweep on every S-preset call dwarfs the per-call kernel work and produces noisy timings for early benchmarks.

Add a one-shot monkey-patch in TritonFramework.__init__ that wraps triton.runtime.autotuner.Autotuner.__init__ to slice the `configs` argument to the first N entries when NPBENCH_TRITON_AUTOTUNE_SIZE is unset or 'small' (default cap N = 4, override via NPBENCH_TRITON_AUTOTUNE_N). Set NPBENCH_TRITON_AUTOTUNE_SIZE=full to disable the cap and run the upstream PR spcl#43 sweeps as-is.

Kernel files are not modified; the patch runs before any *_triton.py import because TritonFramework is created before Test.run() imports the kernel module.

Also: add `torch` to requirements.txt (triton needs torch as the array backend; PR spcl#43 added triton to env.yml but not requirements.txt).
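The commit's patch itself is not shown here, so the following is only a minimal sketch of the capping idea under stated assumptions. `Autotuner` below is a hypothetical stand-in class with a simplified `(fn, configs)` constructor, not the real `triton.runtime.autotuner.Autotuner` (whose signature has more parameters and may vary between Triton versions). The names `gemm_configs`, `install_autotune_cap`, and the `BLOCK_M`/`BLOCK_N`/`BLOCK_K` keys are illustrative. Only the two environment variable names, the default cap of 4, and the [32,64]^3 × [1,2,4,8] sweep come from the commit message.

```python
import functools
import itertools
import os


class Autotuner:
    """Hypothetical stand-in for the real autotuner class (simplified signature)."""

    def __init__(self, fn, configs):
        self.fn = fn
        self.configs = list(configs)


def gemm_configs():
    """Enumerate the gemm sweep described above: [32,64]^3 x [1,2,4,8] num_warps."""
    blocks = [32, 64]
    warps = [1, 2, 4, 8]
    return [
        {"BLOCK_M": m, "BLOCK_N": n, "BLOCK_K": k, "num_warps": w}
        for m, n, k, w in itertools.product(blocks, blocks, blocks, warps)
    ]


def install_autotune_cap(autotuner_cls, environ=os.environ):
    """Wrap autotuner_cls.__init__ once so it only sees the first N configs."""
    if getattr(autotuner_cls, "_npbench_cap_installed", False):
        return  # one-shot: never wrap the constructor twice

    size = environ.get("NPBENCH_TRITON_AUTOTUNE_SIZE", "small")
    if size != "small":
        return  # e.g. "full": leave the upstream sweep untouched

    cap = int(environ.get("NPBENCH_TRITON_AUTOTUNE_N", "4"))
    original_init = autotuner_cls.__init__

    @functools.wraps(original_init)
    def capped_init(self, fn, configs, *args, **kwargs):
        # Slice before delegating, so kernel files need no changes.
        original_init(self, fn, list(configs)[:cap], *args, **kwargs)

    autotuner_cls.__init__ = capped_init
    autotuner_cls._npbench_cap_installed = True


if __name__ == "__main__":
    install_autotune_cap(Autotuner)
    tuner = Autotuner(fn=None, configs=gemm_configs())
    print(len(gemm_configs()), "configs defined,", len(tuner.configs), "kept")
```

Patching the class rather than each kernel keeps the cap in one place, which fits the commit's stated goal of leaving the *_triton.py files unmodified. The patch must be installed before the decorated kernels are imported, since the constructor runs at decoration time.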

…rnels

After merging PR spcl#43 (triton) into extended, ran an S-preset dynamic sweep on 19 triton kernels (28 PASS / 9 FAIL / 1 CRASH) and launched two static-review agents in parallel: one over all 54 *_triton.py files, one over all 53 *_jax.py files against their *_numpy.py reference.

Findings consolidated into tests/TRIAGE_TRITON_JAX.md:

- 15 unique triton bugs in 11 kernels:
  - precision-eroding casts (gemm unconditionally downcasts fp64 inputs to fp32 before tl.dot; doitgen hardcodes an fp64 accumulator);
  - mask/bounds issues (jacobi_1d zeros boundary cells via other=0.0; softmax NaN on entirely-OOB rows; azimint_hist div-by-zero pre-flagged);
  - atomic-ordering / fp-non-repro (nbody energy, covariance mean, atax/bicg/azimint_naive/azimint_hist);
  - init/logic bugs (mandelbrot1 records last-active iteration, not the escape iteration);
  - autotune configs that may OOM (jacobi_1d 2048-elem blocks, azimint_naive 1024-elem blocks on consumer GPUs).
- 11 jax kernels flagged:
  - dtype promotion without `jax_enable_x64` (mandelbrot1, mandelbrot2, nbody);
  - algorithm divergence in reduction order or loop semantics (mandelbrot2 mask vs filter, seidel_2d per-element divide, correlation/covariance missing symmetric fill, durbin roll/flip vs slicing, azimint_naive masked-mean);
  - cond/scalar bugs (contour_integral);
  - twiddle indexing (stockham_fft);
  - missing in-place rebind (nbody vel -= mean).

Planted brief TODO comments pointing at the triage doc in the kernels that fail dynamic validation today: gemm, mandelbrot1, jacobi_1d, azimint_naive, azimint_hist (triton); mandelbrot1, correlation (jax).

Fixes are deferred per user direction; this commit is diagnosis only.
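Three of the failure classes listed above can be reproduced without a GPU. The sketch below is plain Python and is not taken from any of the kernels. It shows an fp64-to-fp32 downcast losing a small term, summation order changing a floating-point result (as unordered atomic adds can), and a max-subtracted softmax producing NaN on a row that is entirely `-inf`. The assumption that an out-of-bounds row is filled with `-inf` is illustrative; the triage note only states that softmax returns NaN on such rows. All helper names are hypothetical.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (fp64) to the nearest fp32 value and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(a, b):
    """Left-to-right fp64 dot product with an explicit accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def ordered_sum(values):
    """Sum strictly in the given order (no compensated summation)."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def naive_softmax(row):
    """Max-subtracted softmax with no guard for fully masked rows."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = ordered_sum(exps)
    return [e / total for e in exps]


# 1) Downcast: 1 + 2**-30 is exact in fp64 but rounds to 1.0 in fp32.
a = [1.0 + 2.0 ** -30, -1.0]
b = [1.0, 1.0]
dot_fp64 = dot(a, b)                       # keeps the 2**-30 term
dot_fp32_inputs = dot([to_f32(x) for x in a], [to_f32(x) for x in b])

# 2) Ordering: the same three addends, accumulated in two different orders.
sum_order_a = ordered_sum([1e17, 1.0, -1e17])   # the 1.0 is absorbed
sum_order_b = ordered_sum([1e17, -1e17, 1.0])   # the 1.0 survives

# 3) Fully masked row: -inf - (-inf) is NaN, which propagates through exp.
masked_row = naive_softmax([float("-inf"), float("-inf")])

if __name__ == "__main__":
    print(dot_fp64, dot_fp32_inputs)
    print(sum_order_a, sum_order_b)
    print(masked_row)
```

Checks of this kind are cheap to run against the *_numpy.py reference before touching a kernel, which matches the diagnosis-only scope of the commit.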

dshaaban01 and others added 3 commits December 17, 2025 18:11

add float32 functionality

4cdd892

add datatype

820b7e5

Add triton kernels for all of npbench

b7d1f46

dshaaban01 changed the title ~~implement all benchmarks on triton~~ [WIP] implement all benchmarks on triton Dec 17, 2025

dshaaban01 marked this pull request as draft December 17, 2025 17:36

dshaaban01 marked this pull request as ready for review December 17, 2025 17:36

ThrudPrimrose requested review from ThrudPrimrose, acalotoiu and alexnick83 December 18, 2025 08:48

dshaaban01 wants to merge 3 commits into spcl:main from dshaaban01:pr2
