[WIP] implement all benchmarks on triton #43
Open
dshaaban01 wants to merge 3 commits into
ThrudPrimrose
added a commit
to ThrudPrimrose/npbench
that referenced
this pull request
May 28, 2026
54 triton/torch kernels with @triton.autotune across the npbench suite, plus TritonFramework registration. Will be followed by an autotune-shrink commit and numeric-failure triage.

Source: github.com/spcl/pull/43 (dshaaban01:pr2)

# Conflicts:
#	requirements.txt
ThrudPrimrose
added a commit
to ThrudPrimrose/npbench
that referenced
this pull request
May 28, 2026
…sweep

PR spcl#43's per-kernel get_configs() helpers expand via itertools.product into 32-60 triton.Config entries, e.g. gemm sweeps [32,64]^3 × [1,2,4,8] num_warps (8 × 4 = 32 configs). Running the full sweep on every S-preset call dwarfs the per-call kernel work and produces noisy timings for early benchmarks.

Add a one-shot monkey-patch in TritonFramework.__init__ that wraps triton.runtime.autotuner.Autotuner.__init__ to slice the `configs` argument to the first N entries when NPBENCH_TRITON_AUTOTUNE_SIZE is unset or 'small' (default cap N = 4, override via NPBENCH_TRITON_AUTOTUNE_N). Set NPBENCH_TRITON_AUTOTUNE_SIZE=full to disable the cap and run the upstream PR spcl#43 sweeps as-is.

Kernel files are not modified; the patch runs before any *_triton.py import because TritonFramework is created before Test.run() imports the kernel module.

Also: add `torch` to requirements.txt (triton needs torch as the array backend; PR spcl#43 added triton to env.yml but not requirements.txt).
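A minimal sketch of the capping pattern described in this commit message, runnable without Triton installed. `StandInConfig`, `StandInAutotuner`, `get_configs`, and `install_autotune_cap` are hypothetical names written for illustration and are not taken from the PR; only the two environment variable names and the gemm sweep shape come from the message above. The sketch assumes the wrapped constructor has a parameter named `configs` and looks it up by name rather than by position.

```python
import functools
import inspect
import itertools
import os


class StandInConfig:
    """Hypothetical stand-in for triton.Config (tile sizes plus num_warps)."""

    def __init__(self, kwargs, num_warps=4):
        self.kwargs = kwargs
        self.num_warps = num_warps


class StandInAutotuner:
    """Hypothetical stand-in for triton.runtime.autotuner.Autotuner."""

    def __init__(self, fn, arg_names, configs, key):
        self.fn = fn
        self.configs = list(configs)


def get_configs():
    """gemm-style sweep: [32, 64]^3 tile sizes x [1, 2, 4, 8] num_warps."""
    blocks = [32, 64]
    return [
        StandInConfig({"BLOCK_M": m, "BLOCK_N": n, "BLOCK_K": k}, num_warps=w)
        for m, n, k, w in itertools.product(blocks, blocks, blocks, [1, 2, 4, 8])
    ]


def install_autotune_cap(autotuner_cls, env=os.environ):
    """Wrap autotuner_cls.__init__ so `configs` is sliced to the first N entries.

    Returns True if the wrapper was installed, False otherwise.
    Assumption: any NPBENCH_TRITON_AUTOTUNE_SIZE value other than 'small'
    (including 'full') leaves the class untouched.
    """
    if env.get("NPBENCH_TRITON_AUTOTUNE_SIZE", "small") != "small":
        return False
    if getattr(autotuner_cls, "_npbench_capped", False):
        return False  # one-shot: never wrap the same class twice
    cap = int(env.get("NPBENCH_TRITON_AUTOTUNE_N", "4"))
    original_init = autotuner_cls.__init__
    signature = inspect.signature(original_init)

    @functools.wraps(original_init)
    def capped_init(self, *args, **kwargs):
        # Bind by name so the slice works for positional and keyword calls.
        bound = signature.bind(self, *args, **kwargs)
        configs = bound.arguments.get("configs")
        if configs:
            bound.arguments["configs"] = list(configs)[:cap]
        original_init(*bound.args, **bound.kwargs)

    autotuner_cls.__init__ = capped_init
    autotuner_cls._npbench_capped = True
    return True


# Usage: with the size variable unset, the default cap of 4 applies.
install_autotune_cap(StandInAutotuner, env={})
tuner = StandInAutotuner(fn=None, arg_names=[], configs=get_configs(), key=[])
print(len(get_configs()), len(tuner.configs))  # prints "32 4"
```

Patching the constructor rather than each kernel file matches the stated goal of leaving the kernels unmodified; the trade-off is that the patch has to be installed before any module applies the autotune decorator.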
ThrudPrimrose
added a commit
to ThrudPrimrose/npbench
that referenced
this pull request
May 28, 2026
…rnels

After merging PR spcl#43 (triton) into extended, ran an S-preset dynamic sweep on 19 triton kernels (28 PASS / 9 FAIL / 1 CRASH) and launched two static-review agents in parallel: one over all 54 *_triton.py files, one over all 53 *_jax.py files against their *_numpy.py reference.

Findings consolidated into tests/TRIAGE_TRITON_JAX.md:

- 15 unique triton bugs in 11 kernels:
  - precision-eroding casts (gemm unconditionally downcasts fp64 inputs to fp32 before tl.dot; doitgen hardcodes an fp64 accumulator);
  - mask/bounds issues (jacobi_1d zeros boundary cells via other=0.0; softmax NaN on entirely-OOB rows; azimint_hist div-by-zero pre-flagged);
  - atomic-ordering / fp-non-repro (nbody energy, covariance mean, atax/bicg/azimint_naive/azimint_hist);
  - init/logic bugs (mandelbrot1 records last-active iteration not the escape iteration);
  - autotune configs that may OOM (jacobi_1d 2048-elem blocks, azimint_naive 1024-elem blocks on consumer GPUs).
- 11 jax kernels flagged:
  - dtype promotion without `jax_enable_x64` (mandelbrot1, mandelbrot2, nbody);
  - algorithm divergence in reduction order or loop semantics (mandelbrot2 mask vs filter, seidel_2d per-element divide, correlation/covariance missing symmetric fill, durbin roll/flip vs slicing, azimint_naive masked-mean);
  - cond/scalar bugs (contour_integral);
  - twiddle indexing (stockham_fft);
  - missing in-place rebind (nbody vel -= mean).

Planted brief TODO comments pointing at the triage doc in the kernels that fail dynamic validation today: gemm, mandelbrot1, jacobi_1d, azimint_naive, azimint_hist (triton); mandelbrot1, correlation (jax).

Fixes are deferred per user direction; this commit is diagnosis only.
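Two of the failure classes named above can be reproduced in plain Python, independent of the kernels. This is an illustrative sketch, not the code from the PR: `masked_load`, `softmax_row`, and `sum_in_order` are hypothetical helpers, and the assumption that an out-of-bounds softmax row is filled with -inf is mine. The first part shows how such a row yields NaN; the second shows why an accumulation whose order is not fixed (as with atomics) may not be bit-reproducible.

```python
import math

NEG_INF = float("-inf")


def masked_load(row, width, other):
    """Mimic a masked load: lanes past the end of `row` receive `other`."""
    return [row[i] if i < len(row) else other for i in range(width)]


def softmax_row(lanes):
    """Numerically stable softmax: subtract the row max before exp."""
    row_max = max(lanes)
    exps = [math.exp(x - row_max) for x in lanes]
    total = sum(exps)
    return [e / total for e in exps]


# Partially valid row: padded lanes contribute exp(-inf) = 0, result is fine.
valid = softmax_row(masked_load([1.0, 2.0], 4, NEG_INF))

# Entirely out-of-bounds row: every lane is -inf, so row_max is -inf and
# (-inf) - (-inf) is NaN, which propagates through exp, sum, and divide.
oob = softmax_row(masked_load([], 4, NEG_INF))


def sum_in_order(values, order):
    """Accumulate `values` in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Floating-point addition is not associative, so two accumulation orders
# over the same inputs can differ in the last bit.
vals = [0.1, 0.2, 0.3]
forward = sum_in_order(vals, [0, 1, 2])
backward = sum_in_order(vals, [2, 1, 0])

print(valid[2], all(math.isnan(x) for x in oob), forward == backward)
```

A validation harness comparing such kernels against a NumPy reference would therefore usually need a tolerance rather than exact equality for the order-dependent reductions, while the NaN case is a genuine logic issue that a tolerance cannot hide.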
This PR implements all benchmarks on triton. It should be stacked on #42.