[0035] Update wave-scope matrix groupshared operations by llvm-beanz · Pull Request #879 · microsoft/hlsl-specs

llvm-beanz · 2026-06-05T16:30:49Z

This updates the load, store, and interlocked accumlation operations on groupshared memory to align data conversion handling and support accumulating into packed data types.

Fixes #866

This updates the load, store, and interlocked accumlation operations on groupshared memory to align data conversion handling and support accumulating into packed data types. Fixes microsoft#866

llvm-beanz · 2026-06-05T16:33:28Z

Tagging a few people from hardware vendors for feedback:
@Anteru, @pow2clk, @anupamachandra & @mbedy-amd.

alan-baker · 2026-06-05T20:10:03Z

+be in any arithmetic or packed data type. The type of the array must match the
+type of the matrix being loaded into.


The way I'm reading this is if I want to load a packed uint8_t matrix the array must also be packed uint8_t. Is that correct?

Our concern with WebGPU is someone declaring a buffer (opaque type) in workgroup and using with both packed 8bit and 16-bit granularity. We aren't sure it will be guaranteed that equivalent matrices would be supported to use the Cast() method.

Also in the dxil section it says non-natively supported types must use i32 for the array. Would that be possible here?

The way I'm reading this is if I want to load a packed uint8_t matrix the array must also be packed uint8_t. Is that correct?

DXIL doesn't have 8-bit integer types, so uint8_t isn't a native type, so it would need to be uint8_t4_packed, which becomes an i32.

Our concern with WebGPU is someone declaring a buffer (opaque type) in workgroup and using with both packed 8bit and 16-bit granularity. We aren't sure it will be guaranteed that equivalent matrices would be supported to use the Cast() method.

In general DXIL doesn't allow casting groupshared pointers, so we don't really have a reasonable way to reinterpret the data from a single allocation to mean different things. The one oddity of this is non-native types that we treat as i32 (uint8_t4_packed in the API), where we carry both the data type and the interpretation so the accumulation operations know how to operate on the data.

Also in the dxil section it says non-natively supported types must use i32 for the array. Would that be possible here?

That is what the uint8_t4_packed type is, but it only works for a subset of types.

As Alan says, it looks like we would still not be able to support LinAlg in WebGPU due to this requirement. In addition to the opaque buffer type that we will soon have in WGSL, we also decompose groupshared buffers into flat arrays of (e.g.) u32 or u16 in order to work around bugs in D3D drivers. This means that we won't always have to ability to force the type to be a uint8_t4_packed, or to use f32 for an f32 matrix etc.

In general DXIL doesn't allow casting groupshared pointers

Maybe naive question, but does DXIL really need to support casting? Could the DXIL representation for these load/store operations not just allow the type mismatch, and specify that the operations are performed without conversion? This is essentially the SPIR-V approach - you can use any scalar/vector type in the src/dst pointer, and it's only used for addressing calculations.

We also considered trying to polyfill the matrix load/store operations ourselves by manually loading the data and then constructing the matrix objects using Set(), but a) I don't know how efficient this will be and b) the 8-bit matrix types don't support Get()/Set() as far as I can tell.

Treating the data in a groupshared array as two (or more) types of different sizes is currently illegal in DXIL. You can kinda work around that by just always treating the array as 32-bit integers and doing load->mask->convert chains, but the fundamental limitation exists.

There isn't a good way to work around that limitation through the linalg spec as specified.

I think we should probably have a conversation with the hardware vendors about whether or not those restrictions are still required for DXIL, or if we should remove them. Removing them has some general benefits, but would make some transformations that DXC performs on groupshared allocations potentially illegal (which is also maybe fine).

If it helps SPV_KHR_cooperative_matrix does not require the matrix component type to match the memory type. So any vendor supporting that extension should already be able to support this.

This looks like it needs updating to match the new DXIL sections?

Co-authored-by: Ashley Coleman <ascoleman@microsoft.com>

pow2clk

Most of my comments relate to changes made to the header that weren't mirrored to the other 1 or 2 places where the code is duplicated.

We discussed transforming this spec from containing so many of the requirements in header code to simplifying the code to leave that to the actual header while better explaining restrictions in the text. Is that no longer the plan? This seems to be moving in the opposite direction.

pow2clk · 2026-06-10T19:19:49Z

-  Splat(T Val);
+  [[nodiscard]] static
+      typename hlsl::enable_if<hlsl::is_arithmetic<T>::value, Matrix>::type
+      Splat(T Val);


While a welcome change, this isn't mentioned in the description

pow2clk · 2026-06-10T19:54:37Z

+      MatrixAccum32Ty::Load(GSMat, 0, 8, MatrixLayout::RowMajor);
+
+  // The next line is an error because the groupshared array isn't the right
+  // type for the matrix component. MatrixAccumTy M2 =


clang-format did you dirty here and moved part of the "next line" onto the description of it. Is this a known bug or an example of how the header is expected to error on this usage?

It is an expected error based on the current state of this spec, although there is a request (in other comments) to change that requirement.

pow2clk · 2026-06-10T20:22:59Z

+      MatrixAccum32Ty::Load(GSMat, 0, 8, MatrixLayout::RowMajor);
+
+  // The next line is an error because the groupshared array isn't the right
+  // type for the matrix component. MatrixAccumTy M2 =


clang-format again

llvm-beanz · 2026-06-16T15:28:16Z

A bunch of the code snippets across the file were out-of-sync with the appendix. I've re-sync'd all of that now.

llvm-beanz · 2026-06-16T15:32:35Z

We discussed transforming this spec from containing so many of the requirements in header code to simplifying the code to leave that to the actual header while better explaining restrictions in the text. Is that no longer the plan? This seems to be moving in the opposite direction.

Where we are without this change is a bit of a worst-of-all-worlds. The header in the annex was so far out of date from the implementation that it was no longer useful for crafting interface changes, and the code snippets across the file were in even worse state.

I did not fully sync this to the implementation, but I did pull in some of the differences in the implementation required to help me craft API changes that were needed for this PR.

This relaxes the groupshared data constraints so that i32 memory may store any type opaquely.

mjbedy

This seems like a good general direction. I'm going to follow up to confirm a couple things, but left some comments in the meantime.

mjbedy · 2026-06-19T19:08:51Z

+be in any arithmetic or packed data type. The type of the array must match the
+type of the matrix being loaded into.


This looks like it needs updating to match the new DXIL sections?

mjbedy · 2026-06-19T19:12:40Z

-[Conversions](#data-conversion-rules) section below.
+Populates a matrix with data from a `groupshared` array. If the groupshared
+memory is any type other than `i32` the matrix's component type must match the
+scalar ype of the groupshared array. If the groupshared memory is `i32` the


Suggested change

scalar ype of the groupshared array. If the groupshared memory is `i32` the

scalar type of the groupshared array. If the groupshared memory is `i32` the

mjbedy · 2026-06-19T19:13:27Z

-and groupshared memory are defined in the [Conversions](#data-conversion-rules) section
-below.
+Store a matrix to groupshared memory. If the groupshared memory is any type
+other than `i32` the matrix's component type must match the scalar ype of the


Suggested change

other than `i32` the matrix's component type must match the scalar ype of the

other than `i32` the matrix's component type must match the scalar type of the

mjbedy · 2026-06-19T19:18:26Z

 declare void @dx.op.linAlgMatrixStoreToMemory.[MatTy].[Ty](
  immarg i32,                         ; opcode
  %dx.types.LinAlgMatrix<mangling>,   ; matrix
  [Ty] addrspace(3)*,                 ; groupshared Ty[M * N]
  i32,                                ; Offset
  i32,                                ; Stride
  i32                                 ; matrix layout
  )


Are the Offset and Stride intended to be in terms of the element type of the groupshared array? (e.g., if you are storing f16 elements to i32 array, the Offset and Strides will be in terms of 4 byte elements.)

V-FEXrt · 2026-06-19T21:10:10Z

  immarg i32,                         ; opcode
  %dx.types.LinAlgMatrix<mangling>,   ; matrix
  [Ty] addrspace(3)*,                 ; groupshared Ty[M * N]
+  immarg i32,                         ; target data type


I can't add a comment directly to the right spot because there is no diff there but can you update the sections that start with

For the Load operations on groupshared arrays: and For the Store operations on groupshared arrays: to comment on what Stride and Offset are expected to be when target data type and gs array data type disagree?

[0035] Update wave-scope matrix groupshared operations

b85cf39

This updates the load, store, and interlocked accumlation operations on groupshared memory to align data conversion handling and support accumulating into packed data types. Fixes microsoft#866

github-actions Bot added the needs-triage label Jun 5, 2026

github-project-automation Bot added this to HLSL Triage Jun 5, 2026

V-FEXrt reviewed Jun 5, 2026

View reviewed changes

Comment thread proposals/0035-linalg-matrix.md Outdated

Comment thread proposals/0035-linalg-matrix.md Outdated

alan-baker reviewed Jun 5, 2026

View reviewed changes

Apply suggestions from code review

cc67f50

Co-authored-by: Ashley Coleman <ascoleman@microsoft.com>

pow2clk reviewed Jun 10, 2026

View reviewed changes

re-sync code snippets across the spec

f49252b

Allow i32 groupshared memory to store data in any type

922bf53

This relaxes the groupshared data constraints so that i32 memory may store any type opaquely.

mjbedy reviewed Jun 19, 2026

View reviewed changes

V-FEXrt reviewed Jun 19, 2026

View reviewed changes

		be in any arithmetic or packed data type. The type of the array must match the
		type of the matrix being loaded into.

	scalar ype of the groupshared array. If the groupshared memory is `i32` the
	scalar type of the groupshared array. If the groupshared memory is `i32` the

	other than `i32` the matrix's component type must match the scalar ype of the
	other than `i32` the matrix's component type must match the scalar type of the

Conversation

llvm-beanz commented Jun 5, 2026

Uh oh!

llvm-beanz commented Jun 5, 2026

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pow2clk left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

llvm-beanz commented Jun 16, 2026

Uh oh!

llvm-beanz commented Jun 16, 2026

Uh oh!

mjbedy left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mjbedy Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

mjbedy Jun 19, 2026 •

edited

Loading