Description
When running the model qwen3.5-4b-generic-gpu:2 using foundry run, an IPC error occurs with the following WebGPU validation failure. The model fails to generate any response.
Environment
- OS: Windows (PowerShell prompt shows PS C:\Users\Fatih)
- GPU: Intel Arc Graphics (device ID 7D55, vendor 8086)
- Driver version: 32.0.101.8332 (latest as of now)
- DirectX: 12
- Vulkan: 1.4.328
- OpenCL: 3.0
- Shaders: 6.7
- Dedicated GPU memory: 128 MB
- Shared system memory: 18 GB
- Foundry version: 0.10.0+174be11ea7aeacd8d0d67b0ba1daebec615284b1
- ONNX Runtime GenAI version: included with Foundry
Steps to Reproduce
- Install the Foundry Local CLI (the foundry command).
- Run the following command in PowerShell:
foundry run qwen3.5-4b-generic-gpu:2
- Type any prompt (e.g., selamun aleyküm).
- Observe the error.
Actual Error Log
● error: IPC error 'op_handler_failed': Error from chat_completions command: Error: Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: WebGPU validation failed. [Buffer (unlabeled)] usage (Storage(read-write)|Storage(read-only)) includes writable usage and another usage in the same synchronization scope.
- While validating compute pass usage.
- While finishing [CommandEncoder (unlabeled)].
at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x47
at Microsoft.ML.OnnxRuntimeGenAI.Generator.AppendTokenSequences(Sequences) + 0x1f
at Microsoft.Neutron.OpenAI.Provider.OnnxChatGenerator..ctor(OnnxLoadedModel, GeneratorParams, ILogger, Sequences, NamedTensors) + 0x94
at Microsoft.Neutron.OpenAI.Provider.OnnxChatGenerator.CreateOnnxChatGenerator(ChatCompletionCreateRequestExtended, OnnxLoadedModel, AzureFoundryLocalModel, ITelemetry, ILogger) + 0xa94
at Microsoft.AI.Foundry.Local.ChatClient.<>c__DisplayClass8_0.<HandleStreamRequestAsync>b__0(CancellationToken) + 0x2a
at Microsoft.Neutron.OpenAI.Provider.ChatCompletions.<HandleStreamRequestAsync>d__3.MoveNext() + 0x234
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.ChatClient.<HandleStreamRequestAsync>d__8.MoveNext() + 0x2cb
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.ChatClient.<HandleStreamRequestAsync>d__8.MoveNext() + 0x446
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.NativeInterop.<>c__DisplayClass13_0.<<ExecuteCommandWithCallbackManaged>b__2>d.MoveNext() + 0x467
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.NativeInterop.<>c__DisplayClass13_0.<<ExecuteCommandWithCallbackManaged>b__2>d.MoveNext() + 0x7d9
--- End of stack trace from previous location ---
at Microsoft.AI.Foundry.Local.NativeInterop.<ExecuteWithTracker>d__9.MoveNext() + 0xb8
Expected Behavior
The model should run without WebGPU validation errors and generate responses.
Additional Context
- The error appears to be a WebGPU usage-scope violation: the same buffer is used as both Storage(read-write) and Storage(read-only) within the same synchronization scope. The WebGPU specification does not allow a writable usage to be combined with a different usage of the same buffer in one scope.
- The issue might be specific to Intel Arc drivers or their WebGPU implementation when used with ONNX Runtime GenAI.
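To make the rule behind the validation message concrete, here is a minimal Python sketch of the check as I understand it from the WebGPU specification: within one usage scope, a buffer's usages must either all be read-only or all be writable storage. This is a simplified model for illustration, not the code of ONNX Runtime GenAI or of any WebGPU implementation, and the buffer names (`kv_cache`, `weights`) are hypothetical.

```python
from collections import defaultdict

# Simplified usage classes, loosely following the WebGPU "usage scope" rules.
READ_ONLY = {"input", "constant", "storage-read"}
WRITABLE_STORAGE = {"storage"}


def incompatible_buffers(bindings):
    """Return {buffer: usages} for buffers whose usages conflict in one scope.

    `bindings` is a list of (buffer_name, usage) pairs that all belong to the
    same usage scope (for example, one compute dispatch).
    """
    usages = defaultdict(set)
    for buffer_name, usage in bindings:
        usages[buffer_name].add(usage)

    conflicts = {}
    for name, used in usages.items():
        all_read_only = used <= READ_ONLY
        all_storage = used <= WRITABLE_STORAGE
        # A mix of writable and any other usage is rejected.
        if not (all_read_only or all_storage):
            conflicts[name] = sorted(used)
    return conflicts


# Hypothetical dispatch: one buffer bound read-write and read-only at once.
scope = [
    ("kv_cache", "storage"),
    ("kv_cache", "storage-read"),
    ("weights", "storage-read"),
]
print(incompatible_buffers(scope))
```

In this model the hypothetical `kv_cache` buffer is reported because it mixes a read-write and a read-only storage usage, which mirrors the (Storage(read-write)|Storage(read-only)) pair in the log above. Which real buffer triggers the error in this model run is not known from the log, since the buffer is unlabeled.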
Possible Root Cause
ONNX Runtime GenAI's WebGPU backend may be generating buffers with incompatible usage flags, or the Intel Arc WebGPU driver may be stricter about validation than other implementations.
Workarounds Found (for maintainers)
Using the CPU-only variant also works (though slower).