WebGPU validation error (op_handler_failed) when running qwen3.5-4b-generic-gpu:2 on Intel Arc Graphics #799

@Fatihsigircik

Description

Running the model qwen3.5-4b-generic-gpu:2 with foundry run fails with an IPC error (op_handler_failed) caused by the WebGPU validation failure shown below. The model does not generate any response.

Environment

  • OS: Windows (PowerShell output shows PS C:\Users\Fatih)
  • GPU: Intel Arc Graphics (device ID 7D55, vendor 8086)
  • Driver version: 32.0.101.8332 (latest as of now)
  • DirectX: 12
  • Vulkan: 1.4.328
  • OpenCL: 3.0
  • Shaders: 6.7
  • Dedicated GPU memory: 128 MB
  • Shared system memory: 18 GB
  • Foundry version: 0.10.0+174be11ea7aeacd8d0d67b0ba1daebec615284b1
  • ONNX Runtime GenAI version: included with Foundry

Steps to Reproduce

  1. Install the Microsoft Foundry Local CLI.
  2. Run the following command in PowerShell:
    foundry run qwen3.5-4b-generic-gpu:2
  3. Type any prompt (e.g., selamun aleyküm).
  4. Observe the error.

Actual Error Log

● error: IPC error 'op_handler_failed': Error from chat_completions command: Error: Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: WebGPU validation failed. [Buffer (unlabeled)] usage (Storage(read-write)|Storage(read-only)) includes writable usage and another usage in the same synchronization scope.
 - While validating compute pass usage.
 - While finishing [CommandEncoder (unlabeled)].

   at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x47
   at Microsoft.ML.OnnxRuntimeGenAI.Generator.AppendTokenSequences(Sequences) + 0x1f
   at Microsoft.Neutron.OpenAI.Provider.OnnxChatGenerator..ctor(OnnxLoadedModel, GeneratorParams, ILogger, Sequences, NamedTensors) + 0x94
   at Microsoft.Neutron.OpenAI.Provider.OnnxChatGenerator.CreateOnnxChatGenerator(ChatCompletionCreateRequestExtended, OnnxLoadedModel, AzureFoundryLocalModel, ITelemetry, ILogger) + 0xa94
   at Microsoft.AI.Foundry.Local.ChatClient.<>c__DisplayClass8_0.<HandleStreamRequestAsync>b__0(CancellationToken) + 0x2a
   at Microsoft.Neutron.OpenAI.Provider.ChatCompletions.<HandleStreamRequestAsync>d__3.MoveNext() + 0x234
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.ChatClient.<HandleStreamRequestAsync>d__8.MoveNext() + 0x2cb
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.ChatClient.<HandleStreamRequestAsync>d__8.MoveNext() + 0x446
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.NativeInterop.<>c__DisplayClass13_0.<<ExecuteCommandWithCallbackManaged>b__2>d.MoveNext() + 0x467
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.NativeInterop.<>c__DisplayClass13_0.<<ExecuteCommandWithCallbackManaged>b__2>d.MoveNext() + 0x7d9
--- End of stack trace from previous location ---
   at Microsoft.AI.Foundry.Local.NativeInterop.<ExecuteWithTracker>d__9.MoveNext() + 0xb8

Expected Behavior

The model should run without WebGPU validation errors and generate responses.

Workarounds Found (for maintainers)

  • Using the CPU-only variant works (though slower):

    foundry run qwen3.5-4b-cpu:2
    

Additional Context

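  • The error indicates that the same buffer is bound as both Storage(read-write) and Storage(read-only) within one synchronization scope (here a compute pass, per the "While validating compute pass usage" note). The WebGPU specification does not allow a writable storage usage to be combined with a read-only usage of the same buffer in one scope, so the command encoder is rejected before any work reaches the GPU.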

  • The issue might be specific to Intel Arc drivers or their WebGPU implementation when used with ONNX Runtime GenAI.
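
To make the rule in the first bullet concrete, here is a minimal Python sketch of the check as I understand it. It is a simplified model for illustration only, not the actual validation code of any WebGPU implementation or of ONNX Runtime GenAI. The buffer names (kv_cache, weights, and so on) and the function names are hypothetical.

```python
from enum import Enum


class Usage(Enum):
    """Buffer usages that appear in the error message (simplified subset)."""
    STORAGE_READ_WRITE = "Storage(read-write)"
    STORAGE_READ_ONLY = "Storage(read-only)"


# Usages that count as writable in this simplified model.
WRITABLE = {Usage.STORAGE_READ_WRITE}


def scope_is_valid(usages):
    """Return True if one buffer's usages are compatible within one scope.

    Simplified rule: either every usage is read-only, or every usage is
    writable storage. A writable usage mixed with any other usage fails.
    """
    used = set(usages)
    writable = used & WRITABLE
    return not writable or used == writable


def conflicting_buffers(bindings):
    """Given (buffer_name, Usage) pairs for one scope, list the bad buffers."""
    per_buffer = {}
    for name, usage in bindings:
        per_buffer.setdefault(name, []).append(usage)
    return sorted(name for name, u in per_buffer.items() if not scope_is_valid(u))


# Hypothetical dispatch that binds the same buffer as input and output.
bad_dispatch = [
    ("kv_cache", Usage.STORAGE_READ_WRITE),
    ("kv_cache", Usage.STORAGE_READ_ONLY),
    ("weights", Usage.STORAGE_READ_ONLY),
]

# Hypothetical fix: separate input and output buffers.
good_dispatch = [
    ("kv_cache_in", Usage.STORAGE_READ_ONLY),
    ("kv_cache_out", Usage.STORAGE_READ_WRITE),
    ("weights", Usage.STORAGE_READ_ONLY),
]

print(conflicting_buffers(bad_dispatch))
print(conflicting_buffers(good_dispatch))
```

If this model matches what happens in the failing kernel, the fix would be on the side that builds the bind groups (for example, not aliasing an input and an output to the same buffer), rather than in the prompt or model configuration.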

Possible Root Cause

ONNX Runtime GenAI's WebGPU backend may be generating buffers with incompatible usage flags, or the Intel Arc WebGPU driver may be stricter about validation than other implementations.
