Update cpuinfo to include cpuinfo_deinitialize(), fix QNN ETW logging, GQA underflow, and ep_weight_sharing_ctx_gen build by crvineeth97 · Pull Request #28245 · microsoft/onnxruntime

crvineeth97 · 2026-04-27T19:45:43Z

Description

This PR contains three commits:

Commit 1: Miscellaneous fixes

Downgrade QNN ETW profiling mismatch logs from ERROR to VERBOSE to reduce excessive telemetry noise (~1 billion events/week across Windows devices)
Add bounds checking in GQA attention to prevent size_t underflow when seqlens_k contains invalid data (fixes #27170, "Access Violation in onnxruntime_perf_test.exe due to inconsistent seqlens_k tensor random values")
Build ep_weight_sharing_ctx_gen for TensorRT, OpenVINO, and VitisAI in addition to QNN
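
For readers unfamiliar with the underflow in the GQA fix, here is a minimal sketch of the failure mode and one way to guard against it. It is not the code in this PR: CheckedPastSeqLen and UncheckedPastSeqLen are hypothetical helpers, and the sketch assumes each seqlens_k entry stores the total sequence length minus one.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper, not ORT code. Assumes a seqlens_k entry holds
// (total sequence length - 1). Returns std::nullopt when the entry is
// inconsistent with sequence_length.
std::optional<std::size_t> CheckedPastSeqLen(std::int32_t seqlens_k_entry,
                                             std::int32_t sequence_length) {
  // Do the arithmetic in a signed 64-bit type so a bad entry produces a
  // negative value instead of wrapping around.
  const std::int64_t total_seqlen = static_cast<std::int64_t>(seqlens_k_entry) + 1;
  const std::int64_t past_seqlen = total_seqlen - sequence_length;
  if (sequence_length < 0 || past_seqlen < 0) {
    return std::nullopt;  // invalid input: the caller should reject or clamp it
  }
  return static_cast<std::size_t>(past_seqlen);
}

// The unguarded equivalent: the subtraction happens in size_t, so a negative
// result wraps to a huge value that is later used as an offset or length.
std::size_t UncheckedPastSeqLen(std::int32_t seqlens_k_entry,
                                std::int32_t sequence_length) {
  return static_cast<std::size_t>(seqlens_k_entry) + 1 -
         static_cast<std::size_t>(sequence_length);
}
```

With random tensor contents, as in the perf test from the linked issue, an entry such as 2 paired with a sequence length of 8 makes the unguarded version wrap around, while the checked version reports the input as invalid.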

Commit 2: Bump cpuinfo and add cpuinfo_deinitialize() integration

Applications that dynamically load and unload the onnxruntime DLL leave orphaned heap allocations from cpuinfo when the library is unloaded mid-process. These are flagged as memory leaks by App Verifier, Valgrind, AddressSanitizer, and LeakSanitizer.

This commit bumps pytorch/cpuinfo to a version that implements cpuinfo_deinitialize() (pytorch/cpuinfo#387) and adds ORT integration:

CPUIDInfo::ShutDown() calls cpuinfo_deinitialize() to free heap-allocated globals
DllMain calls ShutdownCpuInfo() on DLL_PROCESS_DETACH
In memleak-check builds, shutdown also runs during process termination
InstanceCreated atomic guard prevents singleton creation during DLL unload
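
The guard described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the ORT implementation: the sketch namespace, the CpuIdInfo class, and the FakeCpuinfoInitialize/FakeCpuinfoDeinitialize stand-ins are all hypothetical, the real cpuinfo functions are not called, and C++17 is assumed for inline variables.

```cpp
#include <atomic>

namespace sketch {

// Stand-ins for cpuinfo_initialize() / cpuinfo_deinitialize(). They only
// count a fake "allocation" so the behavior can be observed.
inline int g_fake_allocations = 0;
inline bool FakeCpuinfoInitialize() { ++g_fake_allocations; return true; }
inline void FakeCpuinfoDeinitialize() { g_fake_allocations = 0; }

class CpuIdInfo {
 public:
  static CpuIdInfo& Get() {
    static CpuIdInfo instance;  // created on first use
    return instance;
  }

  // Intended to be safe during unload: it never creates the singleton just
  // to tear it down again.
  static void ShutdownCpuInfo() {
    if (instance_created_.load(std::memory_order_acquire)) {
      Get().ShutDown();
    }
  }

  bool initialized() const { return initialized_; }

 private:
  CpuIdInfo() : initialized_(FakeCpuinfoInitialize()) {
    instance_created_.store(true, std::memory_order_release);
  }

  void ShutDown() {
    if (initialized_) {
      FakeCpuinfoDeinitialize();
      initialized_ = false;  // repeated shutdown calls become no-ops
    }
  }

  bool initialized_;
  inline static std::atomic<bool> instance_created_{false};
};

}  // namespace sketch
```

On Windows, a DllMain handler would call the static shutdown function when it receives DLL_PROCESS_DETACH. The atomic flag is what lets that call return immediately if the singleton was never created.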

Commit 3: Update to official cpuinfo merged fix

After pytorch/cpuinfo#387 merged upstream, updated the dependency to point to pytorch/cpuinfo main (4628dc06).

Patch changes:

Removed win_arm_fp16_detection_fallback.patch — upstreamed via pytorch/cpuinfo#348
Updated patch_vcpkg_arm64ec_support.patch — regenerated for new cpuinfo; still needed (pytorch/cpuinfo#324 not yet merged)
Updated patch_cpuinfo_h_for_arm64ec.patch — retained, not yet upstream
Regenerated fix_missing_sysfs_fallback.patch — updated context lines for new cpuinfo code

Motivation and Context

crvineeth97 · 2026-04-28T00:06:43Z

@microsoft-github-policy-service agree company="Microsoft"

…tx_gen build

- Downgrade QNN ETW profiling mismatch logs from ERROR to VERBOSE to prevent excessive telemetry events (~1 billion/week across Windows devices)
- Add bounds checking in GQA attention to prevent size_t underflow when seqlens_k contains invalid data (fixes github.com/microsoft/issues/27170)
- Build ep_weight_sharing_ctx_gen for TensorRT, OpenVINO, and VitisAI in addition to QNN

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Bump pytorch/cpuinfo to crvineeth97/cpuinfo@df8c6a8 which implements cpuinfo_deinitialize() to properly free heap-allocated globals. This prevents memory leak reports from App Verifier, Valgrind, and sanitizers when ORT is dynamically loaded/unloaded.

ORT integration:
- Add CPUIDInfo::ShutDown() which calls cpuinfo_deinitialize()
- Call ShutdownCpuInfo() from DllMain on DLL_PROCESS_DETACH
- In memleak-check builds, also call shutdown during process termination

The cpuinfo bump also includes upstream fixes that make three ORT patches redundant (removed):
- patch_vcpkg_arm64ec_support.patch (pytorch/cpuinfo#324)
- win_arm_fp16_detection_fallback.patch (pytorch/cpuinfo#348)
- 0001-Add-implementation-for-cpuinfo_deinitialize.patch

The patch_cpuinfo_h_for_arm64ec.patch is retained as it is not yet upstream in cpuinfo.

Related: pytorch/cpuinfo#150, pytorch/cpuinfo#387

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

edgchen1 · 2026-06-15T22:23:38Z

+        onnxruntime::CPUIDInfo::ShutdownCpuInfo();
 #endif
      } else {
        // Cleanup protobuf library.


also update this comment

edgchen1 · 2026-06-15T22:25:14Z

  // ETW disabled previously, but enabled now
  if (ProfilingLevel::INVALID == profiling_level_etw_ && tracelogging_provider_ep_enabled) {
-    LOGS(*logger_, ERROR) << "ETW disabled previously, but enabled now. Can't do the switch! Won't output any profiling.";
+    LOGS(*logger_, VERBOSE) << "ETW disabled previously, but enabled now. Can't do the switch! Won't output any profiling.";


ERROR -> VERBOSE seems like a big jump. can you elaborate on the reason for this change?

edgchen1 · 2026-06-15T22:26:39Z

+
+void CPUIDInfo::ShutDown() {
+#if defined(CPUINFO_SUPPORTED)
+  static bool is_shutdown = false;


it seems asymmetric to have a static local variable tracking shutdown, but not initialization. is pytorch_cpuinfo_init_ alone insufficient?

edgchen1 · 2026-06-15T22:27:18Z

-            default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(1);
+
+        // Handle inconsistent random data in seqlens_k, when past_seqlen becomes negative
+        if (past_seqlen < 0) {


do we have unit test coverage for this change?

edgchen1 · 2026-06-15T22:28:53Z


-
-  if(onnxruntime_USE_QNN)
+  # Build ep_weight_sharing_ctx_gen for all supported EPs (QNN, TensorRT, OpenVINO, VitisAI)


this looks ok, but I think we will be moving these to plugin EPs eventually.

edgchen1 · 2026-06-15T22:41:05Z

    return has_fp16_;
  }

+  static void ShutdownCpuInfo() {


would be good to document this function. e.g., that it should be used instead of GetCPUIDInfo().ShutDown(), that we should not call GetCPUIDInfo() after this (or make that an error), etc.

also, casing of "shutdown" is inconsistent between ShutdownCpuInfo and ShutDown

devang-ml requested a review from edgchen1 April 27, 2026 20:31

edgchen1 reviewed Apr 27, 2026


Comment thread cmake/patches/cpuinfo/0001-Add-implementation-for-cpuinfo_deinitialize.patch Outdated

crvineeth97 force-pushed the vchelur/cpuinfo-memory-leak-patch branch from b478e1c to c53f5d5 Compare May 5, 2026 16:48

crvineeth97 mentioned this pull request Jun 11, 2026

Implement cpuinfo_deinitialize() to free heap-allocated globals pytorch/cpuinfo#387

Merged

crvineeth97 force-pushed the vchelur/cpuinfo-memory-leak-patch branch 6 times, most recently from 1f827a7 to 091118f Compare June 12, 2026 19:46

crvineeth97 and others added 3 commits June 12, 2026 14:50

Update to official cpuinfo merged fix

6300958

crvineeth97 force-pushed the vchelur/cpuinfo-memory-leak-patch branch from 091118f to 6300958 Compare June 12, 2026 21:51

crvineeth97 changed the title ~~Add support for dealloc of heap memory from CPUInfo library to prevent memory leaks during dynamic unload of ORT~~ Update cpuinfo to include cpuinfo_deinitialize(), fix QNN ETW logging, GQA underflow, and ep_weight_sharing_ctx_gen build Jun 13, 2026

crvineeth97 enabled auto-merge (squash) June 13, 2026 01:28

crvineeth97 requested a review from edgchen1 June 15, 2026 21:16

edgchen1 reviewed Jun 15, 2026

crvineeth97 wants to merge 3 commits into microsoft:main from crvineeth97:vchelur/cpuinfo-memory-leak-patch
