
Update cpuinfo to include cpuinfo_deinitialize(), fix QNN ETW logging, GQA underflow, and ep_weight_sharing_ctx_gen build #28245

Open
crvineeth97 wants to merge 3 commits into microsoft:main from crvineeth97:vchelur/cpuinfo-memory-leak-patch

crvineeth97 commented Apr 27, 2026

Description

This PR contains three commits:

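Commit 1: Miscellaneous fixes (QNN ETW log level, GQA seqlens_k bounds check, ep_weight_sharing_ctx_gen build for additional EPs)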

Commit 2: Bump cpuinfo and add cpuinfo_deinitialize() integration

Applications that dynamically load and unload the onnxruntime DLL leave orphaned heap allocations from cpuinfo when the library is unloaded mid-process. These are flagged as memory leaks by App Verifier, Valgrind, AddressSanitizer, and LeakSanitizer.

This commit bumps pytorch/cpuinfo to a version that implements cpuinfo_deinitialize() (pytorch/cpuinfo#387) and adds ORT integration:

  • CPUIDInfo::ShutDown() calls cpuinfo_deinitialize() to free heap-allocated globals
  • DllMain calls ShutdownCpuInfo() on DLL_PROCESS_DETACH
  • In memleak-check builds, shutdown also runs during process termination
  • InstanceCreated atomic guard prevents singleton creation during DLL unload
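The integration described above can be sketched as follows. This is a minimal, portable illustration under stated assumptions, not ORT's actual CPUIDInfo code: the class CpuInfoSketch and the fake_cpuinfo namespace with its stubs are hypothetical stand-ins, so the sketch compiles without cpuinfo or Windows headers. In a real DLL build, the DLL_PROCESS_DETACH branch of DllMain would call CpuInfoSketch::Shutdown(), and once shutdown has started Get() refuses to re-create the singleton.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for cpuinfo's cpuinfo_initialize() and
// cpuinfo_deinitialize(). They only track whether "global state" is live,
// so this sketch runs without linking the real library.
namespace fake_cpuinfo {
inline bool state_allocated = false;
inline bool cpuinfo_initialize() {
  state_allocated = true;
  return true;
}
inline void cpuinfo_deinitialize() { state_allocated = false; }
}  // namespace fake_cpuinfo

// Hypothetical singleton modeled on the PR description (not ORT's CPUIDInfo).
// Assumes Shutdown() is not called concurrently with Get(), e.g. because it
// only runs from DLL_PROCESS_DETACH or at process termination.
class CpuInfoSketch {
 public:
  // Returns the instance, or nullptr once shutdown has started, so code
  // running during unload cannot re-create the singleton.
  static CpuInfoSketch* Get() {
    if (shutting_down_.load(std::memory_order_acquire)) return nullptr;
    CpuInfoSketch& instance = Storage();
    instance_created_.store(true, std::memory_order_release);
    return &instance;
  }

  // Frees cpuinfo's heap-allocated globals. Safe to call more than once, and
  // safe to call when Get() was never used (nothing is constructed then).
  static void Shutdown() {
    if (shutting_down_.exchange(true, std::memory_order_acq_rel)) return;
    if (!instance_created_.load(std::memory_order_acquire)) return;
    Storage().Deinitialize();
  }

  bool initialized() const { return initialized_; }

 private:
  CpuInfoSketch() : initialized_(fake_cpuinfo::cpuinfo_initialize()) {}

  static CpuInfoSketch& Storage() {
    static CpuInfoSketch instance;  // constructed on first use only
    return instance;
  }

  // Gated on the flag set during initialization, so deinitialize runs once.
  void Deinitialize() {
    if (!initialized_) return;
    fake_cpuinfo::cpuinfo_deinitialize();
    initialized_ = false;
  }

  bool initialized_;
  static inline std::atomic<bool> instance_created_{false};
  static inline std::atomic<bool> shutting_down_{false};
};
```

In this sketch the deinitialize call is gated on initialized_, the same flag initialization sets, which keeps Shutdown() idempotent without a separate shutdown-only static. Whether ORT's existing pytorch_cpuinfo_init_ flag can play that role is an open design question, not something the sketch settles.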

Commit 3: Update to official cpuinfo merged fix

After pytorch/cpuinfo#387 was merged upstream, the dependency was updated to point to pytorch/cpuinfo main (4628dc06).

Patch changes:

  • Removed win_arm_fp16_detection_fallback.patch — upstreamed via pytorch/cpuinfo#348
  • Updated patch_vcpkg_arm64ec_support.patch — regenerated for new cpuinfo; still needed (pytorch/cpuinfo#324 not yet merged)
  • Updated patch_cpuinfo_h_for_arm64ec.patch — retained, not yet upstream
  • Regenerated fix_missing_sysfs_fallback.patch — updated context lines for new cpuinfo code

Motivation and Context

devang-ml requested a review from edgchen1 on April 27, 2026 20:31
Comment thread cmake/patches/cpuinfo/0001-Add-implementation-for-cpuinfo_deinitialize.patch Outdated
crvineeth97 (Author)

@crvineeth97 please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
    @microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
    @microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree company="Microsoft"

crvineeth97 force-pushed the vchelur/cpuinfo-memory-leak-patch branch from b478e1c to c53f5d5 on May 5, 2026 16:48
crvineeth97 force-pushed the vchelur/cpuinfo-memory-leak-patch branch 6 times, most recently from 1f827a7 to 091118f on June 12, 2026 19:46
crvineeth97 and others added 3 commits June 12, 2026 14:50
…tx_gen build

- Downgrade QNN ETW profiling mismatch logs from ERROR to VERBOSE to prevent
  excessive telemetry events (~1 billion/week across Windows devices)
- Add bounds checking in GQA attention to prevent size_t underflow when
  seqlens_k contains invalid data (fixes github.com/microsoft/issues/27170)
- Build ep_weight_sharing_ctx_gen for TensorRT, OpenVINO, and VitisAI in
  addition to QNN

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
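The GQA bounds check in the first commit can be illustrated with a small sketch. It is hypothetical and not the code in this PR: ComputePastSeqLens is an invented helper, it assumes (as in ORT's GroupQueryAttention contrib op) that seqlens_k[b] holds the total sequence length minus one, and it rejects bad input outright, whereas the PR's actual handling of a negative past_seqlen may differ. The key point is doing the subtraction in a signed type and checking it before any conversion to size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not ORT's GQA kernel code). Assumes seqlens_k[b] is the
// total sequence length minus one for batch entry b.
// Computes past_seqlen = total_seqlen - sequence_length per entry in signed
// 64-bit arithmetic and returns false if any entry would be negative, instead
// of letting the value wrap around when stored as std::size_t.
bool ComputePastSeqLens(const std::vector<int32_t>& seqlens_k,
                        int sequence_length,
                        std::vector<std::size_t>& past_seqlens) {
  past_seqlens.clear();
  for (const int32_t k : seqlens_k) {
    const int64_t total_seqlen = static_cast<int64_t>(k) + 1;
    const int64_t past_seqlen = total_seqlen - sequence_length;
    if (past_seqlen < 0) {
      // Without this check, static_cast<std::size_t>(past_seqlen) would wrap
      // to a huge value that could later be used as an out-of-range offset.
      return false;
    }
    past_seqlens.push_back(static_cast<std::size_t>(past_seqlen));
  }
  return true;
}
```

For example, with sequence_length = 4, seqlens_k = {7, 9} yields past lengths {4, 6}, while seqlens_k = {7, 1} is rejected because the second entry gives a past length of -2.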
Bump pytorch/cpuinfo to crvineeth97/cpuinfo@df8c6a8 which implements
cpuinfo_deinitialize() to properly free heap-allocated globals. This
prevents memory leak reports from App Verifier, Valgrind, and sanitizers
when ORT is dynamically loaded/unloaded.

ORT integration:
- Add CPUIDInfo::ShutDown() which calls cpuinfo_deinitialize()
- Call ShutdownCpuInfo() from DllMain on DLL_PROCESS_DETACH
- In memleak-check builds, also call shutdown during process termination

The cpuinfo bump also includes upstream fixes that make three ORT patches
redundant (removed):
- patch_vcpkg_arm64ec_support.patch (pytorch/cpuinfo#324)
- win_arm_fp16_detection_fallback.patch (pytorch/cpuinfo#348)
- 0001-Add-implementation-for-cpuinfo_deinitialize.patch

The patch_cpuinfo_h_for_arm64ec.patch is retained as it is not yet
upstream in cpuinfo.

Related: pytorch/cpuinfo#150, pytorch/cpuinfo#387

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
crvineeth97 force-pushed the vchelur/cpuinfo-memory-leak-patch branch from 091118f to 6300958 on June 12, 2026 21:51
crvineeth97 changed the title from "Add support for dealloc of heap memory from CPUInfo library to prevent memory leaks during dynamic unload of ORT" to "Update cpuinfo to include cpuinfo_deinitialize(), fix QNN ETW logging, GQA underflow, and ep_weight_sharing_ctx_gen build" on Jun 13, 2026
crvineeth97 enabled auto-merge (squash) on June 13, 2026 01:28
crvineeth97 requested a review from edgchen1 on June 15, 2026 21:16
onnxruntime::CPUIDInfo::ShutdownCpuInfo();
#endif
} else {
// Cleanup protobuf library.

Contributor

also update this comment

// ETW disabled previously, but enabled now
if (ProfilingLevel::INVALID == profiling_level_etw_ && tracelogging_provider_ep_enabled) {
LOGS(*logger_, ERROR) << "ETW disabled previously, but enabled now. Can't do the switch! Won't output any profiling.";
LOGS(*logger_, VERBOSE) << "ETW disabled previously, but enabled now. Can't do the switch! Won't output any profiling.";

Contributor

ERROR -> VERBOSE seems like a big jump. can you elaborate on the reason for this change?


void CPUIDInfo::ShutDown() {
#if defined(CPUINFO_SUPPORTED)
static bool is_shutdown = false;

Contributor

it seems asymmetric to have a static local variable tracking shutdown, but not initialization. is pytorch_cpuinfo_init_ alone insufficient?

default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(1);

// Handle inconsistent random data in seqlens_k, when past_seqlen becomes negative
if (past_seqlen < 0) {

Contributor

do we have unit test coverage for this change?



if(onnxruntime_USE_QNN)
# Build ep_weight_sharing_ctx_gen for all supported EPs (QNN, TensorRT, OpenVINO, VitisAI)

Contributor

this looks ok, but I think we will be moving these to plugin EPs eventually.

return has_fp16_;
}

static void ShutdownCpuInfo() {

Contributor

would be good to document this function. e.g., that it should be used instead of GetCPUIDInfo().ShutDown(), that we should not call GetCPUIDInfo() after this (or make that an error), etc.

Contributor

also, casing of "shutdown" is inconsistent between ShutdownCpuInfo and ShutDown



Development

Successfully merging this pull request may close these issues.

Access Violation in onnxruntime_perf_test.exe due to inconsistent seqlens_k tensor random values
