Update cpuinfo to include cpuinfo_deinitialize(), fix QNN ETW logging, GQA underflow, and ep_weight_sharing_ctx_gen build#28245
Conversation
@microsoft-github-policy-service agree company="Microsoft" |
b478e1c to
c53f5d5
Compare
1f827a7 to
091118f
Compare
…tx_gen build - Downgrade QNN ETW profiling mismatch logs from ERROR to VERBOSE to prevent excessive telemetry events (~1 billion/week across Windows devices) - Add bounds checking in GQA attention to prevent size_t underflow when seqlens_k contains invalid data (fixes github.com/microsoft/issues/27170) - Build ep_weight_sharing_ctx_gen for TensorRT, OpenVINO, and VitisAI in addition to QNN Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bump pytorch/cpuinfo to crvineeth97/cpuinfo@df8c6a8 which implements cpuinfo_deinitialize() to properly free heap-allocated globals. This prevents memory leak reports from App Verifier, Valgrind, and sanitizers when ORT is dynamically loaded/unloaded. ORT integration: - Add CPUIDInfo::ShutDown() which calls cpuinfo_deinitialize() - Call ShutdownCpuInfo() from DllMain on DLL_PROCESS_DETACH - In memleak-check builds, also call shutdown during process termination The cpuinfo bump also includes upstream fixes that make three ORT patches redundant (removed): - patch_vcpkg_arm64ec_support.patch (pytorch/cpuinfo#324) - win_arm_fp16_detection_fallback.patch (pytorch/cpuinfo#348) - 0001-Add-implementation-for-cpuinfo_deinitialize.patch The patch_cpuinfo_h_for_arm64ec.patch is retained as it is not yet upstream in cpuinfo. Related: pytorch/cpuinfo#150, pytorch/cpuinfo#387 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
091118f to
6300958
Compare
| onnxruntime::CPUIDInfo::ShutdownCpuInfo(); | ||
| #endif | ||
| } else { | ||
| // Cleanup protobuf library. |
| // ETW disabled previously, but enabled now | ||
| if (ProfilingLevel::INVALID == profiling_level_etw_ && tracelogging_provider_ep_enabled) { | ||
| LOGS(*logger_, ERROR) << "ETW disabled previously, but enabled now. Can't do the switch! Won't output any profiling."; | ||
| LOGS(*logger_, VERBOSE) << "ETW disabled previously, but enabled now. Can't do the switch! Won't output any profiling."; |
There was a problem hiding this comment.
ERROR -> VERBOSE seems like a big jump. can you elaborate on the reason for this change?
|
|
||
| void CPUIDInfo::ShutDown() { | ||
| #if defined(CPUINFO_SUPPORTED) | ||
| static bool is_shutdown = false; |
There was a problem hiding this comment.
it seems asymmetric to have a static local variable tracking shutdown, but not initialization. is pytorch_cpuinfo_init_ alone insufficient?
| default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(1); | ||
|
|
||
| // Handle inconsistent random data in seqlens_k, when past_seqlen becomes negative | ||
| if (past_seqlen < 0) { |
There was a problem hiding this comment.
do we have unit test coverage for this change?
|
|
||
|
|
||
| if(onnxruntime_USE_QNN) | ||
| # Build ep_weight_sharing_ctx_gen for all supported EPs (QNN, TensorRT, OpenVINO, VitisAI) |
There was a problem hiding this comment.
this looks ok, but I think we will be moving these to plugin EPs eventually.
| return has_fp16_; | ||
| } | ||
|
|
||
| static void ShutdownCpuInfo() { |
There was a problem hiding this comment.
would be good to document this function. e.g., that it should be used instead of GetCPUIDInfo().ShutDown(), that we should not call GetCPUIDInfo() after this (or make that an error), etc.
There was a problem hiding this comment.
also, casing of "shutdown" is inconsistent between ShutdownCpuInfo and ShutDown
Description
This PR contains three commits:
Commit 1: Miscellaneous fixes
size_tunderflow whenseqlens_kcontains invalid data (fixes Access Violation in onnxruntime_perf_test.exe due to inconsistent seqlens_k tensor random values #27170)ep_weight_sharing_ctx_genfor TensorRT, OpenVINO, and VitisAI in addition to QNNCommit 2: Bump cpuinfo and add
cpuinfo_deinitialize()integrationApplications that dynamically load and unload the onnxruntime DLL leave orphaned heap allocations from cpuinfo when the library is unloaded mid-process. These are flagged as memory leaks by App Verifier, Valgrind, AddressSanitizer, and LeakSanitizer.
This commit bumps
pytorch/cpuinfoto a version that implementscpuinfo_deinitialize()(pytorch/cpuinfo#387) and adds ORT integration:CPUIDInfo::ShutDown()callscpuinfo_deinitialize()to free heap-allocated globalsDllMaincallsShutdownCpuInfo()onDLL_PROCESS_DETACHInstanceCreatedatomic guard prevents singleton creation during DLL unloadCommit 3: Update to official cpuinfo merged fix
After pytorch/cpuinfo#387 merged upstream, updated the dependency to point to
pytorch/cpuinfomain (4628dc06).Patch changes:
win_arm_fp16_detection_fallback.patch— upstreamed via pytorch/cpuinfo#348patch_vcpkg_arm64ec_support.patch— regenerated for new cpuinfo; still needed (pytorch/cpuinfo#324 not yet merged)patch_cpuinfo_h_for_arm64ec.patch— retained, not yet upstreamfix_missing_sysfs_fallback.patch— updated context lines for new cpuinfo codeMotivation and Context