Fix corruption due to lock sharding issues by centralizing locking by ngoldbaum · Pull Request #5838 · OpenMathLib/OpenBLAS

ngoldbaum · 2026-06-14T21:46:53Z

I used both Claude and Codex to work on this. I typed this PR description by hand.

Summary

Fixes the specific lock sharding issue described in #5836, as well as other related issues caused by the current locking strategy, which allows overlapping calls into different kernels to proceed without a common lock. The locks are function-local and so appear as different mutex objects in each separate compilation unit.

I fixed that by moving the locking to its own compilation unit with new internal helper functions for serializing calls. This also centralizes the platform-specific locking logic into the new file.
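As a rough illustration of the idea (not the code in this PR), the sketch below keeps one mutex in a single place and routes every kernel through the same pair of helpers. All names here (`level3_serialize_begin`, `level3_serialize_end`, `kernel_a`, `kernel_b`, `shared_scratch`) are hypothetical stand-ins, and the sketch assumes a pthreads platform.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; these are not the identifiers used in the PR. */

/* One process-wide mutex. In a real build this would be defined in exactly
 * one compilation unit and reached only through the helpers below. */
static pthread_mutex_t level3_lock = PTHREAD_MUTEX_INITIALIZER;

void level3_serialize_begin(void) { pthread_mutex_lock(&level3_lock); }
void level3_serialize_end(void)   { pthread_mutex_unlock(&level3_lock); }

/* Stand-in for state that different threaded kernels share. */
static double shared_scratch[4];

/* Toy kernel A: fills the shared buffer with x and returns the sum (4 * x). */
double kernel_a(double x) {
    double sum = 0.0;
    level3_serialize_begin();
    for (size_t i = 0; i < 4; i++) shared_scratch[i] = x;
    for (size_t i = 0; i < 4; i++) sum += shared_scratch[i];
    level3_serialize_end();
    return sum;
}

/* Toy kernel B: fills the shared buffer with 2 * x and returns the sum (8 * x).
 * It takes the same lock as kernel A, so the two can never interleave. */
double kernel_b(double x) {
    double sum = 0.0;
    level3_serialize_begin();
    for (size_t i = 0; i < 4; i++) shared_scratch[i] = 2.0 * x;
    for (size_t i = 0; i < 4; i++) sum += shared_scratch[i];
    level3_serialize_end();
    return sum;
}
```

If each kernel instead declared its own function-local static mutex, `kernel_a` and `kernel_b` would hold different locks and could overwrite `shared_scratch` underneath each other.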

This has the net effect of reducing throughput for workloads that make overlapping external calls into different threaded kernels, because those calls are now serialized consistently instead of using per-kernel lock state. I think the existing behavior is a bug rather than an intentional design choice but I wanted to raise that front and center.

It also fixes several other issues, more or less as a consequence of making the above change systematically:

There is a behavior change in the OpenMP gemm3m implementation. Currently, locking is skipped for OpenMP gemm3m. I think this is a bug but maybe it's intentional? It would be a behavior change regardless.
The old win32 path appears to have used a separate critical section per invocation, so it did not provide cross-call serialization. The new file initializes a process-wide lock once and uses that.
The old implementation caused substantial OpenMP oversubscription by duplicating parallel_section_left. Now there's only one version of this variable so there are far fewer OpenMP threads when one mixes kernels.
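The "initialize a process-wide lock once" pattern mentioned for the win32 path can be sketched with `pthread_once`, used here as a portable analogue; a Windows implementation would more likely pair `InitOnceExecuteOnce` with a critical section. The names `process_lock_acquire`, `process_lock_release`, and `process_lock_init_count` are hypothetical and not taken from the PR.

```c
#include <pthread.h>

/* Hypothetical names; a sketch of one-time initialization of a single
 * process-wide lock, as opposed to creating a fresh lock on every call. */
static pthread_once_t lock_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t process_lock;
static int init_count = 0; /* only for illustration: counts initializations */

static void init_process_lock(void) {
    pthread_mutex_init(&process_lock, NULL);
    init_count++;
}

void process_lock_acquire(void) {
    /* pthread_once runs init_process_lock exactly once per process,
     * no matter how many threads or calls reach this point. */
    pthread_once(&lock_once, init_process_lock);
    pthread_mutex_lock(&process_lock);
}

void process_lock_release(void) {
    pthread_mutex_unlock(&process_lock);
}

int process_lock_init_count(void) {
    return init_count;
}
```

A lock created inside each call protects nothing, because no two callers ever contend on the same object; initializing it once makes every caller share it.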

Testing

To verify the correctness of the fix, I added a new multithreaded stress test based on the reproducer in #5836. I also enabled multithreaded stress testing for msys2 on a Windows host to test the Windows threading model.
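The general shape of such a stress test can be sketched without OpenBLAS at all. The example below is not the test added in this PR: it assumes two toy kernels (`toy_sum`, `toy_double_sum`) that share a scratch buffer behind one hypothetical global lock, runs them from several threads, and counts results that do not match the expected value.

```c
#include <pthread.h>
#include <stddef.h>

#define N_THREADS 4
#define N_ITERS 2000
#define SCRATCH_LEN 64

/* Hypothetical process-wide lock shared by every toy kernel. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
/* Shared scratch buffer: the resource that overlapping calls could corrupt. */
static double scratch[SCRATCH_LEN];

/* Toy kernel 1: returns SCRATCH_LEN * value, computed via the shared buffer. */
static double toy_sum(double value) {
    double sum = 0.0;
    pthread_mutex_lock(&global_lock);
    for (size_t i = 0; i < SCRATCH_LEN; i++) scratch[i] = value;
    for (size_t i = 0; i < SCRATCH_LEN; i++) sum += scratch[i];
    pthread_mutex_unlock(&global_lock);
    return sum;
}

/* Toy kernel 2: returns 2 * SCRATCH_LEN * value through the same buffer. */
static double toy_double_sum(double value) {
    double sum = 0.0;
    pthread_mutex_lock(&global_lock);
    for (size_t i = 0; i < SCRATCH_LEN; i++) scratch[i] = 2.0 * value;
    for (size_t i = 0; i < SCRATCH_LEN; i++) sum += scratch[i];
    pthread_mutex_unlock(&global_lock);
    return sum;
}

struct worker_state {
    double value; /* per-thread input, so corruption shows up as a mismatch */
    long errors;  /* number of results that differed from the expected value */
};

/* Each worker alternates between the two kernels and validates every result. */
static void *worker(void *arg) {
    struct worker_state *st = arg;
    for (int i = 0; i < N_ITERS; i++) {
        int even = (i % 2 == 0);
        double got = even ? toy_sum(st->value) : toy_double_sum(st->value);
        double want = (even ? 1.0 : 2.0) * SCRATCH_LEN * st->value;
        if (got != want) st->errors++;
    }
    return NULL;
}

/* Returns the total number of wrong results, or -1 if a thread failed to start. */
long run_mixed_stress(void) {
    pthread_t threads[N_THREADS];
    struct worker_state states[N_THREADS];
    long total = 0;
    int started = 0;
    for (int t = 0; t < N_THREADS; t++) {
        states[t].value = (double)(t + 1);
        states[t].errors = 0;
        if (pthread_create(&threads[t], NULL, worker, &states[t]) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        total += states[t].errors;
    }
    return (started == N_THREADS) ? total : -1;
}
```

Giving each thread a distinct input value is the important design choice: if the kernels were not serialized, one thread's writes to the shared buffer would leak into another thread's sum and fail the comparison.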

Additionally, there is a ThreadSanitizer (TSan) test run. I manually verified that the new tests trigger validation errors and/or TSan race reports. I also verified the new mixed DGEMM stress test (no TSan) fails with incorrect results on an unpatched develop build and passes with this change.

Right now there's only TSan testing for OPENBLAS_NUM_THREADS=2 and the pthreads backend. TSan detected data races with OPENBLAS_NUM_THREADS=4 and also with the OpenMP backend. I am intentionally leaving the TSan CI as-is in this PR. We'll need to look at other issues before setting up more thorough TSan CI.

martin-frbg · 2026-06-15T09:20:34Z

Thank you - beat me to it. (The single CI failure in Jenkins is an internal docker error related to the use of sudo for preparing a cmake-based build on zarch - I've restarted that job now)

ngoldbaum added 2 commits June 14, 2026 15:32

Fix corruption due to lock sharding issues by centralizing locking

9363452

fix windows build slowness and test errors

7c7c65e

martin-frbg added this to the 0.3.34 milestone Jun 15, 2026

martin-frbg merged commit 9bdf051 into OpenMathLib:develop Jun 15, 2026
103 of 104 checks passed

martin-frbg merged 2 commits into OpenMathLib:develop from ngoldbaum:fix-level3-thread-locks-2
