
Possible Windows thread-state issue with gRPC worker threads (CreateThread) in mimalloc 3.3.2 #1324

Description

@AnneJanB

Hi,

We are seeing an intermittent read access violation in mimalloc on Windows that seems related to work running on gRPC worker threads.

Environment

- OS: Windows Server
- mimalloc: 3.3.2
- gRPC: 1.71.0
- Application: heavily multi-threaded C++ server

Symptom

We occasionally hit a read access violation (0xc0000005) in alloc.c, around line 259.

From this behavior, we suspect that the association between a mimalloc heap and the thread-local mimalloc state becomes invalid or inconsistent in our scenario.

Context: our server application uses many threads and many mimalloc heaps.

We do recycle the mimalloc heaps, but we see the same behavior when heap recycling is disabled, so heap reuse alone does not seem to explain the crash.

The issue appears when request handling runs directly on a gRPC thread. In our setup, those threads appear to be created through the Win32 CreateThread function.

What changes the behavior

1. Dispatching the work to our own thread pool avoids the crash

   If the gRPC thread only receives the request and we dispatch the actual work to our own thread pool, the problem disappears. Our pool uses std::thread, and those threads are marked with:

   ```cpp
   mi_thread_set_in_threadpool();
   ```

   In this configuration, we do not see the access violations.

2. One-time TLS-based mimalloc initialization on the gRPC thread does not fix it

   We tried explicitly initializing the mimalloc thread state on the gRPC thread using a thread_local helper:

   ```cpp
   #include <mimalloc.h>

   struct GrpcTLS
   {
     static GrpcTLS& Get();
   private:
     // Runs once per thread, on the first call to Get() from that thread.
     GrpcTLS()
     {
       mi_thread_init();
       mi_thread_set_in_threadpool();
     }
     // Runs when the thread exits and its thread_local objects are destroyed.
     ~GrpcTLS()
     {
       mi_thread_done();
     }
     GrpcTLS(const GrpcTLS&) = delete;
     GrpcTLS& operator=(const GrpcTLS&) = delete;
   };

   GrpcTLS& GrpcTLS::Get()
   {
     static thread_local GrpcTLS s_grpc_tls;
     return s_grpc_tls;
   }
   ```

   Before the gRPC thread starts processing work, we call:

   ```cpp
   GrpcTLS::Get();
   ```

   We verified through tracing that the crashing thread did execute this thread_local constructor. Despite that, the crash still occurs.

3. Calling mi_thread_done() and mi_thread_init() before each work item appears to avoid the crash

   As an experiment, we tried calling the following immediately before each work item executed on the gRPC thread:

   ```cpp
   mi_thread_done();
   mi_thread_init();
   ```

   We do not consider this a proper solution, but with this change the crashes appear to stop.
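To make the first workaround concrete, here is a minimal sketch of the dispatch pattern: the gRPC thread only enqueues, and each pool worker runs a per-thread setup hook once before taking any work. `WorkerPool`, `Submit`, and the `on_thread_start` parameter are hypothetical names, not our actual pool. The mimalloc call is injected as a callback (in the setup above it would be `[] { mi_thread_set_in_threadpool(); }`) so the sketch compiles without mimalloc. It only illustrates the pattern and does not explain why it avoids the crash.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool. Every worker runs `on_thread_start` once,
// before taking any work (e.g. [] { mi_thread_set_in_threadpool(); }).
class WorkerPool {
 public:
  WorkerPool(std::size_t threads, std::function<void()> on_thread_start)
      : on_thread_start_(std::move(on_thread_start)) {
    for (std::size_t i = 0; i < threads; ++i)
      workers_.emplace_back([this] { WorkerLoop(); });
  }

  ~WorkerPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // workers drain remaining tasks first
  }

  WorkerPool(const WorkerPool&) = delete;
  WorkerPool& operator=(const WorkerPool&) = delete;

  // Called from the gRPC thread: only enqueue, the work runs on a worker.
  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void WorkerLoop() {
    if (on_thread_start_) on_thread_start_();  // per-thread setup, runs once
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // run outside the lock
    }
  }

  std::function<void()> on_thread_start_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;  // declared last: started after the rest
};
```

In this arrangement, all request handling (and therefore all mimalloc allocation for it) happens on std::thread workers that were marked once at startup, rather than on the gRPC-owned threads.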

Questions
Does this pattern match any known issue or unsupported usage pattern on Windows?

In particular:

- Are there known caveats when using mimalloc on threads created via CreateThread rather than _beginthreadex / std::thread?
- Is there any known limitation around mi_thread_init() / mi_thread_done() on externally managed or reused worker threads?
- Is mi_thread_set_in_threadpool() expected to be sufficient for long-lived reused worker threads such as those used internally by gRPC?
- Does the fact that repeated mi_thread_done() / mi_thread_init() appears to help suggest that mimalloc thread-local state may be getting lost or becoming invalid?
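For A/B testing the last point, the per-work-item experiment (calling mi_thread_done() and mi_thread_init() before each work item) can be wrapped so the reset is applied in one place and can be switched off. This is a diagnostic sketch, not a fix: `PerItemThreadReset` is a hypothetical helper, and the reset action is injected as a callback (in our experiment it would be `[] { mi_thread_done(); mi_thread_init(); }`) so the sketch compiles without mimalloc.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper (not part of mimalloc or gRPC). Runs `reset` on the
// current thread immediately before each work item, e.g.
// [] { mi_thread_done(); mi_thread_init(); }. An empty std::function
// disables the reset.
class PerItemThreadReset {
 public:
  explicit PerItemThreadReset(std::function<void()> reset)
      : reset_(std::move(reset)) {}

  template <class Work>
  void Run(Work&& work) const {
    if (reset_) reset_();        // reset per-thread state, if enabled
    std::forward<Work>(work)();  // then run the actual request handling
  }

 private:
  std::function<void()> reset_;
};
```

Constructing it with an empty `std::function<void()>()` turns the reset off, which makes it straightforward to compare crash rates with and without it.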
We realize this is difficult to diagnose without a minimal repro, and unfortunately we do not yet have one.

If it is useful, we can provide a crash summary with the call stack, or any other information you might need.

Thanks,
Anne Jan Beeks
