Profiling: Skip class/module memsize on Ruby 4 to avoid heap profiling SIGSEGV #5938

Open
navidemad wants to merge 2 commits into DataDog:master from navidemad:navidemad/profiling-skip-class-memsize-ruby4

@navidemad

Contributor

What does this PR do?

Skips computing rb_obj_memsize_of for T_CLASS / T_MODULE / T_ICLASS objects on Ruby 4.0+, where it can crash the VM with a SIGSEGV during heap profiling.

ruby_obj_memsize_of already keeps a denylist of rb_obj_memsize_of paths that crash the VM (e.g. T_NODE). This adds the class-like types to that denylist on Ruby 4.0+, gated by a new NO_SAFE_CLASS_MEMSIZE compile-time define. On older Rubies the define is absent, so the preprocessed code is byte-for-byte unchanged.
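The denylist idea can be modelled in a few lines of plain Ruby. This is only a sketch: the real check is C code inside ruby_obj_memsize_of, and both the helper name `guarded_memsize_of` and its `skip_class_like` parameter are hypothetical, introduced here for illustration.

```ruby
require "objspace"

# Hypothetical Ruby model of the denylist. The real implementation is C and
# switches on the object's internal type tag (T_CLASS, T_MODULE, T_ICLASS).
# Assumption: `Module === obj` stands in for T_CLASS / T_MODULE here;
# T_ICLASS objects are VM-internal and are not modelled.
def guarded_memsize_of(obj, skip_class_like: RUBY_VERSION >= "4")
  # Report 0 instead of walking the class internals.
  return 0 if skip_class_like && Module === obj

  ObjectSpace.memsize_of(obj)
end

guarded_memsize_of(String, skip_class_like: true) # reports 0 for a class
guarded_memsize_of("x" * 10_000)                  # regular objects are measured as usual
```

Reporting 0 rather than raising mirrors the PR's trade-off: the object still appears in the profile, only its size contribution is dropped.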

Motivation:

With experimental heap profiling enabled on Ruby 4.0, we saw recurring SIGSEGV crashes on production Sidekiq workers (#5936). During heap profile serialization, the recorder resurrects each tracked object via ObjectSpace._id2ref and calls rb_obj_memsize_of on it. For class objects, rb_obj_memsize_of walks Ruby 4.0's per-namespace class extensions (rb_class_classext_foreach -> classext_memsize -> rb_id_table_memsize) and dereferences invalid classext memory, killing the process. Class objects contribute negligibly to a heap size profile, so skipping them removes the crash with no meaningful loss of signal. The full root-cause analysis is in #5936.

Change log entry

Yes. Profiling: Fix a SIGSEGV crash that could happen with experimental heap profiling enabled on Ruby 4.0.

Additional Notes:

The gate is a new NO_SAFE_CLASS_MEMSIZE define, set for RUBY_VERSION >= "4", following the existing extconf.rb convention for Ruby-4-specific behavior. A complete fix would also need Ruby-side hardening of classext_memsize / rb_class_classext_foreach, but skipping class memsize is enough to stop the crash.
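A minimal sketch of what such a version gate might look like. The helper `class_memsize_defines` is hypothetical and is not the PR's actual extconf.rb code; only the define name and the `RUBY_VERSION >= "4"` comparison come from the description above.

```ruby
# Hypothetical helper: returns the preprocessor flags the gate would add.
# In a real extconf.rb, flags like this are typically appended to mkmf's
# $defs global rather than returned from a method.
def class_memsize_defines(ruby_version = RUBY_VERSION)
  defines = []
  # String comparison, as described in the PR: "4.0.0" >= "4" is true,
  # "3.4.5" >= "4" is false.
  defines << "-DNO_SAFE_CLASS_MEMSIZE" if ruby_version >= "4"
  defines
end

class_memsize_defines("4.0.0") # => ["-DNO_SAFE_CLASS_MEMSIZE"]
class_memsize_defines("3.4.5") # => []
```

Because the flag is simply absent on older Rubies, any C code guarded by `#ifdef NO_SAFE_CLASS_MEMSIZE` is removed by the preprocessor there, which is why the change is a no-op on Ruby < 4.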

How to test the change?

Added a regression test in stack_recorder_spec.rb: it tracks a Class as a live heap object and checks that the profiler reports it without crashing, with a heap-live-size of 0 on Ruby 4.0+ (and the real ObjectSpace.memsize_of on older Rubies). Verified on Linux + Ruby 4.0 (tracer-4.0): the full stack_recorder_spec passes (64 examples, 0 failures). On Ruby < 4 the change is a no-op at the preprocessor level.

…g SIGSEGV

On Ruby 4.0, computing the memsize of a class/module/iclass walks the
per-namespace class extensions (rb_class_classext_foreach ->
classext_memsize -> rb_id_table_memsize) and can crash the VM with a
SIGSEGV when called on objects resurrected via ObjectSpace._id2ref
during heap profiling.

Add T_CLASS/T_MODULE/T_ICLASS to the existing crash-path denylist in
ruby_obj_memsize_of, gated behind a new NO_SAFE_CLASS_MEMSIZE define for
Ruby >= 4. On older Rubies the preprocessed code is unchanged. Class
objects contribute negligibly to a heap size profile.

See DataDog#5936

@vpellan added the community label (Was opened by a community member) on Jun 24, 2026