Tracer Version(s)
2.35.0
Ruby Version(s)
ruby 4.0.5 [x86_64-linux]
Relevant Library and Version(s)
libdatadog 33.0.0.1.0
Bug Report
Within hours of enabling experimental heap profiling (`experimental_heap_enabled = true`) on Ruby 4.0.5, our Sidekiq workers started segfaulting with two crash signatures that always appeared together. We saw roughly 15 crashes of each signature over about 10 hours, across several worker types. Both stopped completely the moment we disabled `experimental_heap_enabled`.
Signature A (crash inside the tracer):
- SIGSEGV (SEGV_MAPERR)
- Frame: `st_object_record_update` in `ext/datadog_profiling_native_extension/heap_recorder.c`
Signature B (crash inside Ruby's GC):
- SIGSEGV (SI_KERNEL)
- Frame: `classext_memsize` in Ruby's `gc.c`
These look like two faces of the same call chain. During a full heap update, `heap_recorder_update` calls `st_foreach(object_records, st_object_record_update, ...)`. For each live object, when size profiling is on, `st_object_record_update` calls `ruby_obj_memsize_of(ref)`, which forwards to `rb_obj_memsize_of(obj)`. For `T_CLASS` / `T_MODULE` / `T_ICLASS` objects on Ruby 4.0, `rb_obj_memsize_of` walks the per-namespace class extensions and calls into `classext_memsize` (new in Ruby 4.0's namespace/classext implementation), which is where the process dies. Depending on where the bad access lands, the crash is attributed either to the tracer frame (Signature A) or to the Ruby GC frame (Signature B).
`ruby_obj_memsize_of` (in `ruby_helpers.c`) already keeps a denylist of object types whose `rb_obj_memsize_of` path triggers `rb_bug` or otherwise crashes the VM (for example, `T_NODE` is explicitly excluded), but it still forwards `T_CLASS` / `T_MODULE` / `T_ICLASS` straight through to `rb_obj_memsize_of`. On Ruby 4.0 that path is no longer safe.
This only triggers with heap size profiling: `st_object_record_update` calls `ruby_obj_memsize_of` only when `size_enabled && update_include_old` (that is, during a full update). `experimental_heap_size_enabled` defaults to `true` whenever heap profiling is enabled, so anyone turning on heap profiling on Ruby 4.0 is exposed by default.
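A rough Ruby model of the gating described above, to make explicit which records reach the size probe. This is not the tracer's C code: `ObjectRecord`, `update_records`, and `memsize_probe` are hypothetical stand-ins for `object_records`, `st_object_record_update`, and `ruby_obj_memsize_of`.

```ruby
# Hypothetical Ruby model of the C update loop in heap_recorder.c.
# ObjectRecord, update_records and memsize_probe are illustrative names only;
# any other per-record work the real st_object_record_update does is omitted.
ObjectRecord = Struct.new(:type, :memsize, keyword_init: true)

# Returns the types of the records whose size was probed during this update.
def update_records(records, size_enabled:, update_include_old:, memsize_probe:)
  probed_types = []
  records.each do |record|                           # ~ st_foreach(object_records, ...)
    # Per this issue, the size probe only runs on a full update with size profiling on.
    next unless size_enabled && update_include_old
    record.memsize = memsize_probe.call(record.type) # ~ ruby_obj_memsize_of(ref)
    probed_types << record.type
  end
  probed_types
end

records = [ObjectRecord.new(type: :T_STRING), ObjectRecord.new(type: :T_CLASS)]
# Fake size; in this report, the real call is the one that crashes for class-like objects on Ruby 4.0.
probe = ->(_type) { 40 }
p update_records(records, size_enabled: true, update_include_old: false, memsize_probe: probe) # => []
p update_records(records, size_enabled: true, update_include_old: true, memsize_probe: probe)  # => [:T_STRING, :T_CLASS]
```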
The crash appears to be the same Ruby 4.0 incompatibility that #5148 originally guarded against and that #5201 re-enabled behind the experimental flag: the class-object size-accounting path still seems unsafe on Ruby 4.0.
Suggested fix direction: skip `T_CLASS` / `T_MODULE` / `T_ICLASS` in `ruby_obj_memsize_of` on Ruby 4.0 (returning 0, as is already done for unsupported types), or otherwise guard the classext memsize path, analogous to the existing `T_NODE` exclusion. Alternatively, gate `experimental_heap_size_enabled` off on Ruby 4.0 until the classext path is safe.
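A sketch of the first suggested fix, modeled in Ruby rather than C so the decision logic is easy to read. `guarded_memsize_of`, `ALWAYS_SKIPPED`, and `CLASSEXT_TYPES` are hypothetical names; the symbols stand in for the C `T_*` constants, and only `T_NODE` is shown from the existing denylist, which may contain more types.

```ruby
# Hypothetical Ruby model of the suggested guard in ruby_obj_memsize_of
# (ruby_helpers.c). Symbols stand in for the C T_* constants.
require "rubygems" # Gem::Version (normally already loaded)

# Types already excluded today; only T_NODE is shown, the real list may be longer.
ALWAYS_SKIPPED = %i[T_NODE].freeze
# Types this issue proposes to skip on Ruby 4.0+, because their
# rb_obj_memsize_of path reaches classext_memsize.
CLASSEXT_TYPES = %i[T_CLASS T_MODULE T_ICLASS].freeze

# The block stands in for forwarding to rb_obj_memsize_of(obj).
def guarded_memsize_of(type, ruby_version: RUBY_VERSION)
  return 0 if ALWAYS_SKIPPED.include?(type)

  if CLASSEXT_TYPES.include?(type) && Gem::Version.new(ruby_version) >= Gem::Version.new("4.0")
    return 0 # report "no size" instead of calling into classext_memsize
  end

  yield
end

p guarded_memsize_of(:T_CLASS, ruby_version: "4.0.5") { 123 }  # => 0
p guarded_memsize_of(:T_CLASS, ruby_version: "3.4.1") { 123 }  # => 123
p guarded_memsize_of(:T_STRING, ruby_version: "4.0.5") { 40 }  # => 40
```

In the actual C helper, the Ruby 4.0 condition would presumably be a compile-time check against the Ruby version the extension is built for, rather than a runtime string comparison.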
Reproduction Code
No minimal reproduction yet. The crash triggers under real workload on Sidekiq workers that allocate and retain many Class/Module objects, and it reproduces reliably within hours of enabling the flag on Ruby 4.0.5. It disappears entirely once `experimental_heap_enabled` is set back to `false`.
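One possible starting point for a minimal reproduction. It is hypothetical and has not been confirmed to crash: it allocates and retains anonymous classes and modules (the object types on the suspect path) and calls `ObjectSpace.memsize_of` on them, which also goes through `rb_obj_memsize_of`. Running it with and without the profiler configuration from this issue could help tell a VM-side bug from a profiler-timing issue. `churn_classes` and its counts are arbitrary choices.

```ruby
# Hypothetical stress script for chasing a minimal reproduction; NOT confirmed
# to crash. Counts are arbitrary.
require "objspace"

# Allocates and retains anonymous classes and modules, loosely imitating the
# Sidekiq workloads described in this issue.
def churn_classes(rounds: 10, per_round: 1_000)
  retained = []
  rounds.times do
    per_round.times do |i|
      mod = Module.new { define_method(:value) { i } }
      klass = Class.new { include mod } # including a module creates a T_ICLASS
      retained << klass << mod
    end
    # Ask the VM for class/module sizes directly, outside the profiler.
    retained.each { |obj| ObjectSpace.memsize_of(obj) }
    GC.start
  end
  retained
end

if $PROGRAM_NAME == __FILE__
  retained = churn_classes
  puts "retained #{retained.size} class/module objects"
end
```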
Configuration Block
```ruby
Datadog.configure do |c|
  c.profiling.enabled = true
  c.profiling.advanced.allocation_enabled = true
  c.profiling.advanced.experimental_heap_enabled = true
  # experimental_heap_size_enabled left at its default (true)
end
```
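Until a fix lands, a narrower workaround than disabling heap profiling entirely might be to keep heap profiling on and turn off only heap size profiling on Ruby 4.0, since the analysis in this issue says `ruby_obj_memsize_of` is only reached when size profiling is enabled. We have not tested this. `ruby4_or_later?` is a hypothetical helper; the settings are the ones already used in this issue.

```ruby
# Untested mitigation sketch: keep heap profiling, but disable heap *size*
# profiling on Ruby 4.0+, so (per this issue's analysis) st_object_record_update
# should never reach ruby_obj_memsize_of.
require "rubygems" # Gem::Version

# Hypothetical helper, not part of the datadog gem.
def ruby4_or_later?(version = RUBY_VERSION)
  Gem::Version.new(version) >= Gem::Version.new("4.0")
end

# Guarded only so the snippet runs where the datadog gem isn't loaded;
# in a real initializer, call Datadog.configure directly.
if defined?(Datadog) && Datadog.respond_to?(:configure)
  Datadog.configure do |c|
    c.profiling.enabled = true
    c.profiling.advanced.allocation_enabled = true
    c.profiling.advanced.experimental_heap_enabled = true
    c.profiling.advanced.experimental_heap_size_enabled = !ruby4_or_later?
  end
end
```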
Error Logs
We have the full native crash dumps for both signatures and will attach them to this issue. The crashing frames are `st_object_record_update` (`heap_recorder.c`) for Signature A and `classext_memsize` (`gc.c`) for Signature B.
Operating System
Linux (x86_64, container)