
[BUG]: SIGSEGV in heap_recorder.c (st_object_record_update → ruby_obj_memsize_of → classext_memsize) on Ruby 4.0 with heap size profiling #5936

Description

@navidemad

Tracer Version(s)

2.35.0

Ruby Version(s)

ruby 4.0.5 [x86_64-linux]

Relevant Library and Version(s)

libdatadog 33.0.0.1.0

Bug Report

Within hours of enabling experimental heap profiling (experimental_heap_enabled = true) on Ruby 4.0.5, our Sidekiq workers started segfaulting with two crash signatures that always appeared together. We saw roughly 15 crashes of each signature over ~10 hours, across several worker types. Both stopped completely the moment we disabled experimental_heap_enabled.

Signature A — crash inside the tracer:

  • SIGSEGV (SEGV_MAPERR)
  • frame: st_object_record_update in ext/datadog_profiling_native_extension/heap_recorder.c

Signature B — crash inside Ruby's GC:

  • SIGSEGV (SI_KERNEL)
  • frame: classext_memsize in Ruby's gc.c

These look like two faces of the same call chain. During a full heap update, heap_recorder_update calls st_foreach(object_records, st_object_record_update, ...). For each live object, when size profiling is on, st_object_record_update calls ruby_obj_memsize_of(ref) → rb_obj_memsize_of(obj). For T_CLASS / T_MODULE / T_ICLASS objects on Ruby 4.0, rb_obj_memsize_of walks the per-namespace class extensions and calls into classext_memsize (new in Ruby 4.0's namespace/classext implementation), which is where the process dies. Depending on where the bad access lands, the crash is attributed either to the tracer frame (Signature A) or to the Ruby GC frame (Signature B).

ruby_obj_memsize_of (in ruby_helpers.c) already keeps a denylist of rb_obj_memsize_of paths that rb_bug/crash the VM (e.g. T_NODE is explicitly excluded), but it still forwards T_CLASS / T_MODULE / T_ICLASS straight through to rb_obj_memsize_of. On Ruby 4.0 that path is no longer safe.

This only triggers with heap size profiling: st_object_record_update calls ruby_obj_memsize_of only when size_enabled && update_include_old (full update). experimental_heap_size_enabled defaults to true whenever heap profiling is enabled, so anyone turning on heap profiling on Ruby 4.0 is exposed by default.

This appears to be the same Ruby 4.0 incompatibility that #5148 originally guarded against and that #5201 re-enabled behind the experimental flag — the class-object size-accounting path still seems unsafe on Ruby 4.0.

Suggested fix direction: skip T_CLASS / T_MODULE / T_ICLASS in ruby_obj_memsize_of on Ruby 4.0 (return 0, as is already done for unsupported types), or otherwise guard the classext memsize path — analogous to the existing T_NODE exclusion. Alternatively, gate experimental_heap_size_enabled off on Ruby 4.0 until the classext path is safe.
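Until a fix lands, the second option could be approximated from application config. The following is a minimal, untested sketch: heap_size_profiling_safe? is a hypothetical helper name, and we have only verified that disabling experimental_heap_enabled stops the crashes, not that disabling experimental_heap_size_enabled alone is sufficient.

```ruby
require "rubygems"

# Hypothetical helper (not part of the datadog gem): returns true when the
# running Ruby predates 4.0, where the classext memsize path was introduced.
def heap_size_profiling_safe?(ruby_version = RUBY_VERSION)
  Gem::Version.new(ruby_version) < Gem::Version.new("4.0")
end

# Guarded so the snippet can be loaded without the datadog gem present.
if defined?(Datadog)
  Datadog.configure do |c|
    c.profiling.enabled = true
    c.profiling.advanced.allocation_enabled = true
    c.profiling.advanced.experimental_heap_enabled = true
    # Assumption: turning off size accounting avoids the ruby_obj_memsize_of
    # call described above. This has not been confirmed in production.
    c.profiling.advanced.experimental_heap_size_enabled = heap_size_profiling_safe?
  end
end
```

This keeps heap live-object tracking on while skipping size accounting on Ruby 4.0 and later, if the assumption holds.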

Reproduction Code

No minimal reproduction yet. It triggers under real workload on Sidekiq workers that allocate and retain many Class/Module objects, and reproduces reliably within hours of enabling the flag on Ruby 4.0.5. It disappears entirely once experimental_heap_enabled is set back to false.
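A possible starting point for a standalone reproduction is sketched below. It has not been confirmed to crash. It assumes that ObjectSpace.memsize_of reaches the same rb_obj_memsize_of path as the profiler, and sum_module_memsize is a hypothetical helper name. It only covers Class and Module objects, since T_ICLASS objects are not exposed to Ruby code.

```ruby
require "objspace"

# Hypothetical helper: calls ObjectSpace.memsize_of on every live Class and
# Module and returns [number of objects visited, sum of reported sizes].
def sum_module_memsize
  count = 0
  total = 0
  ObjectSpace.each_object(Module) do |mod|
    total += ObjectSpace.memsize_of(mod)
    count += 1
  end
  [count, total]
end

# Create and retain many anonymous classes and modules, loosely mimicking
# the workload described above.
retained = Array.new(1_000) { |i| i.even? ? Class.new : Module.new }

count, total = sum_module_memsize
puts "#{count} classes/modules visited, #{total} bytes reported"
```

If this runs cleanly on Ruby 4.0.5 without the profiler loaded, that would suggest the crash depends on when or on which references the profiler calls rb_obj_memsize_of, rather than on the size calculation alone.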

Configuration Block

Datadog.configure do |c|
  c.profiling.enabled = true
  c.profiling.advanced.allocation_enabled = true
  c.profiling.advanced.experimental_heap_enabled = true
  # experimental_heap_size_enabled left at its default (true)
end

Error Logs

We have the full native crash dumps for both signatures and will attach them to this issue. The crashing frames are st_object_record_update (heap_recorder.c) for Signature A and classext_memsize (gc.c) for Signature B.

Operating System

Linux (x86_64, container)
